REiD (Research and Evaluation in Education)


correct option placement, order of items, parallel test


This research aims to prove that a parallel test can be constructed by randomizing the order of the test items and/or the order of the answer alternatives. The study used an experimental method with a post-test-only non-equivalent control group design. It involved junior high school students in Yogyakarta City, with a sample of 320 students from State Junior High School (SMPN) 5 Yogyakarta and 320 students from SMPN 8 Yogyakarta, selected using the stratified proportional random sampling technique. The instrument was an objective mathematics test consisting of five question packages, each containing 40 items with four alternatives. Across the packages, the items were ordered by item number from smallest to largest and vice versa, and the options in each item were likewise ordered from A to D and vice versa. Each item was analyzed using the Classical Test Theory and Item Response Theory approaches, while the data were analyzed using the discrimination index with the Kruskal-Wallis test to examine differences among the five question packages. The item analysis under both the Classical Test Theory and Item Response Theory approaches shows no significant difference in the difficulty index among Packages 1 through 5. Nevertheless, according to the Classical Test Theory, the difficulty-index category of Packages 2 through 5 shifts when compared with Package 1, the original package, which is in general not a good package because it contains items that are too easy.
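The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the way the five packages combine item and option reversal is not stated in the abstract and is left to the caller, and the Kruskal-Wallis statistic below uses average ranks for ties but omits the tie correction.

```python
from typing import List, Sequence

OPTIONS = "ABCD"  # four alternatives per item, as stated in the abstract


def reverse_options(key: str) -> str:
    """Return the key letter after the options A-D are listed in reverse order."""
    return OPTIONS[len(OPTIONS) - 1 - OPTIONS.index(key)]


def make_variant(keys: Sequence[str], reverse_items: bool, reverse_opts: bool) -> List[str]:
    """Build the answer key of a reordered package from the original key (assumed scheme)."""
    new_keys = [reverse_options(k) if reverse_opts else k for k in keys]
    return new_keys[::-1] if reverse_items else new_keys


def difficulty_index(responses: Sequence[Sequence[str]], keys: Sequence[str]) -> List[float]:
    """CTT difficulty index per item: proportion of examinees answering correctly."""
    n = len(responses)
    return [sum(r[i] == k for r in responses) / n for i, k in enumerate(keys)]


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H with average ranks for ties (no tie correction applied)."""
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    weighted = sum(rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
```

In use, each package's per-item difficulty indices would form one group passed to `kruskal_wallis_h`, and the resulting H would be compared against a chi-square distribution with (number of packages - 1) degrees of freedom. For real analyses, an established routine such as `scipy.stats.kruskal`, which also applies the tie correction, is the more reliable choice.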
