REID (Research and Evaluation in Education)


parallel test items, test item development, mathematics evaluation, multiple-choice testing

This study describes five methods for developing parallel multiple-choice test items in mathematics at the primary education level in Yogyakarta. It was descriptive research involving 22 mathematics teachers as respondents. Data were collected through interviews and reviews of the developed test packages, and a questionnaire gathered data on the procedures the teachers used to develop the tests. The findings show that the teachers used five methods to develop the test items: (1) randomizing the item numbers; (2) randomizing the sequence of response options; (3) writing items that use the same contexts but different figures; (4) using anchor items; and (5) writing different items based on the same table of specifications. All respondents stated that they drew up a table of specifications before writing the test items, and most of them (77%) validated the instruments for content and language.
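Methods (1) and (2) above lend themselves to a short illustration. The following Python sketch is not the procedure the teachers in the study used. It is a minimal example, under stated assumptions, of how item order and option order might be randomized while keeping the answer key correct. The function name `make_parallel_form`, the dictionary layout of an item, and the two-item bank are all hypothetical, and the sketch assumes that the option texts within an item are unique.

```python
import random


def make_parallel_form(items, seed):
    """Return a reordered copy of `items` with shuffled options and updated keys.

    Each item is assumed (for illustration) to be a dict with a "stem",
    a list of "options", and an "answer" index into that list.
    """
    rng = random.Random(seed)  # seeded so each form can be reproduced
    form = []
    for item in items:
        options = list(item["options"])          # copy; leave the bank untouched
        correct_text = options[item["answer"]]   # remember the keyed option
        rng.shuffle(options)                     # method (2): randomize option sequence
        form.append({
            "stem": item["stem"],
            "options": options,
            # Re-locate the key; relies on option texts being unique.
            "answer": options.index(correct_text),
        })
    rng.shuffle(form)                            # method (1): randomize item numbers
    return form


# Hypothetical two-item bank, for illustration only.
bank = [
    {"stem": "3 + 4 = ?", "options": ["5", "6", "7", "8"], "answer": 2},
    {"stem": "6 x 2 = ?", "options": ["8", "10", "12", "14"], "answer": 2},
]

form_a = make_parallel_form(bank, seed=1)
form_b = make_parallel_form(bank, seed=2)
```

Both forms contain the same items and the same keyed answers; only the presentation order differs. Methods (3) to (5) involve writing new content and so cannot be reduced to a shuffle of this kind.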
