Jurnal Penelitian dan Evaluasi Pendidikan


test item characteristics; accounting; learning competencies; Rasch model




This study aims to describe (1) the characteristics of the items in the national examination try-out test for accounting in the 2015/2016 academic year, under both classical test theory and modern test theory, and (2) the classification of students' mastery of accounting. The study is exploratory. Item characteristics were analyzed using classical and modern test theory, and students' mastery of accounting was analyzed using descriptive quantitative methods, based on the test set for the 2015/2016 national examination try-out. A total of 414 students took the Package A test. The results show that (1) under the classical analysis, 11 items (27.5%) fall in the "easy" category, 22 items (55%) in the "medium" category, and 7 items (17.5%) in the "difficult" category, and a total of 19 items (47.5%) are categorized as good items; under the modern-theory analysis, 34 items (85%) fall in the "good" category; and (2) around 38% of the students have competencies in the medium and low categories. Most students have difficulty answering questions at higher-order thinking levels.
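As an illustration of the classical analysis summarized above, the following minimal sketch computes a classical difficulty index (the proportion of correct answers per item) and sorts items into "easy", "medium", and "difficult". The cutoffs of 0.30 and 0.70 are a commonly used convention assumed here, not values reported by the study, and the response matrix is hypothetical. The final lines only check that the reported counts are consistent with a 40-item test.

```python
def difficulty_index(item_responses):
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)


def classify_difficulty(p, easy_cutoff=0.70, difficult_cutoff=0.30):
    """Label an item by its difficulty index.

    The cutoffs are an assumed convention, not thresholds taken from the study.
    """
    if p > easy_cutoff:
        return "easy"
    if p < difficult_cutoff:
        return "difficult"
    return "medium"


# Hypothetical scored responses: each row is one examinee, each column one item.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

# Transpose so that each tuple holds all responses to a single item.
items = list(zip(*scores))
p_values = [difficulty_index(col) for col in items]
categories = [classify_difficulty(p) for p in p_values]

# Item counts reported in the abstract; their sum implies a 40-item test.
reported = {"easy": 11, "medium": 22, "difficult": 7}
total_items = sum(reported.values())
shares = {label: 100 * count / total_items for label, count in reported.items()}

print(list(zip(p_values, categories)))
print(total_items, shares)
```

The percentage check reproduces the 27.5%, 55%, and 17.5% shares given in the abstract. Deciding whether an item is "good" would also require a discrimination index, which this sketch does not cover.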



