REID (Research and Evaluation in Education)


assessment, problem-solving skill, CAT




Evaluation using computerized adaptive tests (CAT) is an alternative to paper-based tests (PBT). This study aimed to map physics problem-solving skills using PhysProSS-CAT on the basis of item response theory (IRT). The study was conducted in Sleman Regency, Yogyakarta, and involved 156 Grade XI senior high school students selected by stratified random sampling. The results show that PhysProSS-CAT is able to accurately measure physics problem-solving skills. The students' competences in physics problem solving were distributed as follows: 6% in the very high category, 4% in the high category, 36% in the medium category, 36% in the low category, and 18% in the very low category. Most students' competences (72%) therefore fall within the medium and low categories.











