Keywords
ability estimation; model fit; person fit; polytomous IRT; scoring correction
Document Type
Article
Abstract
Scoring quality is widely recognized as an important concern for both test developers and test users. This study investigated the effect of scoring correction and model fit on the estimation of ability parameters and on person fit in polytomous item response theory. The test results of 165 students in a Statistics course (SATS4410) at a university in Indonesia were used to address the research questions. The polytomous data obtained from scoring the test results were analyzed with the Item Response Theory (IRT) approach using the Partial Credit Model (PCM), the Graded Response Model (GRM), and the Generalized Partial Credit Model (GPCM). The effect of scoring correction and model fit on ability estimation and person fit was tested with multivariate analysis. Among the three models, the GRM showed the best fit based on the p-value and RMSEA. The analysis also showed no significant effect of scoring correction and model fit on the estimates of test takers' ability or on person fit. Based on these results, we recommend evaluating the levels or categories used in scoring students' work on a test.
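Note: The following is a minimal sketch, not the authors' script, of how the analysis described in the abstract can be set up in R with the mirt package cited in the references. The data file name is a placeholder, and the use of M2() for the overall fit statistic (p-value and RMSEA) is an assumption for illustration; the article does not specify which fit implementation was used.

library(mirt)

# Hypothetical input: a data frame of polytomously scored item responses,
# one row per test taker and one column per item (file name is illustrative).
resp <- read.csv("sats4410_polytomous_scores.csv")

# Fit the three unidimensional polytomous models compared in the study.
pcm  <- mirt(resp, model = 1, itemtype = "Rasch")   # Partial Credit Model
grm  <- mirt(resp, model = 1, itemtype = "graded")  # Graded Response Model
gpcm <- mirt(resp, model = 1, itemtype = "gpcm")    # Generalized Partial Credit Model

# Overall model-data fit (including p-value and RMSEA) for each model.
lapply(list(PCM = pcm, GRM = grm, GPCM = gpcm), M2)

# Ability (theta) estimates and person-fit indices under, e.g., the GRM.
theta_grm <- fscores(grm, method = "EAP")
pfit_grm  <- personfit(grm)            # infit/outfit and Zh person-fit statistics
head(cbind(theta_grm, pfit_grm))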
Page Range
140-151
Issue
2
Volume
8
Digital Object Identifier (DOI)
10.21831/reid.v8i2.54429
Source
https://journal.uny.ac.id/index.php/reid/article/view/54429
Recommended Citation
Santoso, A., Pardede, T., Djidu, H., Apino, E., Rafi, I., Rosyada, M., & Hamid, H. A. (2022). The effect of scoring correction and model fit on the estimation of ability parameter and person fit on polytomous item response theory. REID (Research and Evaluation in Education), 8(2), 140-151. https://doi.org/10.21831/reid.v8i2.54429
References
Chalmers, R. P. (2021). mirt: Multidimensional item response theory. https://cran.r-project.org/package=mirt
Cui, Y., & Mousavi, A. (2015). Explore the usefulness of person-fit analysis on large-scale assessment. International Journal of Testing, 15(1), 23-49. https://doi.org/10.1080/15305058.2014.977444
Djidu, H., Ismail, R., Sumin, S., Rachmaningtyas, N. A., Imawan, O. R., Suharyono, S., Aviory, K., Prihono, E. W., Kurniawan, D. D., Syahbrudin, J., Nurdin, N., Marinding, Y., Firmansyah, F., Hadi, S., & Retnawati, H. (2022). Analisis instrumen penelitian dengan pendekatan teori tes klasik dan modern menggunakan program R [Analysis of research instruments with classical and modern test theory approaches using the R program]. UNY Press.
Djidu, H., & Retnawati, H. (2022). IRT unidimensi penskoran dikotomi [Unidimensional IRT for dichotomous scoring]. In H. Retnawati & S. Hadi (Eds.), Analisis instrumen penelitian dengan pendekatan teori tes klasik dan modern menggunakan program R [Analysis of research instruments with classical and modern test theory approaches using the R program] (pp. 89-141). UNY Press.
Dodeen, H., & Darabi, M. (2009). Person-fit: Relationship with four personality tests in mathematics. Research Papers in Education, 24(1), 115-126. https://doi.org/10.1080/02671520801945883
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Nijhoff Publishing.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. SAGE Publications.
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107-135. https://doi.org/10.1177/01466210122031957
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching. Pearson Education.
Mousavi, A., Cui, Y., & Rogers, T. (2019). An examination of different methods of setting cutoff values in person fit research. International Journal of Testing, 19(1), 1-22. https://doi.org/10.1080/15305058.2018.1464010
Nitko, A. J., & Brookhart, S. M. (2014). Educational assessment of students (6th ed.). Pearson.
Paek, I., Liang, X., & Lin, Z. (2021). Regarding item parameter invariance for the Rasch and the 2-parameter logistic models: An investigation under finite non-representative sample calibrations. Measurement: Interdisciplinary Research and Perspectives, 19(1), 39-54. https://doi.org/10.1080/15366367.2020.1754703
Pan, T., & Yin, Y. (2017). Using the Bayes factors to evaluate person fit in the item response theory. Applied Measurement in Education, 30(3), 213-227. https://doi.org/10.1080/08957347.2017.1316275
R Core Team. (2022). R: A language and environment for statistical computing. https://www.r-project.org/
Retnawati, H. (2014). Teori respons butir dan penerapannya: Untuk peneliti, praktisi pengukuran dan pengujian, mahasiswa pascasarjana [Item response theory and its applications: For researchers, measurement and testing practitioners, and graduate students]. Parama Publishing.
Revelle, W. (2022). psych: Procedures for psychological, psychometric, and personality research. https://personality-project.org/r/psych/
Sarkar, D. (2008). Lattice: Multivariate data visualization with R. Springer. http://lmdvr.r-forge.r-project.org
Si, C.-F., & Schumacker, R. E. (2004). Ability estimation under different item parameterization and scoring models. International Journal of Testing, 4(2), 137-181. https://doi.org/10.1207/s15327574ijt0402_3
Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67, 1-7. https://doi.org/10.1016/j.stueduc.2020.100910
Van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. Springer Science + Business Media. https://doi.org/10.1007/978-1-4757-2691-6
Wind, S. A., & Walker, A. A. (2019). Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments. Assessing Writing, 39, 25-38. https://doi.org/10.1016/j.asw.2018.12.002