Keywords

ability estimation; model fit; person fit; polytomous IRT; scoring correction

Document Type

Article

Abstract

Scoring quality has been recognized as one of the important aspects that should concern both test developers and test users. This study investigated the effect of scoring correction and model fit on the estimation of ability parameters and on person fit in polytomous item response theory. Test results from 165 students in a Statistics course (SATS4410) at a university in Indonesia were used to address the research problems. The polytomous data obtained from scoring the test results were analyzed with the Item Response Theory (IRT) approach using the Partial Credit Model (PCM), the Graded Response Model (GRM), and the Generalized Partial Credit Model (GPCM). The effect of scoring correction and model fit on ability estimation and person fit was tested using multivariate analysis. Among the three models, the GRM showed the best fit based on the p-value and RMSEA. The analysis also showed no significant effect of scoring correction or model fit on the estimation of test takers' ability or on person fit. Based on these results, we recommend evaluating the levels or categories used in scoring student work on a test.

Page Range

140-151

Issue

2

Volume

8

Digital Object Identifier (DOI)

10.21831/reid.v8i2.54429

Source

https://journal.uny.ac.id/index.php/reid/article/view/54429

References

Chalmers, R. P. (2021). mirt: Multidimensional item response theory [R package]. https://cran.r-project.org/package=mirt

Cui, Y., & Mousavi, A. (2015). Explore the usefulness of person-fit analysis on large-scale assessment. International Journal of Testing, 15(1), 23-49. https://doi.org/10.1080/15305058.2014.977444

Djidu, H., Ismail, R., Sumin, S., Rachmaningtyas, N. A., Imawan, O. R., Suharyono, S., Aviory, K., Prihono, E. W., Kurniawan, D. D., Syahbrudin, J., Nurdin, N., Marinding, Y., Firmansyah, F., Hadi, S., & Retnawati, H. (2022). Analisis instrumen penelitian dengan pendekatan teori tes klasik dan modern menggunakan program R [Analysis of research instruments with classical and modern test theory approaches using the R program]. UNY Press.

Djidu, H., & Retnawati, H. (2022). IRT unidimensi penskoran dikotomi [Unidimensional IRT for dichotomous scoring]. In H. Retnawati & S. Hadi (Eds.), Analisis instrumen penelitian dengan pendekatan teori tes klasik dan modern menggunakan program R [Analysis of research instruments with classical and modern test theory approaches using the R program] (pp. 89-141). UNY Press.

Dodeen, H., & Darabi, M. (2009). Person-fit: Relationship with four personality tests in mathematics. Research Papers in Education, 24(1), 115-126. https://doi.org/10.1080/02671520801945883

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Nijhoff Publishing.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. SAGE Publications.

Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107-135. https://doi.org/10.1177/01466210122031957

Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching. Pearson Education.

Mousavi, A., Cui, Y., & Rogers, T. (2019). An examination of different methods of setting cutoff values in person fit research. International Journal of Testing, 19(1), 1-22. https://doi.org/10.1080/15305058.2018.1464010

Nitko, A. J., & Brookhart, S. M. (2014). Educational assessment of students (6th ed.). Pearson.

Paek, I., Liang, X., & Lin, Z. (2021). Regarding item parameter invariance for the Rasch and the 2-parameter logistic models: An investigation under finite non-representative sample calibrations. Measurement: Interdisciplinary Research and Perspectives, 19(1), 39-54. https://doi.org/10.1080/15366367.2020.1754703

Pan, T., & Yin, Y. (2017). Using the Bayes factors to evaluate person fit in the item response theory. Applied Measurement in Education, 30(3), 213-227. https://doi.org/10.1080/08957347.2017.1316275

R Core Team. (2022). R: A language and environment for statistical computing. https://www.r-project.org/

Retnawati, H. (2014). Teori respons butir dan penerapannya: Untuk peneliti, praktisi pengukuran dan pengujian, mahasiswa pascasarjana [Advanced item response theory and its applications: For researchers, measurement and testing practitioners, graduate students]. Parama Publishing.

Revelle, W. (2022). psych: Procedures for psychological, psychometric, and personality research [R package]. https://personality-project.org/r/psych/

Sarkar, D. (2008). Lattice: Multivariate data visualization with R. Springer. http://lmdvr.r-forge.r-project.org

Si, C.-F., & Schumacker, R. E. (2004). Ability estimation under different item parameterization and scoring models. International Journal of Testing, 4(2), 137-181. https://doi.org/10.1207/s15327574ijt0402_3

Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67, 1-7. https://doi.org/10.1016/j.stueduc.2020.100910

Van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. Springer Science + Business Media. https://doi.org/10.1007/978-1-4757-2691-6

Wind, S. A., & Walker, A. A. (2019). Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments. Assessing Writing, 39, 25-38. https://doi.org/10.1016/j.asw.2018.12.002
