Comparing item parameter estimates and fit statistics of the Rasch model from three different traditions

Bahrul Hayat, Universitas Islam Negeri Syarif Hidayatullah Jakarta, Indonesia
Muhammad Dwirifqi Kharisma Putra, Universitas Gadjah Mada, Indonesia
Bambang Suryadi, Universitas Islam Negeri Syarif Hidayatullah Jakarta, Indonesia

Document Type

Article

Abstract

The Rasch model has a long history of application in the social and behavioral sciences, including educational measurement. Under certain circumstances, Rasch models are regarded as a special case of item response theory (IRT), and IRT models are in turn equivalent to item factor analysis (IFA) models, which are a special case of structural equation models (SEM). Another 'tradition', however, considers Rasch measurement models to be part of neither. This simulation study examines the interrelationships among three formulations: the Rasch model as a constrained version of the two-parameter logistic (2-PL) IRT model, the Rasch model as an item factor analysis model, and the Rasch measurement model. The three were compared using the Mplus, IRTPRO, and WINSTEPS programs, each of which comes from its own 'tradition'. The results indicate that the Rasch model as a constrained IRT model, the Rasch model as an IFA model within SEM, and the Rasch measurement model are mathematically equivalent, but because of differing philosophical perspectives, people may vary in how they understand this concept. These findings are expected to help resolve confusion and misunderstanding among the three traditions.
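
The equivalence described in the abstract can be sketched with the standard textbook forms of the models. The notation is an assumption made here for illustration and is not taken from the article: \theta_j is the ability of person j, a_i and b_i are the discrimination and difficulty of item i, and \lambda_i and \tau_i are the loading and threshold of item i in a logit-link item factor model whose factor \eta_j has mean 0 and variance 1.

```latex
% 2-PL IRT model (standard form; symbols are illustrative assumptions)
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp\bigl[a_i(\theta_j - b_i)\bigr]}{1 + \exp\bigl[a_i(\theta_j - b_i)\bigr]}

% Rasch model as a constrained 2-PL: a_i = 1 for every item i
P(X_{ij} = 1 \mid \theta_j)
  = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}

% Item factor analysis with a logit link
\operatorname{logit} P(X_{ij} = 1 \mid \eta_j)
  = \lambda_i \eta_j - \tau_i
  = \lambda_i \Bigl(\eta_j - \frac{\tau_i}{\lambda_i}\Bigr)

% Matching terms with the 2-PL form gives
a_i = \lambda_i, \qquad b_i = \frac{\tau_i}{\lambda_i}
```

Under these assumptions, fixing every loading at \lambda_i = 1 and estimating the factor variance freely instead gives \operatorname{logit} P = \eta_j - \tau_i, which is the Rasch form with b_i = \tau_i. The exact conversion depends on the link function and on how each program identifies the latent scale, so this is a sketch and not the parameterization used by any particular program.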

Digital Object Identifier (DOI)

10.21831/pep.v24i1.29871

Recommended Citation

Hayat, Bahrul; Putra, Muhammad Dwirifqi Kharisma; and Suryadi, Bambang (2020) "Comparing item parameter estimates and fit statistics of the Rasch model from three different traditions," Jurnal Penelitian dan Evaluasi Pendidikan: Vol. 24: Iss. 1, Article 4.
DOI: 10.21831/pep.v24i1.29871
Available at: https://scholarhub.uny.ac.id/jpep/vol24/iss1/4

Included in

Educational Assessment, Evaluation, and Research Commons
