REID (Research and Evaluation in Education)

Gaining a deeper understanding of the meaning of the carelessness parameter in the 4PL IRT model and strategies for estimating it

Timbul Pardede, Universitas Terbuka, Indonesia
Agus Santoso, Universitas Terbuka, Indonesia
Diki Diki, Universitas Terbuka, Indonesia
Heri Retnawati, Universitas Negeri Yogyakarta, Indonesia
Ibnu Rafi, Universitas Negeri Yogyakarta, Indonesia
Ezi Apino, Universitas Negeri Yogyakarta, Indonesia
Munaya Nikma Rosyada, Universitas Negeri Yogyakarta, Indonesia

Keywords

carelessness parameter, dichotomous IRT, four-parameter logistic model, item response theory

Document Type

Article

Abstract

Three popular models are used to describe the characteristics of test items and to estimate examinee ability under dichotomous IRT, namely the one-, two-, and three-parameter logistic models. The three item parameters are discriminating power, difficulty, and pseudo-guessing. In the development of dichotomous IRT models, a carelessness (upper asymptote) parameter was proposed, forming a four-parameter logistic (4PL) model that accommodates a condition where a high-ability examinee gives an incorrect response to a test item that he/she should be able to answer correctly. However, the carelessness parameter and the 4PL model have not been widely accepted and used due to several factors, and people's understanding of that parameter and of strategies for estimating it is still inadequate. Therefore, this study aims to shed light on the ideas underlying the 4PL model, the meaning of the carelessness parameter, and the strategies used to estimate that parameter, based on the extant literature. The focus of this study was then extended to demonstrating practical examples of estimating item and person parameters under the 4PL model, using empirical data on the responses of 1,000 students from the Indonesia Open University (Universitas Terbuka) to 21 of the 30 multiple-choice items on the Business English test, a paper-and-pencil test. We mainly analyzed the empirical data using the 'mirt' package in RStudio. We present the analysis results coherently so that IRT users can gain a sufficient understanding of the 4PL model and the carelessness parameter and can estimate item and person parameters under the 4PL model.
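The central idea above, that an upper asymptote below 1 leaves even a very able examinee some chance of answering incorrectly, can be made concrete with the usual 4PL item response function, written here (notation varies across sources) as P(θ) = c + (d - c) / (1 + exp[-a(θ - b)]), where a is discrimination, b is difficulty, c is the pseudo-guessing lower asymptote, and d is the upper asymptote, so that 1 - d is read as the carelessness probability. The following Python sketch is an illustration under stated assumptions, not the authors' analysis: the function names and item parameters are hypothetical, the scaling constant D defaults to 1, and the person parameter is estimated by expected a posteriori (EAP) with a standard normal prior on a fixed grid, which is one common strategy (it is the default method of mirt's fscores()) but not the only one.

```python
import math


def p_4pl(theta, a, b, c, d, D=1.0):
    """Probability of a correct response under the 4PL model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing),
    d: upper asymptote, so 1 - d is the carelessness (slipping) probability.
    D: optional scaling constant (1.0 here; 1.7 is sometimes used instead).
    """
    return c + (d - c) / (1.0 + math.exp(-D * a * (theta - b)))


def eap_theta(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate with a standard normal prior on a fixed grid.

    responses: list of 0/1 scores; items: list of (a, b, c, d) tuples.
    This is a hypothetical helper for illustration, not a mirt function.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        likelihood = 1.0
        for x, (a, b, c, d) in zip(responses, items):
            p = p_4pl(theta, a, b, c, d)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    # Posterior mean of theta over the quadrature grid.
    return sum(t * w for t, w in zip(grid, weights)) / total


# Hypothetical item parameters (a, b, c, d), chosen for illustration only.
ITEMS = [
    (1.2, -1.0, 0.20, 0.95),
    (1.0, 0.0, 0.20, 0.90),
    (1.5, 0.5, 0.25, 0.92),
    (0.8, 1.0, 0.15, 0.97),
]

if __name__ == "__main__":
    a, b, c, d = ITEMS[1]
    # P(theta) rises toward d (not 1) as ability increases.
    for theta in (-3.0, 0.0, 3.0, 6.0):
        print(f"theta={theta:+.1f}  P={p_4pl(theta, a, b, c, d):.3f}")
    print("carelessness (1 - d):", round(1 - d, 2))
    print("EAP, all correct:", round(eap_theta([1, 1, 1, 1], ITEMS), 3))
    print("EAP, all incorrect:", round(eap_theta([0, 0, 0, 0], ITEMS), 3))
```

Running it shows P approaching d = 0.90 rather than 1 as θ grows, and EAP estimates above and below the prior mean of 0 for the all-correct and all-incorrect patterns. In mirt, the same model is written in slope-intercept form with g and u for the lower and upper asymptotes, and coef(..., IRTpars = TRUE) reports the traditional a, b, g, u values.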

Page Range

86-117

Issue

1

Volume

9

Digital Object Identifier (DOI)

10.21831/reid.v9i1.63230

Source

https://journal.uny.ac.id/index.php/reid/article/view/63230

Recommended Citation

Pardede, T., Santoso, A., Diki, D., Retnawati, H., Rafi, I., Apino, E., & Rosyada, M. N. (2023). Gaining a deeper understanding of the meaning of the carelessness parameter in the 4PL IRT model and strategies for estimating it. REID (Research and Evaluation in Education), 9(1), 86-117. https://doi.org/10.21831/reid.v9i1.63230

References

Adedoyin, O. O., & Mokobi, T. (2013). Using IRT psychometric analysis in examining the quality of junior certificate mathematics multiple choice examination test items. International Journal of Asian Social Science, 3(4), 992-1011. https://archive.aessweb.com/index.php/5007/article/view/2471

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Brooks/Cole.

Andrich, D. (2004). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42(1), 7-16. https://doi.org/10.1097/01.mlr.0000103528.48582.7c

Antoniou, F., Alkhadim, G., Mouzaki, A., & Simos, P. (2022). A psychometric analysis of Raven's colored progressive matrices: Evaluating guessing and carelessness using the 4PL item response theory model. Journal of Intelligence, 10(1), 1-14. https://doi.org/10.3390/jintelligence10010006

Baker, F. B., & Kim, S.-H. (2017). The basics of item response theory using R. Springer International Publishing. https://doi.org/10.1007/978-3-319-54205-8

Barnard-Brak, L., Lan, W. Y., & Yang, Z. (2018). Differences in mathematics achievement according to opportunity to learn: A 4PL item response theory examination. Studies in Educational Evaluation, 56(1), 1-7. https://doi.org/10.1016/j.stueduc.2017.11.002

Barton, M. A., & Lord, F. M. (1981). An upper asymptote for the three-parameter logistic item-response model (pp. 1-8) [Technical Report]. Educational Testing Service. https://doi.org/10.1002/j.2333-8504.1981.tb01255.x

Battauz, M. (2020). Regularized estimation of the four-parameter logistic model. Psych, 2(4), 269-278. https://doi.org/10.3390/psych2040020

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-424). Addison-Wesley.

Bulut, O. (2015). Applying item response theory models to entrance examination for graduate studies: Practical issues and insights. Journal of Measurement and Evaluation in Education and Psychology, 6(2), 313-330. https://doi.org/10.21031/epod.17523

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06

Chalmers, R. P. (2023). Package “mirt.” https://cran.r-project.org/web/packages/mirt/mirt.pdf

Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289. https://doi.org/10.3102/10769986022003265

Cheng, Y., & Liu, C. (2015). The effect of upper and lower asymptotes of IRT models on computerized adaptive testing. Applied Psychological Measurement, 39(7), 551-565. https://doi.org/10.1177/0146621615585850

Christensen, K. B., Makransky, G., & Horton, M. (2017). Critical values for Yen's Q3: Identification of local dependence in the Rasch model using residual correlations. Applied Psychological Measurement, 41(3), 178-194. https://doi.org/10.1177/0146621616677520

DeMars, C. (2010). Item response theory: Understanding statistics measurement. Oxford University Press.

Desjardins, C. D., & Bulut, O. (2018). Handbook of educational measurement and psychometrics using R. CRC Press.

DiBattista, D., & Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. The Canadian Journal for the Scholarship of Teaching and Learning, 2(2), 1-23. https://doi.org/10.5206/cjsotl-rcacea.2011.2.4

Doğruöz, E., & Arikan, Ç. A. (2020). Comparison of different ability estimation methods based on 3 and 4PL item response theory. Pamukkale University Journal of Education, 50(1), 50-69. https://doi.org/10.9779/pauefd.585774

Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x

Edwards, M. C., Houts, C. R., & Cai, L. (2018). A diagnostic procedure to detect departures from local independence in item response theory models. Psychological Methods, 23(1), 138-149. https://doi.org/10.1037/met0000121

Felt, J. M., Castaneda, R., Tiemensma, J., & Depaoli, S. (2017). Using person fit statistics to detect outliers in survey research. Frontiers in Psychology, 8, 1-9. https://doi.org/10.3389/fpsyg.2017.00863

Georgiev, N. (2008). Item analysis of C, D and E series from Raven's standard progressive matrices with item response theory two-parameter logistic model. Europe's Journal of Psychology, 4(3). https://doi.org/10.5964/ejop.v4i3.431

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. Routledge.

Haladyna, T. M., Rodriguez, M. C., & Stevens, C. (2019). Are multiple-choice items too fat? Applied Measurement in Education, 32(4), 350-364. https://doi.org/10.1080/08957347.2019.1660348

Hambleton, R. K., & Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3), 38-47. https://doi.org/10.1111/j.1745-3992.1993.tb00543.x

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Springer Science+Business Media.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139-164. https://doi.org/10.1177/014662168500900204

Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modelling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53-60. https://academic-publishing.org/index.php/ejbrm/article/view/1224

Houts, C. R., & Edwards, M. C. (2013). The performance of local dependence measures with psychological data. Applied Psychological Measurement, 37(7), 541-562. https://doi.org/10.1177/0146621613491456

Kalkan, Ö. K. (2022). The comparison of estimation methods for the four-parameter logistic item response theory model. Measurement: Interdisciplinary Research and Perspectives, 20(2), 73-90. https://doi.org/10.1080/15366367.2021.1897398

Kalkan, Ö. K., & Çuhadar, İ. (2020). An evaluation of 4PL IRT and DINA models for estimating pseudo-guessing and slipping parameters. Journal of Measurement and Evaluation in Education and Psychology, 11(2), 131-146. https://doi.org/10.21031/epod.660273

Kubinger, K. D., Holocher-Ertl, S., Reif, M., Hohensinn, C., & Frebort, M. (2010). On minimizing guessing effects on multiple-choice items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format. International Journal of Selection and Assessment, 18(1), 111-115. https://doi.org/10.1111/j.1468-2389.2010.00493.x

Liao, W.-W., Ho, R.-G., Yen, Y.-C., & Cheng, H.-C. (2012). The four-parameter logistic item response theory model as a robust method of estimating ability despite aberrant responses. Social Behavior and Personality: An International Journal, 40(10), 1679-1694. https://doi.org/10.2224/sbp.2012.40.10.1679

Loken, E., & Rulison, K. L. (2010). Estimation of a four-parameter item response theory model. British Journal of Mathematical and Statistical Psychology, 63(3), 509-525. https://doi.org/10.1348/000711009X474502

Magis, D. (2013). A note on the item information function of the four-parameter logistic model. Applied Psychological Measurement, 37(4), 304-315. https://doi.org/10.1177/0146621613475471

Magno, C. (2009). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1(1), 1-11.

Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit testing in multidimensional contingency tables. Psychometrika, 71(4), 713-732. https://doi.org/10.1007/s11336-005-1295-9

Meijer, R. R., & Tendeiro, J. N. (2018). Unidimensional item response theory. In P. Irwing, T. Booth, & D. J. Hughes (Eds.), The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development (Vol. 1, pp. 413-443). John Wiley & Sons. https://doi.org/10.1002/9781118489772.ch15

Merino-Soto, C., Angulo-Ramos, M., Rovira-Millán, L. V., & Rosario-Hernández, E. (2023). Psychometric properties of the generalized anxiety disorder-7 (GAD-7) in a sample of workers. Frontiers in Psychiatry, 14, 1-16. https://doi.org/10.3389/fpsyt.2023.999242

Ogasawara, H. (2017). Identified and unidentified cases of the fixed-effects 3- and 4-parameter models in item response theory. Behaviormetrika, 44(2), 405-423. https://doi.org/10.1007/s41237-017-0032-x

Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24(1), 50-64. https://doi.org/10.1177/01466216000241003

Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(4), 289-298. https://doi.org/10.1177/0146621603027004004

Paek, I., & Cole, K. (2020). Using R for item response theory model applications. Routledge.

Posit Team. (2023). RStudio: Integrated development environment for R (2023.6.0.421) [Computer software]. Posit Software, PBC. http://www.posit.co/

Primi, R., Nakano, T. D. C., & Wechsler, S. M. (2018). Using four-parameter item response theory to model human figure drawings. Revista Avaliação Psicológica, 17(4), 473-483. https://doi.org/10.15689/ap.2018.1704.7.07

Quaigrain, K., & Arhin, A. K. (2017). Using reliability and item analysis to evaluate a teacher-developed test in educational measurement and evaluation. Cogent Education, 4(1), 1-11. https://doi.org/10.1080/2331186X.2017.1301013

Rafi, I., Retnawati, H., Apino, E., Hadiana, D., Lydiati, I., & Rosyada, M. N. (2023). What might be frequently overlooked is actually still beneficial: Learning from post national-standardized school examination. Pedagogical Research, 8(1), 1-15. https://doi.org/10.29333/pr/12657

Retnawati, H. (2014). Teori respons butir dan penerapannya: Untuk peneliti, praktisi pengukuran dan pengujian, mahasiswa pascasarjana. Nuha Medika.

Retnawati, H. (2016). Analisis kuantitatif instrumen penelitian. Parama Publishing.

Revelle, W. (2023). psych: Procedures for psychological, psychometric, and personality research (R package version 2.3.3) [Computer software]. Northwestern University. https://CRAN.R-project.org/package=psych

Robitzsch, A. (2022). Four-parameter guessing model and related item response models. Mathematical and Computational Applications, 27(6), 1-16. https://doi.org/10.3390/mca27060095

Rulison, K. L., & Loken, E. (2009). I've fallen and I can't get up: Can high ability students recover from early mistakes in CAT? Applied Psychological Measurement, 33(2), 83-101. https://doi.org/10.1177/0146621608324023

Rupp, A. A., & Zumbo, B. D. (2004). A note on how to quantify and report whether IRT parameter invariance holds: When Pearson correlations are not enough. Educational and Psychological Measurement, 64(4), 588-599. https://doi.org/10.1177/0013164403261051

Rutkowski, L., von Davier, M., & Rutkowski, D. (Eds.). (2014). Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. CRC Press.

Santoso, A., Pardede, T., Apino, E., Djidu, H., Rafi, I., Rosyada, M. N., Retnawati, H., & Kassymova, G. K. (2022). Polytomous scoring correction and its effect on the model fit: A case of item response theory analysis utilizing R. Psychology, Evaluation, and Technology in Educational Research, 5(1), 1-13. https://doi.org/10.33292/petier.v5i1.148

Schober, P., Boer, C., & Schwarte, L. A. (2018). Correlation coefficients: Appropriate use and interpretation. Anesthesia & Analgesia, 126(5), 1763-1768. https://doi.org/10.1213/ANE.0000000000002864

Slocum-Gori, S. L., & Zumbo, B. D. (2011). Assessing the unidimensionality of psychological scales: Using multiple criteria from factor analysis. Social Indicators Research, 102(3), 443-461. https://doi.org/10.1007/s11205-010-9682-8

Waller, N. G., & Feuerstahler, L. (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52(3), 350-370. https://doi.org/10.1080/00273171.2017.1292893

Willse, J. T. (2018). CTT: Classical test theory functions (R package version 2.3.3) [Computer software]. https://CRAN.R-project.org/package=CTT

Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. https://doi.org/10.1177/014662168400800201

Yen, Y.-C., Ho, R.-G., Laio, W.-W., Chen, L.-J., & Kuo, C.-C. (2012). An empirical evaluation of the slip correction in the four parameter logistic models with computerized adaptive testing. Applied Psychological Measurement, 36(2), 75-87. https://doi.org/10.1177/0146621611432862

Zanon, C., Hutz, C. S., Yoo, H., & Hambleton, R. K. (2016). An application of item response theory to psychological test development. Psicologia: Reflexão e Crítica, 29(1), 1-10. https://doi.org/10.1186/s41155-016-0040-x
