REID (Research and Evaluation in Education)


Keywords: reading comprehension, English proficiency test, item analysis, Rasch model

Taking an English as a foreign language proficiency test, commonly the TOEFL (Test of English as a Foreign Language), has become increasingly popular in Indonesia. Growing demand for such a test and its high cost have prompted many institutions to develop their own TOEFL instruments and administer the test internally. However, constructing a test instrument is a complex process, which makes item analysis more challenging. At the same time, item analysis is crucial for assessing the quality of the items. Therefore, this study reports the results of a statistical analysis of 20 TOEFL reading comprehension questions in terms of test reliability, item and person fit, and item difficulty level. Thirty-eight members of the English Department Students' Association of a state university in West Java participated in this study by taking the reading test. The data were analyzed with the Rasch model using the Quest program. The results showed that four items (20%) did not meet the criteria for a valid test because they were either too easy or too difficult for the target test takers; thus, they needed to be discarded. Meanwhile, 16 items (80%) were of good quality and can be used directly in the proficiency test, especially to measure reading comprehension skills, because they fulfilled the standard requirements for a valid test. The findings highlight the importance of item analysis in validating test instruments and improving test quality for future administrations.
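The abstract describes screening items on Rasch fit and difficulty with the Quest program. As a rough illustration only (not the authors' procedure and not Quest's implementation), the sketch below computes the dichotomous Rasch probability and the infit and outfit mean-square statistics for each item, assuming person abilities and item difficulties (in logits) have already been estimated. The 0.77 to 1.30 infit range and the ±2 logit difficulty band used for flagging are assumed example thresholds, not criteria taken from the study, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: exp(theta-b) / (1 + exp(theta-b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: Sequence[Sequence[int]],
             thetas: Sequence[float],
             difficulties: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (infit MNSQ, outfit MNSQ) for every item.

    responses[n][i] is 1 if person n answered item i correctly, else 0.
    thetas (person abilities) and difficulties (item locations) are in logits
    and are assumed to have been estimated beforehand by Rasch software.
    """
    fits = []
    for i, b in enumerate(difficulties):
        sq_resid = 0.0   # sum over persons of (x - P)^2
        variance = 0.0   # sum over persons of P(1 - P), the model variance
        z_sq = 0.0       # sum over persons of squared standardized residuals
        for n, theta in enumerate(thetas):
            p = rasch_prob(theta, b)
            var = p * (1.0 - p)
            resid = responses[n][i] - p
            sq_resid += resid ** 2
            variance += var
            z_sq += resid ** 2 / var
        infit = sq_resid / variance    # information-weighted mean square
        outfit = z_sq / len(thetas)    # unweighted mean square
        fits.append((infit, outfit))
    return fits


def flag_items(fits: Sequence[Tuple[float, float]],
               difficulties: Sequence[float],
               fit_range: Tuple[float, float] = (0.77, 1.30),
               max_abs_difficulty: float = 2.0) -> List[int]:
    """Indices of items that misfit (infit outside fit_range) or that are
    too easy or too hard (|b| > max_abs_difficulty).
    Both thresholds are illustrative assumptions, not the study's criteria."""
    lo, hi = fit_range
    flagged = []
    for i, ((infit, _), b) in enumerate(zip(fits, difficulties)):
        if not lo <= infit <= hi or abs(b) > max_abs_difficulty:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: 4 persons x 3 items (not the study's data).
    thetas = [-1.5, -0.5, 0.5, 1.5]
    difficulties = [-2.5, 0.0, 0.4]
    responses = [
        [1, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
        [1, 1, 1],
    ]
    fits = item_fit(responses, thetas, difficulties)
    for i, (infit, outfit) in enumerate(fits):
        print(f"item {i}: b={difficulties[i]:+.2f} infit={infit:.2f} outfit={outfit:.2f}")
    print("flagged:", flag_items(fits, difficulties))
```

Infit weights each residual by the model variance, so it is less sensitive than outfit to unexpected responses from persons whose ability lies far from the item's difficulty; this is one reason item screening often emphasizes infit. A value well below the lower bound signals overly predictable (overfitting) responses, while a value well above the upper bound signals noisy, unexpected responses.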
