Jurnal Penelitian dan Evaluasi Pendidikan

The COVID-19 pandemic has posed a major challenge to the education system. Face-to-face learning shifted to online learning, and school exams shifted with it. In Aceh province, school exams have changed from paper-based to computer-based. This research aims to analyze the difficulty index of an item bank based on the cognitive aspects of Bloom's Taxonomy. The study sample included 850 students. The data were the item bank of a final semester exam consisting of 200 multiple-choice items, the answer keys, and the students' answer sheets. The empirical analysis of the item bank using classical test theory (CTT) found that 141 of the 200 items are valid in terms of content validity, computed with Aiken's V formula. The test items have a reliability of 0.983, calculated with the Kuder-Richardson 21 (KR-21) formula; a test is declared reliable if its reliability coefficient r11 ≥ 0.70. In addition, 62 of the 141 items (43.97%) in the item bank are classified as having a moderate difficulty index, and 79 items (56.03%) as having a high difficulty index. The cognitive aspects found in the items are remembering, understanding, applying, and analyzing. Students mostly found the items targeting remembering and understanding difficult to solve.
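The abstract relies on three classical-test-theory statistics: Aiken's V for content validity, KR-21 for reliability, and the item difficulty index. The Python sketch below shows how each could be computed. It rests on several assumptions. The response matrix and expert ratings are hypothetical toy values, not the study's data. KR-21 uses the population variance of total scores. The difficulty cut-offs (p < 0.30 difficult, 0.30 to 0.70 moderate, above 0.70 easy) are a common convention that the abstract does not state. Only the r11 ≥ 0.70 reliability threshold comes from the abstract.

```python
from statistics import mean, pvariance


def aikens_v(ratings: list[int], lo: int, hi: int) -> float:
    """Aiken's V for one item: V = sum(r - lo) / (n * (c - 1)).

    ratings: one relevance score per expert rater.
    lo, hi: lowest and highest categories of the rating scale.
    """
    n = len(ratings)
    c = hi - lo + 1  # number of rating categories
    return sum(r - lo for r in ratings) / (n * (c - 1))


def kr21(total_scores: list[int], k: int) -> float:
    """KR-21: r11 = k/(k-1) * (1 - M(k - M) / (k * var)) for a k-item test."""
    m = mean(total_scores)
    var = pvariance(total_scores)  # assumption: population variance of total scores
    if var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


def difficulty_index(responses: list[list[int]], item: int) -> float:
    """p = proportion of students answering `item` correctly (1 = correct, 0 = wrong).

    A higher p means more students answered the item correctly.
    """
    return sum(row[item] for row in responses) / len(responses)


def classify(p: float) -> str:
    """Assumed cut-offs (a common CTT convention, not taken from the study)."""
    if p < 0.30:
        return "difficult"
    if p <= 0.70:
        return "moderate"
    return "easy"


if __name__ == "__main__":
    # Hypothetical toy data: 5 students x 4 items, not the study's data.
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]
    totals = [sum(row) for row in responses]
    r11 = kr21(totals, k=len(responses[0]))
    print(f"KR-21 r11 = {r11:.3f}; reliable (r11 >= 0.70): {r11 >= 0.70}")
    for i in range(len(responses[0])):
        p = difficulty_index(responses, i)
        print(f"item {i + 1}: p = {p:.2f} ({classify(p)})")
    # Five hypothetical experts rating one item on a 1-5 scale.
    print(f"Aiken's V = {aikens_v([4, 5, 4, 5, 5], lo=1, hi=5):.2f}")
```

KR-21 needs only the number of items, the mean, and the variance of total scores. It does this by treating all items as equally difficult, so it tends to give a lower estimate than KR-20 when item difficulties vary.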
