
Keywords

item response theory, economic statistics, final semester exam, item banks

Document Type

Article

Abstract

This study aims to describe the quality of the final semester examination items for the economic statistics course developed by Universitas Terbuka (UT), as a basis for developing a calibrated item bank using the Item Response Theory approach. This is a quantitative descriptive study. The data source was the response patterns of UT students who had taken the final semester examination (UAS) in the economic statistics course across six examination periods, with a sample size of 23,334 students. The results show that the final semester examination items for the economic statistics course developed by UT: (1) are construct-valid, measuring only one dominant factor, namely economic statistics ability; (2) have good reliability, with an empirical reliability coefficient above 0.70 (empirical reliability coefficient = 0.7335); (3) of the 140 items calibrated, 108 items (25 good-quality items requiring no revision and 83 lower-quality items requiring revision) are suitable for storage in the item bank, while 32 items are of poor quality; and (4) provide accurate information about students' economic statistics ability at higher ability levels (-1.3 to +4.0).

Quality of statistical test bank items (Case study: Final exam instrument of statistics courses in Universitas Terbuka)

Abstract

This study aims to determine the quality of the final semester test items for the economic statistics course developed by Universitas Terbuka (UT), as a basis for developing a calibrated item bank using Item Response Theory. The research uses a quantitative descriptive approach: the researchers investigated the answer patterns from the final semester exams (UAS) in the economic statistics course over six examination periods, with a sample size of 23,334 students. The results indicate that the final semester exam items for the economic statistics course developed by UT: (1) are construct-valid, i.e., they measure only one dominant factor, namely economic statistics ability; (2) have good reliability, with an empirical reliability coefficient above 0.70 (empirical reliability coefficient = 0.7335); (3) of the 140 items calibrated, 108 items (25 items of good quality requiring no revision and 83 items of lower quality requiring revision) are worth keeping in the item bank, while 32 items are of poor quality; and (4) provide accurate information about students' economic statistics ability at high ability levels (-1.3 to +4.0).

Page Range

165-176

Issue

2

Volume

6

Digital Object Identifier (DOI)

10.21831/jrpm.v6i2.28900

Source

https://journal.uny.ac.id/index.php/jrpm/article/view/28900

References

Attali, Y., & Bar-Hillel, M. (2003). Guess where: The position of correct answers in multiple-choice test items as a psychometric variable. Journal of Educational Measurement, 40(2), 109-128. https://doi.org/10.1111/j.1745-3984.2003.tb01099.x

Barnard-Brak, L., Lan, W. Y., & Yang, Z. (2018). Differences in mathematics achievement according to opportunity to learn: A 4pL item response theory examination. Studies in Educational Evaluation, 56, 1-7. https://doi.org/10.1016/j.stueduc.2017.11.002

Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. Cengage Learning.

Firmansyah, M. A. (2017). Analisis hambatan belajar mahasiswa pada mata kuliah statistika. Jurnal Penelitian Dan Pembelajaran Matematika, 10(2). https://doi.org/10.30870/jppm.v10i2.2036

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.

Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory: Application to psychological measurement. Dow Jones-Irwin.

Istiyono, E., Mardapi, D., & Suparno, S. (2014). Pengembangan tes kemampuan berpikir tingkat tinggi fisika (pysthots) peserta didik SMA. Jurnal Penelitian Dan Evaluasi Pendidikan, 18(1), 1-12. https://doi.org/10.21831/pep.v18i1.2120

Kartianom, K., & Mardapi, D. (2018). The utilization of junior high school mathematics national examination data: Conceptual error diagnosis. REiD (Research and Evaluation in Education), 3(2). https://doi.org/10.21831/reid.v3i2.18120

Kartianom, K., & Ndayizeye, O. (2017). What's wrong with the Asian and African Students' mathematics learning achievement? The multilevel PISA 2015 data analysis for Indonesia, Japan, and Algeria. Jurnal Riset Pendidikan Matematika, 4(2), 200-210. https://doi.org/10.21831/jrpm.v4i2.16931

Keeves, J. P., & Alagumalai, S. (1999). New approaches to measurement. Advances in Measurement in Educational Research and Assessment, 23-42.

Kien-Kheng, F., & Idris, N. (2010). A comparative study on statistics competency level using TIMSS data: Are we doing enough? Journal of Mathematics Education, 3(2), 126-138.

Mardapi, D. (2012). Pengukuran penilaian dan evaluasi pendidikan. Nuha Medika.

Mills, J. D., & Holloway, C. E. (2013). The development of statistical literacy skills in the eighth grade: Exploring the TIMSS data to evaluate student achievement and teacher characteristics in the United States. Educational Research and Evaluation, 19(4), 323-345. https://doi.org/10.1080/13803611.2013.771110

Muslim, M., Suhandi, A., & Nugraha, M. G. (2017). Development of reasoning test instruments based on TIMSS framework for measuring reasoning ability of senior high school student on the physics concept. Journal of Physics: Conference Series, 812(1), 012108. https://doi.org/10.1088/1742-6596/812/1/012108

Pey Tee, O., & Subramaniam, R. (2018). Comparative study of middle school students' attitudes towards science: Rasch analysis of entire TIMSS 2011 attitudinal data for England, Singapore and the U.S.A. as well as psychometric properties of attitudes scale. International Journal of Science Education, 40(3), 268-290. https://doi.org/10.1080/09500693.2017.1413717

Ramos, J. L. S., Dolipas, B. B., & Villamor, B. B. (2013). Higher order thinking skills and academic performance in physics of college students: A regression analysis. International Journal of Innovative Interdisciplinary Research, 4, 48-60.

Retnawati, H. (2013). Evaluasi program pendidikan. Universitas Terbuka.

Retnawati, H. (2016). Validitas reliabilitas dan karakteristik butir. Parama Publishing.

Retnawati, H. (2017). Diagnosing the junior high school students' difficulties in learning mathematics. International Journal on New Trends in Education and Their Implications, 8(1), 33-50. http://www.ijonte.org/FileUpload/ks63207/File/04.heri_retnawati.pdf

Retnawati, H., & Hadi, S. (2014). Sistem bank soal daerah terkalibrasi untuk menyongsong era desentralisasi. Jurnal Ilmu Pendidikan, 20(2), 183-193. https://doi.org/10.17977/jip.v20i2.4615

Rindermann, H., & Baumeister, A. E. E. (2015). Validating the interpretations of PISA and TIMSS tasks: A rating study. International Journal of Testing, 15(1), 276-296. https://doi.org/10.1080/15305058.2014.966911

Rogers, H. J. (1999). Guessing in multiple choice tests. In Advances in measurement in educational research and assessment (pp. 235-243). Pergamon Press, New York.

Wibawa, S. (2017). Tri Dharma Perguruan Tinggi (Pendidikan dan pengabdian kepada masyarakat). In Disampaikan dalam Rapat Perencanaan Pengawasan Proses Bisnis Perguruan Tinggi Negeri. Yogyakarta (Vol. 29).

Wu, M., Tam, H. P., & Jen, T.-H. (2016). Educational measurement for applied researchers. Springer Singapore. https://doi.org/10.1007/978-981-10-3302-5
