
Jurnal Penelitian dan Evaluasi Pendidikan

Authors

Wasis Wasis

Keywords

partial credit scoring model, multiple true-false items

Document Type

Article

Abstract

This study aims to produce a polytomous scoring model for responses to multiple true-false items so that ability in physics can be estimated more accurately. The scoring was developed with the Four-D model, and its accuracy was tested through an empirical study and a simulation study. The empirical study used 15 multiple true-false items taken from the UMPTN (state university entrance test) papers of 1996-2006, administered to 410 new students of FMIPA (Faculty of Mathematics and Natural Sciences), Universitas Negeri Surabaya, class of 2007. The examinees' responses were scored with three partial credit models (PCM I, II, and III) and dichotomously. The scored data were analyzed with the Quest program to obtain estimates of item difficulty (δ) and examinee ability (θ), from which the test information function and the standard error of estimate were determined. The simulation study used data generated from the empirical parameters (δ and θ) with the SAS statistical program, and the accuracy of the estimates was analyzed with the root mean squared error (RMSE) method. The results show that: (i) PCM scoring with weighting estimates ability more accurately than PCM scoring without weighting or dichotomous scoring; (ii) the more categories used in partial credit scoring, the more accurate the estimation.

Keywords: partial credit scoring model, multiple true-false items
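The first abstract states that the δ and θ estimates were used to determine the test information function and the standard error of estimate. A minimal sketch of that step in Python, assuming Masters' Rasch partial credit model; the abstract does not give the formulas behind PCM I, II, and III, so this is not a reconstruction of them. Under this model the information an item supplies at θ equals the variance of its category score at θ, test information is the sum over items, and SE(θ) = 1/√I(θ). The function names and step difficulties are hypothetical, not the article's Quest estimates.

```python
import numpy as np


def pcm_probs(theta, deltas):
    """P(X = k | theta) for k = 0..m under the Rasch partial credit model.

    deltas holds the step difficulties delta_1..delta_m of one item; a
    dichotomously scored item is the special case m = 1.
    """
    # Exponent of category k is sum_{j<=k} (theta - delta_j); category 0 gets 0.
    exponents = np.concatenate(
        ([0.0], np.cumsum(theta - np.asarray(deltas, dtype=float)))
    )
    weights = np.exp(exponents - exponents.max())  # shift for numerical stability
    return weights / weights.sum()


def item_information(theta, deltas):
    """Item information under the PCM: the variance of the item score at theta."""
    p = pcm_probs(theta, deltas)
    k = np.arange(len(p))
    mean = np.sum(k * p)
    return float(np.sum(k**2 * p) - mean**2)


def total_information(theta, items):
    """Test information: the sum of the item informations at theta."""
    return sum(item_information(theta, deltas) for deltas in items)


def standard_error(theta, items):
    """Standard error of the ability estimate: 1 / sqrt(test information)."""
    return 1.0 / np.sqrt(total_information(theta, items))


# Hypothetical step difficulties for three items (not the article's estimates).
items = [[-1.0, 0.0, 1.0], [-0.5, 0.5], [0.2]]
for theta in (-2.0, 0.0, 2.0):
    info = total_information(theta, items)
    print(f"theta={theta:+.1f}  I={info:.3f}  SE={standard_error(theta, items):.3f}")
```

Because SE(θ) shrinks as test information grows, a scoring rule that yields more informative category scores also narrows the standard error of the ability estimate.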

____________________________________________________________

THE PARTIAL CREDIT SCORING MODEL FOR THE MULTIPLE TRUE-FALSE ITEMS IN PHYSICS

Abstract This study is an attempt to overcome the weaknesses. It aims to produce a polytomous scoring model for responses to multiple true-false items in order to obtain a more accurate estimation of ability in physics. It adopts the Four-D model, and its accuracy is assessed through empirical and simulation studies. The empirical study employed 15 multiple true-false items taken from the State University Entrance Test (UMPTN) of 1996-2006. The test was administered to 410 new students enrolled in 2007 at the Faculty of Mathematics and Natural Sciences of Surabaya State University. The testees' responses were scored using partial credit models (PCM) I, II, and III and were also scored dichotomously. The results of the four scoring models were analyzed using the Quest program to obtain estimates of item difficulty (δ) and testee ability (θ). The simulation data were generated with the SAS statistical program, and the estimation accuracy was analyzed using the root mean squared error (RMSE) method. The results of the study show the following: (i) scoring with the partial credit model with weighting estimates ability more accurately than scoring without weighting and dichotomous scoring; (ii) the more categories there are in partial credit scoring, the more accurate the ability estimation.

Keywords: partial credit model scoring, multiple true-false items
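The simulation study generated data from the empirical δ and θ values and judged estimation accuracy by RMSE. The sketch below mirrors that workflow in Python rather than SAS, under stated assumptions: responses follow the Rasch partial credit model, θ is re-estimated by grid-search maximum likelihood with item parameters treated as known (a simplification, since Quest also estimates the item parameters), and RMSE = √(mean((θ̂ − θ)²)). The two designs, 15 items scored in 2 or 4 categories for 410 simulated examinees, borrow only the article's test length and sample size; the step difficulties, seed, and function names are hypothetical.

```python
import numpy as np


def pcm_probs(theta, deltas):
    """Rasch partial credit model probabilities for a vector of abilities.

    Returns an array of shape (len(theta), m + 1) holding P(X = k | theta).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    deltas = np.asarray(deltas, dtype=float)
    steps = np.cumsum(theta[:, None] - deltas[None, :], axis=1)
    exponents = np.hstack([np.zeros((theta.size, 1)), steps])
    exponents -= exponents.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(exponents)
    return weights / weights.sum(axis=1, keepdims=True)


def simulate_responses(thetas, items, rng):
    """Generate one category score per examinee and item (inverse-CDF sampling)."""
    thetas = np.asarray(thetas, dtype=float)
    data = np.empty((thetas.size, len(items)), dtype=int)
    u = rng.random((thetas.size, len(items)))
    for j, deltas in enumerate(items):
        cum = np.cumsum(pcm_probs(thetas, deltas), axis=1)
        # Score = number of inner cumulative bounds that the uniform draw reaches.
        data[:, j] = (u[:, [j]] >= cum[:, :-1]).sum(axis=1)
    return data


GRID = np.linspace(-4.0, 4.0, 801)


def estimate_theta(responses, items, grid=GRID):
    """Maximum-likelihood ability by grid search, item parameters treated as known.

    Zero and perfect scores end at the grid edge instead of diverging.
    """
    loglik = np.zeros_like(grid)
    for x, deltas in zip(responses, items):
        loglik += np.log(pcm_probs(grid, deltas)[:, x])
    return float(grid[np.argmax(loglik)])


def rmse(estimates, true_values):
    """Root mean squared error between estimated and generating abilities."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(true_values, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


rng = np.random.default_rng(2007)  # arbitrary seed
true_theta = rng.normal(0.0, 1.0, size=410)
designs = {
    "2 categories": [[0.0]] * 15,             # hypothetical dichotomous items
    "4 categories": [[-1.0, 0.0, 1.0]] * 15,  # hypothetical 3-step items
}
for label, items in designs.items():
    data = simulate_responses(true_theta, items, rng)
    estimates = [estimate_theta(row, items) for row in data]
    print(f"{label}: RMSE = {rmse(estimates, true_theta):.3f}")
```

Grid search is used instead of Newton-Raphson so that zero and perfect scores stay finite (they land on the grid edge at ±4). The printed RMSE values depend on the seed and the hypothetical parameters and are not the article's results.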

First Page

1

Last Page

21

Issue

1

Volume

15

Digital Object Identifier (DOI)

10.21831/pep.v15i1.1085

References

Adams, R. J., & Khoo, S. T. (1996). Quest (computer program): The interactive test analysis system. Victoria: ACER.

Baker, J. G., Rounds, J. B., & Zeron, M. A. (2000). A comparison of graded response and Rasch partial credit models with subjective well-being. Journal of Educational and Behavioral Statistics, 25(3), 253-270.

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah: Lawrence Erlbaum Associates, Publishers.

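Dittendik, Ditjendikdasmen, Depdiknas. (2003). Sistem penilaian kelas SD, SMP, SMA, dan SMK [Classroom assessment system for elementary, junior secondary, senior secondary, and vocational schools]. Jakarta: Dittendik, Ditjendikdasmen, Depdiknas.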

Donoghue, J. R. (2005). An empirical examination of the IRT information of polytomously scored reading items under the generalized PCM. Journal of Educational Measurement, 31(4), 295-311.

Hambleton, R. K., & Jones, R. W. (n.d.). Comparison of classical test theory and item response theory and their applications to test development. ITEMS (Instructional Topics in Educational Measurement). Retrieved November 27, 2008, from www.ncme.org/pubs/items/24.pdf.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. London: Sage Publications.

Kumaidi. (1987). An exploratory study of the internal characteristics of the Indonesian public university entrance exam ‘SIPENMARU’: Implications for future test development. Unpublished doctoral dissertation, The University of Iowa, Iowa City, USA.

Kumaidi. (1988, December). Studi analitik terhadap karakteristik internal ujian tulis seleksi masuk perguruan tinggi [An analytic study of the internal characteristics of the written university entrance selection examination]. Paper presented at the Seminar Nasional Pengkajian Ujian Masuk Perguruan Tinggi Negeri, Jakarta, December 21-24, 1988.

Lin, C. J. (2008). Comparisons between classical test theory and item response theory in automated assembly of parallel test form. The Journal of Technology, Learning, and Assessment, 6(8), 1-42.

Muraki, E., & Bock, R. D. (1998). Parscale: IRT item analysis and test scoring for rating-scale data. Chicago: Scientific Software International.

Oosterhof, A. (2003). Developing and using classroom assessments (3rd ed.). Upper Saddle River: Merrill Prentice Hall.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, Summer, 3-13.

SAS Institute. (1999). SAS macro language: Reference, version 8. Cary, NC: SAS Institute, Inc.

Thiagarajan, S., Semmel, D. S., & Semmel, M. L. (1974). Instructional development for training teachers of exceptional children. Minnesota: Indiana University.

Tognolini, J., & Davidson, M. (2003, July). How do we operationalise what we value? Some technical challenges in assessing higher order thinking skills. Paper presented at the National Roundtable on Assessment Conference, Darwin, Australia, July 2003.

Wu, B. C. (2003). Scoring multiple true-false items: A comparison of summed scores and response pattern scores at item and test levels. Research report. Lanham, Maryland: Educational Resources Information Center (ERIC).
