Jurnal Penelitian dan Evaluasi Pendidikan

Keywords

a test of biology practicum knowledge (TBPK), GRM, GPCM

Document Type

Article

Abstract

This study aims to produce a test model that fits the data. The items were developed using a polytomous item response theory (PIRT) approach. The tryout subjects were 1,030 end-of-Grade VII students from five junior high schools (SMP) that represent the range of junior high school rankings in Yogyakarta City. The fitting PIRT model was selected based on the results of PARSCALE parameterization and on a description of the functional relationship between test takers' responses and their ability levels, expressed as test information curves (TIC). The study yields 16 items for the item bank. Each item has a discrimination index that is not low (> 0.25 on the logit scale) and a difficulty index between -3 and +3 on the logit scale. Based on the information obtained, both scoring models, GRM and GPCM, are suitable for modeling the scoring of the administered test of biology practicum knowledge (TBPK; Indonesian: Tes Pengetahuan Praktikum Biologi, TPPB). GPCM may better reflect how the data were actually generated, so its TIC suggests that it estimates ability more accurately than GRM.

Keywords: biology practicum knowledge test, GRM, GPCM
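For reference, the two polytomous models named in the abstract are usually written in the standard forms below (Samejima's graded response model and Muraki's generalized partial credit model), together with the polytomous item and test information functions from which a TIC is drawn. The notation (item j, score categories k = 0, ..., m_j, slope a_j, boundary or step parameters b_jk) is ours and not necessarily the article's, and the scaling constant D is omitted (D = 1); if D = 1.7 is used, replace a_j by D a_j throughout.

```latex
% Graded response model (GRM): cumulative (boundary) probabilities
P^{*}_{jk}(\theta) = \frac{\exp[a_j(\theta - b_{jk})]}{1 + \exp[a_j(\theta - b_{jk})]},
  \quad k = 1,\dots,m_j, \qquad P^{*}_{j0}(\theta) = 1, \quad P^{*}_{j,m_j+1}(\theta) = 0,
\qquad P_{jk}(\theta) = P^{*}_{jk}(\theta) - P^{*}_{j,k+1}(\theta)

% Generalized partial credit model (GPCM)
P_{jk}(\theta) = \frac{\exp\left[\sum_{v=0}^{k} a_j(\theta - b_{jv})\right]}
                      {\sum_{c=0}^{m_j} \exp\left[\sum_{v=0}^{c} a_j(\theta - b_{jv})\right]},
  \qquad a_j(\theta - b_{j0}) \equiv 0

% Item information, test information, and standard error of the ability estimate
I_j(\theta) = \sum_{k=0}^{m_j} \frac{[P'_{jk}(\theta)]^{2}}{P_{jk}(\theta)}, \qquad
I(\theta) = \sum_{j} I_j(\theta), \qquad
SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```

Because SE is the reciprocal square root of I(θ), a model whose TIC lies higher over a given ability range yields more precise ability estimates in that range.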

______________________________________________________________

DEVELOPMENT OF A TEST OF BIOLOGY PRACTICUM KNOWLEDGE WITH GRADED RESPONSE AND GENERALIZED PARTIAL CREDIT MODELS

Abstract: This study aims to determine which polytomous item response model better fits the data. The items were developed using the polytomous item response theory approach. The tryout participants were 1,030 Grade VII students selected from five junior high schools in Yogyakarta City. The more suitable model was selected based on the results of PARSCALE parameterization and a description of the functional relationship between the test takers' responses and their ability levels, as shown by the test information curves (TIC). The study yields 16 items for the item bank; each item has a discrimination index above 0.25 on the logit scale and a difficulty index between -3 and +3 on the logit scale. The results show that both the graded response model (GRM) and the generalized partial credit model (GPCM) are suitable for scoring the administered TBPK. GPCM may better reflect how the data were generated; on the basis of the TIC, it appears to estimate students' ability more accurately than GRM.

Keywords: a test of biology practicum knowledge (TBPK), GRM, GPCM
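A minimal Python sketch of how a TIC comparison of the kind described above can be computed: category probabilities and item information under GRM and GPCM, summed over items to give test information at several ability levels. The item parameters in `ITEMS` are hypothetical (they are not the TBPK estimates), D = 1 is assumed, and all function names are our own illustrative helpers, not PARSCALE output or the article's procedure. GRM and GPCM parameters are on different metrics, so passing the same numbers to both models only illustrates the mechanics; a real comparison would use each model's own estimates.

```python
import math

# Hypothetical items: (slope a, ordered boundary/step parameters b_1..b_m), logit scale, D = 1.
# These numbers are illustrative only and are NOT the TBPK estimates.
ITEMS = [
    (1.2, [-1.5, -0.2, 1.0]),
    (0.8, [-2.0, 0.5]),
    (1.5, [-0.5, 0.3, 1.8]),
]


def grm_cumulative(theta, a, bs):
    """GRM boundary curves P*_k(theta): probability of scoring in category k or higher."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]
    return [1.0] + inner + [0.0]  # P*_0 = 1 and P*_{m+1} = 0 by definition


def grm_probs(theta, a, bs):
    """GRM category probabilities P_k = P*_k - P*_{k+1}, k = 0..m."""
    cum = grm_cumulative(theta, a, bs)
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


def grm_info(theta, a, bs):
    """GRM item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = grm_cumulative(theta, a, bs)
    # dP*_k/dtheta = a P*_k (1 - P*_k); this is 0 at the fixed end points 1 and 0.
    dcum = [a * c * (1.0 - c) for c in cum]
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        dp = dcum[k] - dcum[k + 1]
        if p > 0.0:
            info += dp * dp / p
    return info


def gpcm_probs(theta, a, bs):
    """GPCM category probabilities from cumulative sums of a (theta - b_v)."""
    z = [0.0]  # the k = 0 term is fixed at 0
    for b in bs:
        z.append(z[-1] + a * (theta - b))
    zmax = max(z)  # subtract the maximum for numerical stability
    ez = [math.exp(v - zmax) for v in z]
    total = sum(ez)
    return [e / total for e in ez]


def gpcm_info(theta, a, bs):
    """GPCM item information: a^2 times the variance of the category score."""
    p = gpcm_probs(theta, a, bs)
    mean = sum(k * pk for k, pk in enumerate(p))
    mean_sq = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (mean_sq - mean * mean)


def total_information(theta, items, item_info):
    """Test information I(theta): the sum of item information over all items."""
    return sum(item_info(theta, a, bs) for a, bs in items)


if __name__ == "__main__":
    print(" theta   I_GRM  I_GPCM")
    for theta in [-3, -2, -1, 0, 1, 2, 3]:
        i_grm = total_information(theta, ITEMS, grm_info)
        i_gpcm = total_information(theta, ITEMS, gpcm_info)
        print(f"{theta:6.1f} {i_grm:7.3f} {i_gpcm:7.3f}")
```

Running the script prints a small TIC table for each model; the standard error at each ability level is 1/sqrt(I(θ)), so the higher curve marks the ability range where that model estimates ability more precisely. For a dichotomous item (one boundary) both functions reduce to the two-parameter logistic information a²P(1 - P).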

First Page

166

Last Page

182

Volume

16

Digital Object Identifier (DOI)

10.21831/pep.v16i0.1111

References

Bastari. (December 1998). Comparison of IRT models that handle dichotomous and polytomous response data simultaneously. Paper presented at the University of Massachusetts.

Boughton, K.A., Klinger, D.A., & Gierl, M.J. (April 2001). Effect of random rater error on parameter recovery of the generalized partial credit model and graded response model. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Seattle, WA.

Childs, R.A., & Wen-Hung Chen. (1999). Software note: Obtaining comparable item parameter estimates in MULTILOG and PARSCALE for two polytomous IRT models [Electronic version]. Applied Psychological Measurement, 23(4), 371-379.

De Ayala, R.J. (1993). An introduction to polytomous item response theory models. Measurement and Evaluation in Counseling and Development, 25, 172-189.

DeMars, C.E. (April 2002). Recovery of graded response and partial credit parameters in MULTILOG and PARSCALE (28 pp.). Paper presented at the Annual Meeting of the American Educational Research Association, Chicago.

Dodd, B.G., De Ayala, R.J., & Koch, W.R. (1995). Computerized adaptive testing with polytomous items [Electronic version]. Applied Psychological Measurement, 19, 5-23.

Dodeen, H. (2004). The relationship between item parameters and item fit. Journal of Educational Measurement, 41(3), 261-270.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items [Electronic version]. Applied Psychological Measurement, 9(3), 139-164.

Kyong Hee Chon, Won-Chan Lee, & Ansley, T.N. (November 2007). Assessing IRT model-data fit for mixed format tests (CASMA Research Report No. 26). Center for Advanced Studies in Measurement and Assessment.

Lei Chang. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity [Electronic version]. Applied Psychological Measurement, 18(3), 205-215.

Muraki, E., & Bock, R.D. (1997). PARSCALE: IRT item analysis and test scoring for rating scale data. Chicago: Scientific Software International.

Nandakumar, R., Feng Yu, Hsin-Hung Li, et al. (1998). Assessing unidimensionality of polytomous data [Electronic version]. Applied Psychological Measurement, 22(2), 99-115.

Nina Deng, & Hambleton, R.K. (February 4, 2008). Psychometric analyses of the 2006 MCAS high school introductory physics test (Center for Educational Assessment Research Report No. 647). Amherst, MA: Center for Educational Assessment, University of Massachusetts.

Ostini, R., & Nering, M.L. (2006). Polytomous item response theory models (Quantitative Applications in the Social Sciences, No. 07-144). Thousand Oaks, CA: Sage.

Reynolds, D.S., Doran, R.L., Allers, R.H., et al. (1996). Alternative assessment in science: A teacher's guide. New York: New York State Education Department; University of Buffalo.

Stark, S., Chernyshenko, S., Chuah, D., et al. (2001). Selecting a polytomous IRT model. IRT Modeling Lab, University of Illinois. Retrieved October 12, 2006, from http://work.psych.uiuc.edu/irt

Tang, K.L. (1996). Polytomous item response theory (IRT) models and their applications in large-scale testing programs: Review of literature (TOEFL Monograph Series, RM-96-8). Princeton, NJ: Educational Testing Service.

Thissen, D., Nelson, L., Rosa, K., et al. (2001). Item response theory for items scored in more than two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 141-186). Mahwah, NJ: Lawrence Erlbaum Associates.

Ware, J.E., Jr., Bjorner, J.B., & Kosinski, M. (2000). Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales [Electronic version]. Medical Care, 38(9), II73-II82.

Wells, C.S., Hambleton, R.K., & Urip Purwono. (June 2008). Item response theory: Polytomous response IRT models and application. Handout presented at the Pelatihan Asesmen Pendidikan dan Psikologi (Psikometri) [Educational and Psychological Assessment (Psychometrics) Training], Universitas Negeri Yogyakarta.
