REID (Research and Evaluation in Education)


Keywords: instrument development, test, historical thinking skills, polytomous, PCM

This study was conducted for two purposes. The first was to produce a model and instruments for assessing historical thinking skills in the history subject at senior high school (SHS) level. The second was to identify SHS students' historical thinking skills. The study was carried out in two stages: model development, and instrument development together with a small-scale tryout and a large-scale tryout. The tests for the two tryouts consisted of six and five sub-test sets, respectively, and each test set contained 20 anchor items. The samples for the two tryouts comprised 1,573 and 2,613 testees, respectively. The data were analyzed with the Partial Credit Model (PCM) using the QUEST program. The overall tryout results indicate that the tests fit the PCM, based on the criteria of an INFIT MNSQ mean of 0.1 and a standard deviation of 1.0. The reliability coefficients of the tests for the two tryouts are moderately good, with Cronbach's alpha coefficients of 0.65 and 0.54, respectively. The lowest historical thinking skills score is -0.352 and the highest is +1.21, on an ideal range of -4.0 to +4.0. Overall, the testees' scores are not satisfactory: only 5.89% of the testees are above the expected median.
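To make the PCM analysis more concrete, the following Python sketch works with a single polytomous item. It computes the PCM category probabilities, the model-expected score and its variance, and the information-weighted (INFIT) mean square across persons. It assumes that person abilities (theta) and item step difficulties (deltas), both in logits, have already been estimated, as QUEST does. It does not reproduce QUEST's estimation procedure. The function names and example values are hypothetical. Under the PCM, the probability that a person scores x on an item with maximum score m is proportional to exp of the sum over k = 1..x of (theta - delta_k), with the empty sum for x = 0 taken as 0.

```python
import math
from typing import Sequence


def pcm_probabilities(theta: float, deltas: Sequence[float]) -> list[float]:
    """Category probabilities P(X = 0), ..., P(X = m) under the Partial Credit Model.

    theta  -- person ability (logits)
    deltas -- step difficulties delta_1..delta_m of one item (logits)
    """
    # Cumulative sums of (theta - delta_k); category 0 uses the empty sum 0.
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    # Subtract the largest value before exponentiating to avoid overflow.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score_and_variance(theta: float, deltas: Sequence[float]) -> tuple[float, float]:
    """Model-expected item score E and its variance W for one person."""
    probs = pcm_probabilities(theta, deltas)
    expected = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - expected) ** 2 * p for x, p in enumerate(probs))
    return expected, variance


def item_infit_mnsq(responses: Sequence[int], thetas: Sequence[float],
                    deltas: Sequence[float]) -> float:
    """Information-weighted (INFIT) mean square for one item.

    Sum of squared residuals divided by the sum of model variances,
    taken over all persons who answered the item.
    """
    squared_residuals = 0.0
    total_variance = 0.0
    for x, theta in zip(responses, thetas):
        expected, variance = expected_score_and_variance(theta, deltas)
        squared_residuals += (x - expected) ** 2
        total_variance += variance
    return squared_residuals / total_variance


if __name__ == "__main__":
    # Hypothetical item scored 0-3 with three step difficulties.
    steps = [-1.0, 0.0, 1.2]
    print(pcm_probabilities(0.5, steps))
    print(item_infit_mnsq([0, 2, 3, 1], [-1.0, 0.2, 1.5, 0.0], steps))
```

An INFIT MNSQ near 1.0 means the observed residuals are about as large as the model predicts. Values well below 1 suggest overfit, and values well above 1 suggest misfit.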

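The reliability figures reported for the tryouts are Cronbach's alpha coefficients. The minimal sketch below shows how alpha is obtained from a persons-by-items score matrix, using alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name, data layout and example scores are assumptions for illustration and are not taken from the study.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons-by-items score matrix.

    scores -- one row per testee, one column per item (polytomous scores allowed)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item column (population variance, used consistently).
    item_variance_sum = sum(pvariance(column) for column in zip(*scores))
    # Variance of each testee's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical scores of four testees on three items.
    data = [[0, 0, 1], [1, 1, 1], [0, 1, 0], [1, 1, 1]]
    print(round(cronbach_alpha(data), 4))  # 0.5625
```

Either population or sample variances can be used, provided the same choice is applied to the item and total variances. The ratio, and therefore alpha, is unchanged.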