This study aims to produce a polytomous scoring model for responses to multiple true-false items so that ability in physics can be estimated more accurately. The scoring model was developed using the Four-D model, and its accuracy was tested through empirical and simulation studies. The empirical study used 15 multiple true-false items taken from the UMPTN tests of 1996-2006, administered to 410 new students of FMIPA, Universitas Negeri Surabaya, from the 2007 cohort. Test takers' responses were scored with three partial credit models (PCM I, II, and III) and dichotomously. The scoring results were analyzed with the Quest program to obtain estimates of item difficulty (δ) and test-taker ability (θ), which were used to determine the test information function and the standard error of estimation. The simulation study used data generated from the empirical parameters (δ and θ) with the SAS statistical program, and estimation accuracy was analyzed with the root mean squared error (RMSE) method. The results show that: (i) PCM scoring with weighting estimates ability more accurately than scoring without weighting or dichotomous scoring; (ii) the more categories used in partial credit scoring, the more accurate the estimate.

Keywords: partial credit scoring model, multiple true-false items



Abstract: This study aims to produce a polytomous scoring model for responses to multiple true-false items in order to obtain a more accurate estimation of abilities in physics. The scoring model was developed using the Four-D model, and its accuracy was assessed through empirical and simulation studies. The empirical study employed 15 multiple true-false items taken from the New Students Entrance Test of State University (UMPTN) of 1996–2006. The test was administered to 410 new students who enrolled in 2007 in the Faculty of Mathematics and Science of Surabaya State University. The testees' responses were scored using three partial credit models (PCM I, II, and III) and were also scored dichotomously. The results of the four scoring models were analyzed using the Quest program to obtain estimates of the item difficulty level (δ) and of the testees' abilities (θ). The simulation data were generated using the SAS statistical program, and the estimation accuracy was analyzed using the root mean squared error (RMSE) method. The results of the study show the following: (i) scoring with the partial credit model with weighting estimates abilities more accurately than scoring without weighting or dichotomous scoring; (ii) the greater the number of categories in the partial credit scoring, the more accurate the ability estimation.

Keywords: partial credit model scoring, multiple true-false items
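
As an illustration of the two quantities named in the abstract, the following is a minimal sketch of the standard partial credit model category probabilities and of an RMSE computation between estimated and generating abilities. It is not the Quest or SAS procedure used in the study; the function names `pcm_probabilities` and `rmse` and the example values are hypothetical, and the step difficulties are assumed to be given per item.

```python
import math


def pcm_probabilities(theta, deltas):
    """Category probabilities for one item under the partial credit model.

    theta  : ability of the test taker
    deltas : step difficulties delta_1..delta_m (assumed known)
    Returns probabilities for score categories 0..m.
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def rmse(estimates, true_values):
    """Root mean squared error between estimated and true (generating) values."""
    if not estimates or len(estimates) != len(true_values):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example: one item with three steps (four score categories).
probs = pcm_probabilities(theta=0.5, deltas=[-1.0, 0.0, 1.0])
error = rmse([0.4, -0.2, 1.1], [0.5, 0.0, 1.0])
```

With a single step difficulty the function reduces to the dichotomous Rasch model, which is one way to see how the dichotomous and partial credit scorings compared in the study relate to each other.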



