Comparing the methods of vertical equating for the math learning achievement tests for junior high school students

Chairun Nisa, Department of Educational Research and Evaluation, Universitas Negeri Yogyakarta, Indonesia
Heri Retnawati, Department of Mathematics Education, Universitas Negeri Yogyakarta, Indonesia


Students’ mathematical ability needs to be developed in order to improve the teaching process, which is very important for continuous education. This study aimed to describe: (1) the characteristics of the mathematics achievement tests for grades VII and VIII; (2) the equating constants obtained from the vertical equating of the mathematics achievement tests; and (3) the accuracy of the mean and mean, mean and sigma, Haebara characteristic curve, and Stocking and Lord characteristic curve methods in the vertical equating of the tests for grades VII and VIII. The data were the students’ scores on the Higher Order Thinking (HOT) tests, collected with the anchor test design, and were analyzed using descriptive quantitative analysis. The findings of the study show that: (1) the learning achievement tests for grades VII and VIII have a difficulty level (location) in the fair category (0.190 and 0.451) and a discrimination index (slope) in the good category, with means of 0.700 and 0.633; (2) the vertical equating yields the equation Y’ = 0.88X - 0.27 with the mean and mean method, Y’ = 0.19X - 0.02 with the mean and sigma method, Y’ = 0.38X - 0.12 with the Haebara characteristic curve method, and Y’ = 0.57X - 0.18 with the Stocking and Lord characteristic curve method; (3) the lowest Root Mean Square Difference (RMSD) belongs to the mean and mean method, followed by the Stocking and Lord characteristic curve method, the mean and sigma method, and the Haebara characteristic curve method.
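The linear equations reported in the abstract have the form Y’ = AX + B, where A and B are the equating constants. As a minimal sketch, the two moment methods can be written with the textbook formulas for anchor-item parameters: mean and mean takes A from the ratio of mean slopes, while mean and sigma takes A from the ratio of the standard deviations of the locations. The function names, the sample parameter values, and the RMSD definition used here (computed over anchor-item locations) are assumptions for illustration only. They are not the study's data or necessarily its exact procedure.

```python
import math
from statistics import mean, pstdev


def mean_mean(a_x, b_x, a_y, b_y):
    """Mean and mean method: slope from the ratio of mean discriminations."""
    A = mean(a_x) / mean(a_y)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def mean_sigma(b_x, b_y):
    """Mean and sigma method: slope from the ratio of location SDs."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def rmsd(b_x, b_y, A, B):
    """Root mean square difference between the target-scale locations
    and the transformed source-scale locations (assumed definition)."""
    squared = [(by - (A * bx + B)) ** 2 for bx, by in zip(b_x, b_y)]
    return math.sqrt(mean(squared))


if __name__ == "__main__":
    # Hypothetical anchor-item parameters, NOT taken from the study.
    a_x = [0.9, 0.7, 0.6, 0.8]      # slopes on the source scale X
    b_x = [-0.5, 0.1, 0.4, 0.9]     # locations on the source scale X
    a_y = [1.0, 0.8, 0.7, 0.9]      # slopes on the target scale Y
    b_y = [-0.7, -0.1, 0.2, 0.6]    # locations on the target scale Y

    for name, (A, B) in {
        "mean and mean": mean_mean(a_x, b_x, a_y, b_y),
        "mean and sigma": mean_sigma(b_x, b_y),
    }.items():
        print(f"{name}: Y' = {A:.2f}X + ({B:.2f}), "
              f"RMSD = {rmsd(b_x, b_y, A, B):.3f}")
```

The Haebara and Stocking and Lord methods instead choose A and B by numerically minimizing differences between characteristic curves, so they are not covered by this closed-form sketch.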


equating method; vertical equating; HOT; mathematics








This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

REiD (Research and Evaluation in Education)


ISSN 2460-6995 (Online)
