An analysis of the mathematics school examination test quality

Hadi Sutrisno, SMP Negeri 1 Tanahmerah Bangkalan Jalan Raya Tanah Merah No.105, Tanah Merah Dajah, Bangkalan, Kabupaten Bangkalan, Jawa Timur 69172, Indonesia


This research aims to describe: (1) the quality of the junior high school mathematics school examination tests for the 2015/2016 academic year in Kabupaten Bangkalan, based on a qualitative analysis of the test items; (2) the quality of the same tests based on a quantitative analysis of the test items; and (3) the equating of these tests. A test is said to be of good quality if it is valid, reliable, and has good item characteristics. A test is said to be equivalent to another if scores on one test can be exchanged with scores on the other. The data were taken from the school examination scripts together with the students' answer sheets. The qualitative analysis was conducted with the help of expert judgement. The quantitative analysis was conducted using Classical Test Theory with Iteman and Item Response Theory with BILOG-MG. To analyze the equivalence between the series of tests, item characteristic curves were used; these curves were drawn with GeoGebra. The results show that: (1) qualitatively, the quality of the test plan for the mathematics school examination is fairly good, while the quality of the school examination tests themselves ranges from fairly good to less good; (2) quantitatively, the quality of the school examination tests is good; and (3) in terms of equating, based on the item characteristic curves, the school examination tests are equivalent.


test quality; qualitative analysis; quantitative analysis; tests equating; classical test theory; item response theory; test characteristic curve








Copyright (c) 2016 Jurnal Riset Pendidikan Matematika

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


All rights reserved. p-ISSN 2356-2684 | e-ISSN 2477-1503
