Psychometric quality of multiple-choice tests under classical test theory (CTT): AnBuso, Iteman, and R

Siti Nurjanah, Universitas Negeri Yogyakarta, Indonesia
Muhammad Iqbal, Universitas Negeri Yogyakarta, Indonesia
Zafrullah Zafrullah, Universitas Negeri Yogyakarta, Indonesia
Muhammad Naim Mahmud, Universitas Muslim Buton, Indonesia
D’aquinaldo Stefanus Fani Seran, Universitas Negeri Yogyakarta, Indonesia
Izzul Kiram Suardi, Universitas Negeri Yogyakarta, Indonesia
Lovieanta Arriza, Universitas Negeri Yogyakarta, Indonesia

Abstract


Psychometric quality analysis of psychological instruments is essential to ensure credible measurement. This study compares the psychometric quality analysis of multiple-choice test items across three applications, evaluating the strengths and weaknesses of the features each provides for classical test theory (CTT) analysis. Using a quantitative approach, we analysed the dichotomous responses of 50 participants to a 30-item multiple-choice test. The data were processed with three applications (AnBuso, Iteman, and R) to compare their statistical output on the main CTT psychometric parameters: the difficulty index, the discrimination index, and distractor effectiveness. The analysis was conducted descriptively and quantitatively by comparing the features each application provides in support of CTT analysis. All three applications produced similar results for the difficulty index, discrimination index, and distractor effectiveness. AnBuso proved user-friendly but limited in capacity; Iteman offered comprehensive output but restricted free functionality; R provided flexibility but required programming expertise. Each application demonstrated distinct strengths suited to different research needs and levels of user proficiency. The choice of application should therefore consider factors such as analysis complexity, sample size, and user expertise. Further research into paid versions and more diverse test conditions is recommended for a more comprehensive evaluation of these applications in CTT analysis.
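
As a rough illustration of the kind of R-based analysis compared in this study, the sketch below computes the three reported CTT parameters with the CTT package for R. The package choice, the file name responses.csv, and the answer key are assumptions made for illustration only; the study's actual scripts are not reproduced on this page.

library(CTT)  # classical test theory routines for R (assumed package)

# Hypothetical inputs: a 50 x 30 table of A-D responses and a 30-item key
responses <- read.csv("responses.csv", stringsAsFactors = FALSE)  # placeholder file
key <- rep(c("A", "B", "C", "D"), length.out = 30)                # placeholder key

# Dichotomize the raw responses: 1 = correct, 0 = incorrect
scored <- score(responses, key, output.scored = TRUE)$scored

# Difficulty (itemMean = proportion correct) and discrimination (pBis = point-biserial)
itemAnalysis(scored)$itemReport

# Distractor effectiveness: option-choice proportions broken down by score group
distractorAnalysis(responses, key)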


Keywords


anbuso; iteman; rstudio; classical test theory; psychometric quality analysis


DOI: https://doi.org/10.21831/pep.v28i2.71542

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

