Psychometric quality of multiple-choice tests under Classical Test Theory (CTT): AnBuso, Iteman, and RStudio
Muhammad Iqbal, Universitas Negeri Yogyakarta, Indonesia
Zafrullah Zafrullah, Universitas Negeri Yogyakarta, Indonesia
Muhammad Naim Mahmud, Universitas Muslim Buton, Indonesia
D’aquinaldo Stefanus Fani Seran, Universitas Negeri Yogyakarta, Indonesia
Izzul Kiram Suardi, Universitas Negeri Yogyakarta, Indonesia
Lovieanta Arriza, Universitas Negeri Yogyakarta, Indonesia
Abstract
Psychometric quality analysis of psychological instruments is important to ensure credible measurement. This study compared the psychometric quality analysis of multiple-choice test items across three applications in order to evaluate the advantages and disadvantages of the features each provides in support of classical test theory (CTT) analysis. Using a quantitative approach, dichotomous data from 50 participants on a 30-item multiple-choice test, obtained from secondary sources, were analysed with three applications (AnBuso, Iteman, and R) to compare the statistical output for the main CTT psychometric parameters: difficulty index, discrimination index, and distractor effectiveness. The analysis was conducted descriptively and quantitatively by comparing the features each application provides for CTT analysis. AnBuso was advantageous in terms of practicality, while Iteman excelled in both practicality and the tidiness of its analysis output. RStudio, in turn, could accommodate a larger number of items and examinees. However, AnBuso was limited in the statistical output it generated, Iteman was restricted in the number of items and examinees it could handle, and R required effort to construct the analysis syntax. The study found significant differences in the discrimination index output of the three applications despite similar difficulty index and distractor effectiveness outputs; further research comparing AnBuso, Iteman, and RStudio on the basis of their underlying algorithms is recommended for a more thorough evaluation of each application's strengths and weaknesses in CTT analysis.
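The article does not reproduce the R syntax it used, but the analysis it describes can be approximated with the CTT package for R. The sketch below is illustrative only: the file name, answer key, and object names are hypothetical placeholders, and the package choice is an assumption rather than the authors' confirmed workflow.

```r
# A minimal sketch of a CTT item analysis in R, assuming the CRAN "CTT"
# package. Data file and answer key below are hypothetical placeholders.
library(CTT)

# Raw responses: 50 examinees x 30 multiple-choice items (options A-D)
responses <- read.csv("responses.csv", stringsAsFactors = FALSE)

# Placeholder 30-item answer key (the study's actual key is not published)
key <- rep(c("A", "B", "C", "D"), length.out = 30)

# Score raw responses into a dichotomous (0/1) matrix
scored <- score(responses, key, output.scored = TRUE)$scored

# Per-item difficulty (item mean = proportion correct) and
# discrimination (point-biserial correlation)
ia <- itemAnalysis(scored, itemReport = TRUE)
ia$itemReport

# Distractor effectiveness: option choice proportions by ability group
distractorAnalysis(responses, key)
```

In this sketch, the itemMean column of the item report corresponds to the CTT difficulty index and pBis to the point-biserial discrimination index, the two parameters the study compares across applications, while distractorAnalysis() addresses distractor effectiveness.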
DOI: https://doi.org/10.21831/pep.v28i2.71542
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
ISSN 2338-6061 (online) || ISSN 2685-7111 (print)