A multidimensional item response theory approach in the item analysis of Arabic language tests in madrasah aliyah
DOI: https://doi.org/10.21831/pep.v29i2.90877
Keywords: multidimensional; item response theory; analysis; Arabic language
Abstract
This study evaluates the quality of Arabic test items in madrasah assessments using a quantitative approach based on Multidimensional Item Response Theory (MIRT). The sample comprised 321 twelfth-grade students from MAN 1 Surakarta, purposively selected because the institution implements systematic and independent assessments. Data were obtained from student responses to the final Arabic examination of the 2022/2023 academic year. Exploratory Factor Analysis (EFA) was first conducted to identify the dimensional structure of the test, using the criteria KMO > 0.60 and a significant Bartlett’s Test of Sphericity (p < 0.05); the number of factors retained was determined by eigenvalues > 1 and supported by scree plot inspection. Model fit was then examined with a MIRT two-parameter logistic (2PL) model in R, using RMSEA < 0.06, CFI > 0.90, and TLI > 0.90 as fit criteria. Item parameters included discrimination (d) and difficulty (b), with discrimination classified as < 0.00 (unacceptable), 0.00–0.34 (very low), 0.35–0.64 (low), 0.65–1.34 (moderate), and ≥ 1.35 (high). The findings reveal substantial variability in item performance. Most items demonstrated acceptable discrimination; however, 16 items had negative discrimination, indicating weaknesses in content representation and item construction. A few items (items 1, 3, 7, 10, and 22) showed high discrimination and were highly informative. Difficulty levels were dominated by easy items, which limited the test’s ability to distinguish between medium- and high-ability examinees. The study recommends revising the misfitting items, adding items with moderate difficulty and discrimination above 0.65, and strengthening validity evidence through Confirmatory Factor Analysis (CFA) and bias detection using Differential Item Functioning (DIF) analysis.
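For readers who want to reproduce a comparable workflow, the sketch below shows one way the dimensionality checks and the MIRT 2PL calibration described in the abstract could be carried out in R with the psych and mirt packages. The file name responses.csv, the object names, and the two-factor specification are illustrative assumptions rather than the authors' actual script.

```r
# Illustrative sketch only: the file name, object names, and the two-factor
# specification below are assumptions, not the authors' original analysis code.
library(psych)  # KMO, Bartlett's test, scree plot
library(mirt)   # multidimensional 2PL calibration and fit indices

# Dichotomously scored (0/1) responses: one row per examinee, one column per item
responses <- read.csv("responses.csv")  # hypothetical file name

# Step 1: sampling adequacy (KMO > 0.60) and Bartlett's sphericity (p < 0.05)
R <- tetrachoric(responses)$rho          # tetrachoric correlations for binary items
KMO(R)
cortest.bartlett(R, n = nrow(responses))

# Step 2: number of factors via eigenvalues > 1 and the scree plot
scree(R, factors = TRUE, pc = FALSE)

# Step 3: exploratory MIRT 2PL model (two dimensions assumed here)
mod <- mirt(responses, model = 2, itemtype = "2PL")

# Step 4: model fit (RMSEA < 0.06, CFI > 0.90, TLI > 0.90) and item parameters
M2(mod)                           # limited-information fit statistics
coef(mod, simplify = TRUE)$items  # slopes (a1, a2) and intercepts (d) per item
itemfit(mod)                      # per-item misfit screening
```

Note that in a multidimensional 2PL, mirt reports one slope per dimension (a1, a2) and an intercept d rather than a single discrimination and difficulty value; a unidimensional fit (model = 1) read out with coef(mod, IRTpars = TRUE) would return the more familiar a/b parameterization used in the classification above.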