Differential item functioning analysis of Arabic language exams across gender, study specialization, and geographic region in senior high schools

Authors

  • Anugrah Arya Bakti, Universitas Negeri Yogyakarta, Indonesia
  • Marzuki Marzuki, Universitas Negeri Yogyakarta, Indonesia
  • Zulfa Safina Ibrahim, Universitas Negeri Yogyakarta, Indonesia
  • Rugaya Tuanaya, Universitas Negeri Yogyakarta, Indonesia
  • Nur Yusra binti Yacob, Universiti Teknologi Mara Shah Alam, Malaysia

DOI:

https://doi.org/10.21831/reid.v11i1.85961

Keywords:

differential item functioning (DIF), Arabic language assessment, item response theory (IRT), fairness in testing, senior high school education in Indonesia

Abstract

This study examines the fairness of the Arabic language assessment used in Muhammadiyah senior high schools by detecting Differential Item Functioning (DIF) in the Final Semester Summative Test (UAS) administered to 12th-grade students in the Special Region of Yogyakarta during the 2023/2024 academic year. Using a descriptive quantitative design, the study analyzed response data from 1,157 students across 25 schools. Data were collected by documenting the test blueprints, item sheets, answer keys, and student responses. DIF was analyzed with the Lord and Generalized Lord methods within the framework of Item Response Theory (IRT), focusing on three demographic variables: gender, study specialization (science vs. social studies), and school region (Yogyakarta City, Sleman, Bantul, and Kulon Progo). The Rasch model was identified as the best-fitting model and satisfied key psychometric assumptions, including unidimensionality and parameter invariance. Significant DIF was found for every variable examined. Eleven items showed gender-based DIF, more of them favoring male students than female students. Twenty-three items showed DIF by study specialization, and thirty-seven items showed DIF by school region, with students from Yogyakarta City benefiting the most. These results suggest that the test is not fully equitable and highlight the need for item revision to ensure fairness. The study contributes theoretically to the field of educational measurement and practically to the development of fairer evaluation practices in Islamic and language education settings.
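
For readers unfamiliar with the Lord method named in the abstract, the following is a minimal sketch of the statistic in its simplest form, not the authors' implementation. Under the Rasch model each item has a single difficulty parameter, so Lord's chi-square for a two-group comparison reduces to the squared difference in difficulty estimates divided by the sum of their squared standard errors, with one degree of freedom. The sketch assumes the two groups' difficulties have already been placed on a common scale; the function name `lord_chi2_rasch` and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def lord_chi2_rasch(b_ref, se_ref, b_foc, se_foc):
    """Lord's chi-square for one item under the Rasch model (1 df).

    b_ref, b_foc   : difficulty estimates for the reference and focal groups,
                     assumed to be on a common scale already.
    se_ref, se_foc : standard errors of those estimates.
    Returns (statistic, p_value).
    """
    diff = b_ref - b_foc
    stat = diff ** 2 / (se_ref ** 2 + se_foc ** 2)
    # Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical estimates for a single item (not taken from the study).
stat, p_value = lord_chi2_rasch(b_ref=0.10, se_ref=0.08, b_foc=0.45, se_foc=0.09)
flagged = p_value < 0.05  # conventional 5% significance level, an assumption here
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}, DIF flagged: {flagged}")
```

An item with a higher difficulty estimate in the focal group is harder for that group at equal ability, which is read as favoring the reference group. Comparisons across more than two groups, such as the four school regions, call for the Generalized Lord method, which this sketch does not cover.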

Published

2025-07-15

How to Cite

Bakti, A. A., Marzuki, M., Ibrahim, Z. S., Tuanaya, R., & binti Yacob, N. Y. (2025). Differential item functioning analysis of Arabic language exams across gender, study specialization, and geographic region in senior high schools. REID (Research and Evaluation in Education), 11(1), 59–74. https://doi.org/10.21831/reid.v11i1.85961

Section

Articles
