Differential Item Functioning of the Region-Based National Examination Instrument

Adi Setiawan, PT Batamindo Green Farm, Indonesia
Gulzhaina Kuralbaevna Kassymova, Abai Kazakh National Pedagogical University, Kazakhstan
Vianney Mbazumutima, African Institute for Mathematical Sciences, Cameroon
Anggit Reviana Dewi Agustyani, Umeå Mathematics Education Research Centre (UMERC), Sweden

Abstract


This research aims to detect differential item functioning (DIF) in the 2014/2015 mathematics National Examination questions for junior high schools and equivalent-level schools, with the Yogyakarta region as the reference group and the South Kalimantan region as the focal group, using the likelihood ratio test (LRT), Raju's area measure, and Lord's chi-square methods. A sensitivity analysis was conducted to determine which method is the most sensitive. The data consisted of 5,465 National Examination answer sheets from students in the two regions who worked on the type A questions. A sample of 1,000 answer sheets per region was drawn using simple random sampling (SRS) to remove the effect of unequal sample sizes. The results showed that the LRT method flagged 36 items as showing significant DIF, Raju's area measure flagged 32 items, and Lord's method flagged all items. Lord's method is the most sensitive of the three because it detected the most DIF items.
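To give a concrete sense of two of the three detection procedures, the sketch below computes Lord's chi-square and Raju's area statistics for a single 2PL item. This is not the authors' code: the item parameter estimates and covariance matrices are hypothetical placeholders standing in for values that would come from calibrating the reference and focal groups separately, and the LRT is omitted because it requires refitting the full IRT model rather than comparing item parameters.

```python
# Minimal sketch (not the authors' code): Lord's chi-square and Raju's
# signed/unsigned area statistics for one 2PL item, assuming (a, b)
# estimates and their covariance matrices were obtained separately for
# the reference (Yogyakarta) and focal (South Kalimantan) groups.
# All numeric values below are hypothetical placeholders.
import numpy as np
from scipy.stats import chi2, norm

D = 1.702  # scaling constant commonly used with the logistic 2PL

# Hypothetical 2PL estimates: (discrimination a, difficulty b)
a_ref, b_ref = 1.10, -0.20
a_foc, b_foc = 0.95, 0.35
cov_ref = np.array([[0.010, 0.001], [0.001, 0.020]])  # cov of (a, b), reference
cov_foc = np.array([[0.012, 0.001], [0.001, 0.025]])  # cov of (a, b), focal

# Lord's chi-square: Wald-type test on the joint difference of (a, b)
diff = np.array([a_ref - a_foc, b_ref - b_foc])
chi_sq = diff @ np.linalg.inv(cov_ref + cov_foc) @ diff
p_lord = chi2.sf(chi_sq, df=2)

# Raju's signed area for the 2PL (equal lower asymptotes) reduces to the
# difficulty difference; its standard error uses the difficulty variances.
signed_area = b_foc - b_ref
z_raju = signed_area / np.sqrt(cov_ref[1, 1] + cov_foc[1, 1])
p_raju = 2 * norm.sf(abs(z_raju))

# Raju's unsigned (exact) area between the two item response functions
if np.isclose(a_ref, a_foc):
    unsigned_area = abs(b_foc - b_ref)
else:
    k = D * a_ref * a_foc * (b_foc - b_ref) / (a_foc - a_ref)
    unsigned_area = abs(2 * (a_foc - a_ref) / (D * a_ref * a_foc)
                        * np.log(1 + np.exp(k)) - (b_foc - b_ref))

print(f"Lord chi-square = {chi_sq:.3f}, p = {p_lord:.4f}")
print(f"Raju signed area = {signed_area:.3f}, z = {z_raju:.3f}, p = {p_raju:.4f}")
print(f"Raju unsigned area = {unsigned_area:.3f}")
```

In practice this computation would be repeated for every item on the test, with a multiple-comparison or effect-size criterion applied before labeling an item as functioning differentially.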


Keywords


comparison of DIF detection methods; differential item functioning; unidimensional IRT


References


Akour, M., Sabah, S., & Hammouri, H. (2015). Net and global differential item functioning in PISA polytomously scored science items. Journal of Psychoeducational Assessment, 33(2), 166–176. https://doi.org/10.1177/0734282914541337

Alfarizi. (2019). Meningkatkan mutu pendidikan di Indonesia melalui MESUPPEN “Maksimalkan pendekatan supervisi pendidikan.” Tugas Kuliah Administrasi dan Supervisi Pendidikan Jurusan Matematika Universitas Negeri Padang, 1–5. http://dx.doi.org/10.31227/osf.io/tmyz7

Azis, A. (2015). Conceptions and practices of assessment: A case of teachers representing improvement conception. TEFLIN Journal - A Publication on the Teaching and Learning of English, 26(2), 129-154. https://doi.org/10.15639/teflinjournal.v26i2/129-154

Başman, M. (2023). A comparison of the efficacies of differential item functioning detection methods. International Journal of Assessment Tools in Education, 10(1), 145–159. https://doi.org/10.21449/ijate.1135368

Berrío, Á. I., Herrera, A. N., & Gómez-Benito, J. (2019). Effect of sample size ratio and model misfit when using the difficulty parameter differences procedure to detect DIF. The Journal of Experimental Education, 87(3), 367–383. https://doi.org/10.1080/00220973.2018.1435502

Çelik, M., & Özkan, Y. Ö. (2020). Analysis of differential item functioning of PISA 2015 Mathematics subtest subject to gender and statistical regions. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 11(3), 283–301. https://doi.org/10.21031/epod.715020

Center for Educational Assessment. (2020). Laporan hasil ujian nasional - Capaian nasional. Pusat Penilaian Pendidikan, Kementerian Pendidikan dan Kebudayaan. https://hasilun.pusmenjar.kemdikbud.go.id/#2019!smp!capaian_nasional!99&99&999!T&T&T&T&1&!1!&

Cho, S., Suh, Y., & Lee, W. (2016). An NCME instructional module on latent DIF analysis using mixture item response models. Educational Measurement: Issues and Practice, 35(1), 48–61. https://doi.org/10.1111/emip.12093

Delgado, A. R., Burin, D. I., & Prieto, G. (2018). Testing the generalized validity of the emotion knowledge test scores. PLOS ONE, 13(11), e0207335. https://doi.org/10.1371/journal.pone.0207335

Desjardins, C. D., & Bulut, O. (2017). Handbook of educational measurement and psychometrics using R. Chapman and Hall/CRC. https://doi.org/10.1201/b20498

Effendi, E. (2011). Detecting crossing differential item functioning (CDIF): Based on item response theory. Jurnal Evaluasi Pendidikan, 2(2), 147-158. https://dx.doi.org/10.21009/JEP.022.03

Effiom, A. P. (2021). Test fairness and assessment of differential item functioning of mathematics achievement test for senior secondary students in Cross River state, Nigeria using item response theory. Global Journal of Educational Research, 20(1), 55–62. https://doi.org/10.4314/gjedr.v20i1.6

French, B. F., Finch, W. H., & Immekus, J. C. (2019). Multilevel Generalized Mantel-Haenszel for differential item functioning detection. Frontiers in Education, 4, 47. https://doi.org/10.3389/feduc.2019.00047

Gaberson, K. B. (1997). Measurement reliability and validity. AORN Journal, 66(6), 1092–1094. https://doi.org/10.1016/S0001-2092(06)62551-9

Galli, S., Chiesi, F., & Primi, C. (2011). Measuring mathematical ability needed for “non mathematical” majors: The construction of a scale applying IRT and differential item functioning across educational contexts. Learning and Individual Differences, 21(4), 392–402. https://doi.org/10.1016/j.lindif.2011.04.005

Hadi, S., Basukiyatno, B., & Susongko, P. (2021). Differential item functioning national examination on device test mathematics high school in Central Java. Proceedings of the 1st International Conference on Social Science, Humanities, Education and Society Development, ICONS 2020, 30 November, Tegal, Indonesia. https://doi.org/10.4108/eai.30-11-2020.2303726

Hadi, S., Puspita, F., Ati, A. P., & Widiyarto, S. (2020). Penyuluhan dan pembelajaran karakter melalui pelaksanaan Idul Adha pada siswa SMA. Jurnal Pemberdayaan: Publikasi Hasil Pengabdian Kepada Masyarakat, 4(2), 205–210. https://doi.org/10.12928/jp.v4i2.1833

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications, Inc.

Hidajad, A. (2019). Pendidikan Indonesia: Ramai di dapur, sepi di panggung (Sebuah tinjauan perkembangan). GETER : Jurnal Seni Drama, Tari dan Musik, 2(2), 1–11. https://doi.org/10.26740/geter.v2n2.p1-11

Huang, X., Wilson, M., & Wang, L. (2016). Exploring plausible causes of differential item functioning in the PISA science assessment: Language, curriculum or culture. Educational Psychology, 36(2), 378–390. https://doi.org/10.1080/01443410.2014.946890

Ihsan, H. (2016). Validitas isi alat ukur penelitian konsep dan panduan penilaiannya. PEDAGOGIA Jurnal Ilmu Pendidikan, 13(2), 266-273. https://doi.org/10.17509/pedagogia.v13i2.3557

James, G., James, R. C., & Davis, P. J. (1959). Mathematics dictionary. Physics Today, 12(10), 50–52. https://doi.org/10.1063/1.3060526

Jusmirad, M., Angraeini, D., Faturrahman, M., Syukur, M., & Arifin, I. (2023). Implementasi literasi dan numerasi pada program MBKM dan dampaknya terhadap siswa SMP Datuk Ribandang. Jurnal Pendidikan Indonesia, 4(03), 303–310. https://doi.org/10.59141/japendi.v4i03.1687

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000

Langer, M. M. (2008). A reexamination of Lord's Wald test for differential item functioning using item response theory and modern error estimation [Doctoral dissertation, The University of North Carolina]. https://doi.org/10.17615/chn0-dz45

Leiner, J. E. M., Scherndl, T., & Ortner, T. M. (2018). How do men and women perceive a high-stakes test situation? Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.02216

Ozdemir, B., & Alshamrani, A. H. (2020). Examining the fairness of language test across gender with IRT-based differential item and test functioning methods. International Journal of Learning, Teaching and Educational Research, 19(6), 27–45. https://doi.org/10.26803/ijlter.19.6.2

Patricia, D. C., & Araújo, L. (2012). Differential item functioning (DIF): What functions differently for immigrant students in PISA 2009 reading items? JRC Publications Repository. European Union. https://doi.org/10.2788/60811

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14(2), 197–207. https://doi.org/10.1177/014662169001400208

Retnawati, H. (2013). Pendeteksian keberfungsian butir pembeda dengan indeks volume sederhana berdasarkan teori respons butir multidimensi. Jurnal Penelitian dan Evaluasi Pendidikan, 17(2), 275–286. https://doi.org/10.21831/pep.v17i2.1700

Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., de Graeff, A., Groenvold, M., Gundy, C., Koller, M., Petersen, M. A., & Sprangers, M. A. G. (2009). A simulation study provided sample size guidance for differential item functioning (DIF) studies using short scales. Journal of Clinical Epidemiology, 62(3), 288–295. https://doi.org/10.1016/j.jclinepi.2008.06.003

Siegrist, M., Connor, M., & Keller, C. (2012). Trust, confidence, procedural fairness, outcome fairness, moral conviction, and the acceptance of GM field experiments. Risk Analysis, 32(8), 1394–1403. https://doi.org/10.1111/j.1539-6924.2011.01739.x

Sinha, R., van den Heuvel, W. A., & Arokiasamy, P. (2013). Validity and reliability of MOS short form health survey (SF-36) for use in India. Indian Journal of Community Medicine, 38(1), 22-26. https://doi.org/10.4103/0970-0218.106623

Sitepu, V. V., & Rahmawati, F. (2022). Analisis pusat pertumbuhan dan sektor ekonomi dalam mengurangi ketimpangan pendapatan. AKUNTABEL: Jurnal Akuntansi dan Keuangan, 19(1), 1–12. https://download.garuda.kemdikbud.go.id/article.php?article=3275677&val=11261&title=Analisis%20pusat%20pertumbuhan%20dan%20sektor%20ekonomi%20dalam%20mengurangi%20ketimpangan%20pendapatan

Soysal, S., & Koğar, E. Y. (2021). An investigation of item position effects by means of IRT-based differential item functioning methods. International Journal of Assessment Tools in Education, 8(2), 239–256. https://doi.org/10.21449/ijate.779963

Sudaryono, S. (2017). Sensitivity of differential item functioning (DIF) detection method. Jurnal Evaluasi Pendidikan, 3(1), 82-94. https://doi.org/10.21009/JEP.031.07

Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group-mean differences: The concept of item bias. Psychological Bulletin, 99(1), 118–128. https://doi.org/10.1037/0033-2909.99.1.118

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In Test validity (pp. 147–172). Lawrence Erlbaum Associates, Inc. https://doi.org/10.1037/14047-004

Turang, D. A. O. (2017). Pendekatan model ontologi untuk pencarian lembaga pendidikan (Studi kasus lembaga pendidikan provinsi Daerah Istimewa Yogyakarta). Jurnal Ilmiah Teknologi Infomasi Terapan, 3(3), 175-182. https://doi.org/10.33197/jitter.vol3.iss3.2017.134

Uğurlu, S., & Atar, B. (2020). Performances of MIMIC and logistic regression procedures in detecting DIF. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 11(1), 1–12. https://doi.org/10.21031/epod.531509

Ukanda, F., Othuon, L., Agak, J., & Oleche, P. (2019). Effectiveness of Mantel-Haenszel and logistic regression statistics in detecting differential item functioning under different conditions of sample size, ability distribution and test length. American Journal of Educational Research, 7(11), 878–887. https://www.sciepub.com/EDUCATION/abstract/11217

Whynes, D. K., Sprigg, N., Selby, J., Berge, E., & Bath, P. M. (2013). Testing for differential item functioning within the EQ-5D. Medical Decision Making, 33(2), 252–260. https://doi.org/10.1177/0272989X12465016

Yamin, M., & Syahrir, S. (2020). Pembangunan pendidikan merdeka belajar (Telaah metode pembelajaran). Jurnal Ilmiah Mandala Education, 6(1), 126-136. https://doi.org/10.36312/jime.v6i1.1121

Yildirim, O. (2019). Detecting gender differences in PISA 2012 mathematics test with differential item functioning. International Education Studies, 12(8), 59-71. https://doi.org/10.5539/ies.v12n8p59

Zampetakis, L. A., Bakatsaki, M., Litos, C., Kafetsios, K. G., & Moustakis, V. (2017). Gender-based differential item functioning in the application of the theory of planned behavior for the study of entrepreneurial intentions. Frontiers in Psychology, 8, 451. https://doi.org/10.3389/fpsyg.2017.00451

Zukmadini, A. Y., Karyadi, B., & Rochman, S. (2021). Peningkatan kompetensi guru melalui workshop model integrasi terpadu literasi sains dan pendidikan karakter dalam pembelajaran IPA. Publikasi Pendidikan, 11(2), 107-116. https://doi.org/10.26858/publikan.v11i2.18378




DOI: https://doi.org/10.21831/reid.v10i1.73270





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





ISSN 2460-6995 (Online)
