Performance differences by gender in English reading test

Ari Arifin Danuwijaya, Universitas Pendidikan Indonesia, Indonesia
Adiyo Roebianto, Universitas Mercu Buana, Indonesia


Test fairness must be considered when developing a test instrument. An instrument should not be biased against any group of test takers; in particular, its items should not function differently for male and female test takers. This study examines the extent to which the items in an English proficiency test function differently across gender. Fifty reading items were individually tested for gender differential item functioning (DIF) using Rasch model analysis in ConQuest. Six items were flagged for DIF: three were basic comprehension items and three were vocabulary questions. Possible ways of dealing with DIF items are also discussed.
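As a rough illustration of what a Rasch-based gender DIF check involves, the sketch below compares item difficulties estimated separately for two groups and flags items whose difference is large. It is not the ConQuest procedure used in the study: the function names, the item difficulty values, and the 0.5-logit threshold are all assumptions chosen for illustration only.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def dif_contrast(difficulty_female: float, difficulty_male: float) -> float:
    """Difference (in logits) between group-specific item difficulties."""
    return difficulty_female - difficulty_male


def is_flagged(contrast: float, threshold: float = 0.5) -> bool:
    """Flag an item when the absolute contrast reaches the threshold.

    The 0.5-logit default is an illustrative assumption, not a value
    taken from the study.
    """
    return abs(contrast) >= threshold


# Hypothetical difficulty estimates (female, male) in logits; not study data.
item_difficulties = {
    "item_01": (0.10, 0.05),
    "item_02": (1.20, 0.40),
}

flagged = [
    name
    for name, (female, male) in item_difficulties.items()
    if is_flagged(dif_contrast(female, male))
]
print(flagged)
```

In practice the group-specific difficulties would come from fitting the Rasch model to the response data, and a flagged contrast would normally be accompanied by a significance test before an item is treated as showing DIF.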


DIF; gender differences; test fairness; reading test; ConQuest








This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Jurnal Penelitian dan Evaluasi Pendidikan, ISSN 2338-6061 (online), ISSN 2685-7111 (print)
