Item analysis of reading comprehension questions for English proficiency test using Rasch model

Henda Harmantia Dewi, Universitas Negeri Yogyakarta, Indonesia
Siti Maftuhah Damio, Universiti Teknologi MARA, Malaysia
Sukarno Sukarno, Universitas Negeri Yogyakarta, Indonesia


Demand for the Test of English as a Foreign Language (TOEFL) has been growing in Indonesia. This rising demand, together with the high cost of the test, has prompted many institutions to develop their own TOEFL instruments and administer the test internally. However, constructing a test instrument is a complex process, which makes item analysis challenging, even though such analysis is crucial for assessing item quality. This study therefore reports a statistical analysis of 20 TOEFL reading comprehension questions in terms of test reliability, item and person fit, and item difficulty. Thirty-eight members of the English Department Students’ Association of a state university in West Java participated in the study by taking the reading test. The data were analyzed under the Rasch model using the Quest program. The results showed that four items (36.8%) did not meet the ideal criteria of a valid test because they were either too easy or too difficult for the target test takers; thus, they needed to be discarded. Meanwhile, 16 items (63.2%) were of good quality and can be used immediately in the proficiency test, especially to measure reading comprehension skills, because they fulfilled the standard requirements for a valid test. The findings provide insight into the importance of item analysis in validating test instruments to improve test quality in future administrations.


reading comprehension; English proficiency test; item analysis; Rasch model








Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

REID (Research and Evaluation in Education)
ISSN 2460-6995 (Online)