Siti Eshah Mokshein, Universiti Pendidikan Sultan Idris, Malaysia
Haliza Ishak, Universiti Pendidikan Sultan Idris, Malaysia
Hishamuddin Ahmad, Universiti Pendidikan Sultan Idris, Malaysia


This study aimed to determine the quality of the English Paper 1 (EP1) items of the UPSR trial examination for sixth graders in terms of reliability, validity, and item characteristics. It also sought to determine the difficulty levels of 40 multiple-choice items covering five constructs: vocabulary, language and social expression, grammar, cloze comprehension, and reading comprehension. A total of 525 primary school students were randomly selected from 3,876 students in Kuala Selangor, Malaysia. Using the Rasch measurement model, evidence of validity was drawn from the results of Principal Component Analysis (PCA), fit statistics, and item distractor analysis. The PCA results showed the absence of a second dimension in the test, which met the unidimensionality assumption of modern testing theory. Fit statistics identified seven misfitting items beyond the acceptable range (0.7–1.3). Item distractor analysis identified five problematic items, three of which were also misfitting items. Summary statistics showed that the Cronbach's alpha reliability indices were greater than 0.80 and the separation indices were greater than 2. This study would benefit teachers in improving existing assessment practice by highlighting the importance of item analysis in schools, particularly in language testing.
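The kind of item screening the abstract describes can be illustrated with a minimal classical item-analysis sketch. This is not the authors' Rasch/Winsteps procedure and uses hypothetical simulated responses, not the study's data; it only shows, under those assumptions, how per-item difficulty and Cronbach's alpha are typically computed from a 0/1 response matrix before a fuller Rasch calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 examinees x 10 dichotomous items, generated
# from a simple one-parameter logistic model (ability minus difficulty).
ability = rng.normal(0.0, 1.0, 200)
difficulty = np.linspace(-2.0, 2.0, 10)
prob = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
X = (rng.random((200, 10)) < prob).astype(int)

def item_difficulty(X):
    """Classical difficulty index: proportion of correct responses per item."""
    return X.mean(axis=0)

def cronbach_alpha(X):
    """Internal-consistency reliability of the total score."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

p = item_difficulty(X)
alpha = cronbach_alpha(X)

# Flag items that are very easy or very hard (illustrative cutoffs only;
# the study itself judged fit with Rasch mean-square statistics, 0.7-1.3).
flagged = np.where((p < 0.2) | (p > 0.8))[0]
print("difficulty:", np.round(p, 2))
print("alpha:", round(float(alpha), 2))
print("flagged items:", flagged)
```

In a full Rasch analysis these classical indices would be replaced by item measures in logits, infit/outfit statistics, and distractor tables, but the same response matrix is the starting point for both.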


item analysis; validity and reliability; Rasch measurement model







Jurnal Cakrawala Pendidikan by Lembaga Pengembangan dan Penjaminan Mutu Pendidikan UNY is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.