A comparison of the stability of ability parameter estimation based on the maximum likelihood and Bayesian estimation: A case study of dichotomous scoring test results
DOI:
https://doi.org/10.21831/reid.v11i1.89463
Keywords:
ability estimation, Bayes method, maximum likelihood method, item response theory, dichotomous scoring test
Abstract
This research applies Item Response Theory (IRT) to determine the best method for estimating the ability of participants on a test measuring English listening ability. The study aims to (1) determine the characteristics of the test instrument measuring English listening ability, (2) determine the effect of test length on the stability of ability estimation using the maximum likelihood (ML) method, (3) determine the effect of test length on the stability of ability estimation using the Bayes method, and (4) compare the stability of the ability estimates between the ML and Bayes methods. This research is an exploratory descriptive study using a simulation approach. The best-fitting model is selected to generate data. The generated data consist of the actual ability (θ) and the participants' responses. Ability is then estimated from the responses with the ML and Bayes methods over 10 replications, and the two methods are compared by calculating the mean square error (MSE). The method with the smaller MSE is considered more stable and the better estimation method. The results show that (1) the 2PL model is the best, (2) test length affects the stability of ability estimation in the ML method, which is most stable when the test contains 46 items, (3) test length affects the stability of ability estimation in the Bayes method, which is also most stable when the test contains 46 items, and (4) the Bayes method is better and more accurate for estimating ability.
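The simulation design summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the item parameter ranges, the number of simulated participants, the grid-search ML estimator, and the use of expected a posteriori (EAP) estimation with a standard normal prior as the "Bayes method" are all assumptions made for illustration, and the names `p_correct`, `estimate_ml`, `estimate_eap`, and `simulate_mse` are hypothetical. Only the 2PL model, the 10 replications, and the MSE criterion come from the abstract.

```python
import math
import random

# Ability grid from -4 to 4 in steps of 0.1 (an assumed range, not from the study).
GRID = [k / 10.0 for k in range(-40, 41)]


def p_correct(theta, a, b):
    """2PL probability of a correct response for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a dichotomous response pattern at ability theta."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ml(responses, items):
    """Grid-search ML estimate; the grid bounds keep all-0 or all-1 patterns finite."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))


def estimate_eap(responses, items):
    """EAP estimate under an assumed standard normal prior on ability."""
    log_post = [log_likelihood(t, responses, items) - 0.5 * t * t for t in GRID]
    peak = max(log_post)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)


def simulate_mse(n_items, n_persons=100, n_reps=10, seed=0):
    """Return (MSE of ML, MSE of EAP) against the generating abilities."""
    rng = random.Random(seed)
    # Hypothetical item parameters: a in [0.8, 2.0], b ~ N(0, 1).
    items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_items)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    se_ml = se_eap = 0.0
    for _ in range(n_reps):
        for theta in thetas:
            responses = [1 if rng.random() < p_correct(theta, a, b) else 0
                         for a, b in items]
            se_ml += (estimate_ml(responses, items) - theta) ** 2
            se_eap += (estimate_eap(responses, items) - theta) ** 2
    n = n_reps * n_persons
    return se_ml / n, se_eap / n


if __name__ == "__main__":
    for length in (10, 20, 46):
        mse_ml, mse_eap = simulate_mse(length, n_persons=50, n_reps=10)
        print(f"items={length:2d}  MSE(ML)={mse_ml:.3f}  MSE(EAP)={mse_eap:.3f}")
```

Running the script prints one MSE pair per test length, which is the kind of comparison the study reports. The values depend on the assumed parameters and seed and should not be read as a reproduction of the published results.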
License
Copyright (c) 2025 REID (Research and Evaluation in Education)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The authors submitting a manuscript to this journal agree that, if it is accepted for publication, the copyright of the submission shall be assigned to REID (Research and Evaluation in Education). However, even though the journal asks for a copyright transfer, the authors retain (or are granted back) significant scholarly rights.
The copyright transfer agreement form can be downloaded here: [REID Copyright Transfer Agreement Form]
The copyright form should be signed in original and sent to the Editorial Office by email at reid.ppsuny@uny.ac.id.
REID (Research and Evaluation in Education) by http://journal.uny.ac.id/index.php/reid is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.