Stability of item parameter estimation in dichotomous IRT considering the number of test takers

Zulfa Safina Ibrahim, Universitas Negeri Yogyakarta, Indonesia
Alfred Irambona, Burundi University, Burundi
Beatriz Eugenia Orantes Pérez, El Colegio de la frontera sur (ECOSUR), Mexico

Abstract


This research relates to item response theory (IRT), which is used to assess the quality of a test as a whole, while item parameter estimation determines the technical properties of individual test items. A stability analysis of item parameter estimation is conducted to determine the minimum sample size needed to obtain good estimation results. The purpose of this study is to describe the effect of the number of test takers on the stability of item parameter estimation with the Bayes method (expected a posteriori, EAP) on dichotomous data. This is exploratory descriptive research using a bootstrap approach with the EAP method. The EAP method modifies the likelihood function to include prior information about the participant's score. Bootstrap samples were drawn from the original data at ten sample sizes (100, 150, 250, 300, 500, 700, 1,000, 1,500, 2,000, and 2,500), each replicated ten times, and item parameter estimation was performed on each sample. For each sample size, the Root Mean Squared Difference (RMSD) was computed across the ten replications. The results showed that the 2PL model was the best-fitting model. The RMSD values obtained show that the number of test takers affects the stability of item parameter estimation on dichotomous data under the 2PL model. The minimum sample size that ensures stable item parameter estimates with the 2PL model is 1,000 test takers.
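The bootstrap-and-RMSD design described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: it uses synthetic dichotomous data with assumed dimensions (2,500 examinees, 40 items), and a crude logit-of-proportion-correct stand-in for item difficulty in place of the 2PL EAP estimation the study performed with IRT software. Only the resampling loop and the RMSD computation mirror the procedure in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_difficulty(responses):
    # Stand-in estimator: logit of item proportion-correct values
    # (a rough proxy for difficulty; the study itself used EAP
    # estimation under the 2PL model, not this shortcut).
    p = responses.mean(axis=0).clip(0.01, 0.99)
    return -np.log(p / (1 - p))

# Synthetic dichotomous data under a 1PL-like model (assumed sizes).
n_persons, n_items = 2500, 40
theta = rng.normal(size=(n_persons, 1))          # examinee abilities
b = rng.normal(size=n_items)                     # true difficulties
prob = 1 / (1 + np.exp(-(theta - b)))
data = (rng.random((n_persons, n_items)) < prob).astype(int)

# Reference estimates from the full original sample.
full_est = estimate_difficulty(data)

def rmsd_for_sample_size(n, n_reps=10):
    # Draw n_reps bootstrap samples of size n (with replacement),
    # re-estimate item parameters on each, and average the RMSD
    # against the full-sample estimates.
    vals = []
    for _ in range(n_reps):
        idx = rng.integers(0, n_persons, size=n)
        est = estimate_difficulty(data[idx])
        vals.append(np.sqrt(np.mean((est - full_est) ** 2)))
    return float(np.mean(vals))

for n in (100, 500, 1000, 2500):
    print(n, round(rmsd_for_sample_size(n), 3))
```

Because bootstrap estimates concentrate around the full-sample values as the sample size grows, the printed RMSD shrinks as n increases, which is the pattern the study uses to locate the minimum stable sample size.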


Keywords


stability; item parameter estimation; item response theory; EAP; bootstrapping





DOI: https://doi.org/10.21831/reid.v10i1.73055





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





ISSN 2460-6995 (Online)
