Gaining a deeper understanding of the meaning of the carelessness parameter in the 4PL IRT model and strategies for estimating it

Timbul Pardede, Universitas Terbuka, Indonesia
Agus Santoso, Universitas Terbuka, Indonesia
Diki Diki, Universitas Terbuka, Indonesia
Heri Retnawati, Universitas Negeri Yogyakarta, Indonesia
Ibnu Rafi, Universitas Negeri Yogyakarta, Indonesia
Ezi Apino, Universitas Negeri Yogyakarta, Indonesia
Munaya Nikma Rosyada, Universitas Negeri Yogyakarta, Indonesia


Three models are commonly used to describe the characteristics of test items and estimate examinee ability under the dichotomous IRT framework, namely the one-, two-, and three-parameter logistic models. The three item parameters are discrimination, difficulty, and pseudo-guessing. As the dichotomous IRT model developed further, a carelessness (upper-asymptote) parameter was proposed, yielding the four-parameter logistic (4PL) model, which accommodates the case in which a high-ability examinee responds incorrectly to a test item that he or she should be able to answer correctly. However, the carelessness parameter and the 4PL model have not been widely accepted and used, and understanding of this parameter and of strategies for estimating it remains limited. This study therefore aims to shed light on the ideas underlying the 4PL model, the meaning of the carelessness parameter, and the strategies used to estimate it, drawing on the extant literature. We then extend the focus to practical examples of estimating item and person parameters under the 4PL model, using empirical data on the responses of 1,000 students of the Indonesia Open University (Universitas Terbuka) to 21 of 30 multiple-choice items on a paper-and-pencil Business English test. We analyzed the data mainly with the ‘mirt’ package in RStudio. We present the analysis results coherently so that IRT users can gain a sufficient understanding of the 4PL model and the carelessness parameter and can estimate item and person parameters under the 4PL model.
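As a minimal illustration of the model discussed above (not drawn from the paper's own analysis, and using arbitrary example parameter values), the 4PL item characteristic curve adds an upper asymptote d to the 3PL form, so that even very able examinees answer correctly with probability at most d, capturing occasional carelessness:

```python
import math

def p_4pl(theta, a, b, c, d):
    """Probability of a correct response under the 4PL model:
    a = discrimination, b = difficulty, c = lower asymptote
    (pseudo-guessing), d = upper asymptote (carelessness)."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values only (not estimates from the paper's data):
# with d = 0.95, a high-ability examinee (theta = 4) still tops
# out below 1.0, modeling an occasional careless error.
print(round(p_4pl(theta=4.0, a=1.5, b=0.0, c=0.2, d=0.95), 3))
```

Setting d = 1 recovers the 3PL model, and additionally setting c = 0 recovers the 2PL model, which is why the 4PL model is often described as a direct extension of the earlier logistic models.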


Keywords: carelessness parameter; dichotomous IRT; four-parameter logistic model; item response theory



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 2460-6995 (Online)