Comparison of methods for detecting anomalous behavior on large-scale computer-based exams based on response time and responses

Deni Hadiana, Universitas Negeri Jakarta, Indonesia
Bahrul Hayat, Universitas Islam Negeri Syarif Hidayatullah Jakarta, Indonesia
Burhanuddin Tola, Universitas Negeri Jakarta, Indonesia


This study aims to determine an anomaly index (indeks anomali, IA) that considers both response time and responses, and to compare it with response time effort (RTE), or rapid guessing (tebakan cepat, TC), at various thresholds. Response times and responses were collected from 732 examinees on a natural science test consisting of 40 multiple-choice items with four answer choices. The data were analyzed to obtain descriptive statistics and to compute the TC and IA indices using two threshold-setting methods: the first (M1) identifies thresholds visually, and the second (M2) is based on the amount of time needed to respond to each item given its complexity, as proposed by Nitko. The performance of the IA and TC scores was then compared in terms of validity and reliability. The coefficient alpha of the IAM1 score is 0.84 and that of IAM2 is 0.82; both values fulfill the reliability requirements for index determination. The IA proposed in this study correlates highly with RTE, which is commonly used to determine the magnitude of solution behavior and rapid guessing. The correlation of IAM1 with TCM1 is 0.86 and of IAM2 with TCM2 is 0.89; these high correlations indicate a strong relationship between IA and TC. Threshold times were determined using three categories of multiple-choice items, which yields IA and TC distributions close to the normal distribution and thus reflects natural empirical conditions.
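The threshold logic behind RTE (Wise & Kong, 2005) that the study builds on can be sketched as follows: a response is classified as solution behavior if its response time meets or exceeds the item's threshold, and as rapid guessing otherwise; RTE is the proportion of items answered with solution behavior. This is a minimal illustration, not the study's implementation — the times and thresholds below are hypothetical, and in the study thresholds come from the M1 (visual) or M2 (item-complexity) method.

```python
def rte(response_times, thresholds):
    """Response time effort: proportion of items answered with
    solution behavior (response time >= item threshold)."""
    flags = [rt >= th for rt, th in zip(response_times, thresholds)]
    return sum(flags) / len(flags)

# Hypothetical data: per-item response times (seconds) and thresholds.
times = [12.0, 6.0, 45.0, 2.0]
ths = [5.0, 5.0, 10.0, 5.0]
print(rte(times, ths))  # 3 of 4 items meet their threshold -> 0.75
```

An examinee's RTE near 1 indicates consistent solution behavior, while a low RTE flags pervasive rapid guessing; the IA proposed in the study additionally folds the responses themselves into the index.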


anomalous index (IA); rapid guessing (TC); threshold; reliability; validity




Cizek, G. J., & Wollack, J. A. (2016). Handbook of quantitative methods for detecting cheating on tests (1st ed.). Routledge.

Fox, J.-P., Klein Entink, R., & van der Linden, W. (2007). Modeling of responses and response times with the package cirt. Journal of Statistical Software, 20(7).

Georgiadou, E., Triantafillou, E., & Economides, A. A. (2006). Evaluation parameters for computer-adaptive testing. British Journal of Educational Technology, 37(2), 261–278.

Hauser, C., & Kingsbury, G. G. (2009, November 4). Individual score validity in a modest-stakes adaptive educational testing setting [Paper presentation]. The Annual Meeting of the National Council on Measurement in Education, San Diego, CA.

Kong, X. J., Wise, S. L., & Bhola, D. S. (2007). Setting the response time threshold parameter to differentiate solution behavior from rapid-guessing behavior. Educational and Psychological Measurement, 67(4), 606–619.

Lee, Y.-H., & Chen, H. (2011). A review of recent response-time analyses in educational testing. Psychological Test and Assessment Modeling, 53(3).

Lewis, C., Lee, Y.-H., & von Davier, A. A. (2014). Test security for multistage tests: A quality control perspective. In N. Kingston & A. Clark (Eds.), Test fraud: Statistical detection and methodology (1st ed.). Routledge.

Van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.

Lindsey, J. K. (2004). Statistical analysis of stochastic processes in time. Cambridge University Press.

Marianti, S., Fox, J.-P., Avetisyan, M., Veldkamp, B. P., & Tijmstra, J. (2014). Testing for aberrant behavior in response time modeling. Journal of Educational and Behavioral Statistics, 39(6), 426–451.

Meijer, R. R., & Sotaridona, L. (2006). Detection of advance item knowledge using response times in computer adaptive testing (LSAC Research Report Series No. CT 03-03). Law School Admission Council.

Meijer, R. R. (1996). Person-fit research: An introduction. Applied Measurement in Education, 9(1), 3–8.

Meijer, R. R. (2003). Diagnosing item score patterns on a test using item response theory-based person-fit statistics. Psychological Methods, 8(1), 72–87.

Naga, D. S. (2013). Teori sekor pada pengukuran mental [Score theory in mental measurement] (2nd ed.). Nagarami Citrayasa.

Widiatmo, H., & Wright, D. B. (2015, April). Comparing two item response models that incorporate response times [Paper presentation]. National Council on Measurement in Education Annual Meeting, Illinois, USA.

Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95–114.

Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183.

Wulansari, A. D. (2019). Model logistik dalam IRT dengan variabel random waktu respon untuk tes terkomputerisasi [A logistic model in IRT with response time as a random variable for computerized tests] [Doctoral dissertation, Universitas Negeri Yogyakarta]. Eprints UNY.

Wulansari, A. D., Kumaidi, & Hadi, S. (2019). Two-parameter logistic model with lognormal response time for computer-based testing. International Journal of Emerging Technologies in Learning (IJET), 14(15), 138–158.




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

REID (Research and Evaluation in Education)


ISSN 2460-6995 (Online)
