Estimation of ability and item parameters in mathematics testing by using the combination of 3PLM/GRM and MCM/GPCM scoring model
Bastari Bastari, Ministry of Education and Culture, Indonesia
References
Bastari, B. (2000). Linking multiple-choice and constructed-response items to a common proficiency scale (Unpublished doctoral dissertation). University of Massachusetts Amherst, USA. UMI Microform 9960735.
Berger, M. P. (1998). Optimal design of tests with dichotomous and polytomous items. Applied Psychological Measurement, 22(3), pp. 248-258.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), pp. 29-51.
Cao, Y. (2008). Mixed-format test equating: Effects of test dimensionality and common item sets (Unpublished doctoral dissertation). University of Maryland, USA.
Chon, K. H., Lee, W. C., & Ansley, T. N. (2007). Assessing IRT model-data fit for mixed format tests. CASMA Research Report, Number 26.
De Ayala, R. J. (1989). A comparison of the nominal response model and the three parameter logistic model in computerized adaptive testing. Educational and Psychological Measurement, 49(3), pp. 789-805.
De Mars, C. E. (2008, March). Scoring multiple choice items: A comparison of IRT and classical polytomous and dichotomous methods. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.
Donoghue, J. R. (1994). An empirical examination of the IRT information of polytomously scored reading items under the generalized partial credit model. Journal of Educational Measurement, 31(4), pp. 295-311.
Ercikan, K. et al. (1998). Calibration and scoring of tests with multiple choice and constructed-response item types. Journal of Educational Measurement, 35(2), pp. 137-154.
Garner, M., & Engelhard, Jr., G. (1999). Gender differences in performance on multiple-choice and constructed-response Mathematics items. Applied Measurement in Education, 12, pp. 29-51.
Gierl, M. J., Wang, C., & Zhou, J. (2008). Using the attribute hierarchy method to make diagnostic inferences about examinees' cognitive skills in algebra on the SAT. Journal of Technology, Learning, and Assessment, 6(6).
Glasersfeld, E. von. (1982). An interpretation of Piaget's constructivism. Revue Internationale de Philosophie, 36, pp. 612-635.
Hagge, S. L. (2010). The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups (Unpublished doctoral dissertation). University of Iowa, USA.
He, Y. (2011). Evaluating equating properties for mixed-format tests (Unpublished doctoral dissertation). University of Iowa, USA.
Hoskens, M., & De Boeck, P. (2001). Multidimensional componential item response theory models for polytomous items. Applied Psychological Measurement, 25, pp. 19-37.
Jurich, D., & Goodman, J. (2009, October). A comparison of IRT parameter recovery in mixed format examinations using PARSCALE and ICL. Poster session presented at the annual meeting of the Northeastern Educational Research Association, James Madison University.
Kennedy, P., & Walstad, W. B. (1997). Combining multiple-choice and constructed response test scores: An economist's view. Applied Measurement in Education, 10, pp. 359-375.
Kentucky Department of Education. (2008). Educational Planning and Assessment System (EPAS) College Readiness Standards and Program of Studies Standards Alignment Introduction [Digital edition version]. Retrieved from http://www.education.ky.gov/
Kinsey, T. L. (2003). A comparison of IRT and Rasch procedures in a mixed-item format test (Unpublished doctoral dissertation). University of North Texas, USA. UMI Microform 3215773.
Lau, C. A., & Wang, T. (1998, April). Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Levine, M. V., & Drasgow, F. (1983). The relationship between incorrect option choice and estimated ability. Educational and Psychological Measurement, 43, pp. 675-685.
Li, Y. H., Lissitz, R. W., & Yang, Y. N. (1999). Estimating IRT equating coefficients for tests with polytomously and dichotomously scored items. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.
Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed-response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31, pp. 234-250.
Meng, H. (2007). A comparison study of IRT calibration methods for mixed-format tests in vertical scaling (Unpublished doctoral dissertation). University of Iowa, USA.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and assessment in education (2nd ed.). New York: Pearson Education, Inc.
Sadler, P. M. (1998). Psychometric models of examinee conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3), pp. 265-296.
Si, C. B. (2002). Ability estimation under different item parameterization and scoring models (Unpublished doctoral dissertation). University of North Texas, USA.
Van Someren, M. W., Barnard, Y. F., & Sandberg, J. A. C. (1994). The think aloud method: A practical guide to modelling cognitive processes. London: Academic Press.
Susongko, P. (2009). Perbandingan keefektifan bentuk tes uraian dan testlet dengan penerapan 'graded response model' (GRM) [The comparison of the effectiveness of essay tests and testlets with the application of the graded response model (GRM)] (Unpublished doctoral dissertation). Yogyakarta State University, Yogyakarta.
Sykes, R. C., & Yen, W. M. (2000). The scaling of mixed-item-format tests with the one-parameter and two-parameter partial credit models. Journal of Educational Measurement, 37, pp. 221-244.
Tall, D. O. et al. (2012). Cognitive development of proof. In ICMI 19: Proof and Proving in Mathematics Education. Springer. [Digital edition version]. Retrieved from http://homepages.warwick.ac.uk/staff/David.Tall/pdfs
Tang, K. L., & Eignor, D. R. (1997). Concurrent calibration of dichotomously and polytomously scored TOEFL items using IRT models. TOEFL Technical Report 13. Princeton, NJ: Educational Testing Service.
Thissen, D. M. (1976). Information in wrong responses to the Raven Progressive Matrices. Journal of Educational Measurement, 13(3), pp. 201-214.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49, 501-519.
Thissen, D. M., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple-choice models: The distractors are also part of the item. Journal of Educational Measurement, 26(2), pp. 161-176.
Traub, R. E. (1993). On the equivalence of the traits assessed by multiple-choice and constructed-response tests. In R. E. Bennett, & W. C. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 29-44). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wainer, H., & Thissen, D. M. (1993). Combining multiple-choice and constructed response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), pp. 103-118.
Wainer, H. (1989). The future of item analysis. Journal of Educational Measurement, 26(2), pp. 191-208.
Wasis. (2009). Penskoran model partial credit pada item multiple true-false bidang fisika [Partial credit scoring model on multiple true-false items in physics] (Unpublished doctoral dissertation). Yogyakarta State University, Yogyakarta.
DOI: https://doi.org/10.21831/reid.v1i1.4898