Content validity analysis of literacy assessment instruments

This research is part of the development of a teacher assessment literacy instrument. The objective at this stage was to determine the content validity of the instrument. The instrument consisted of 25 multiple-choice items constructed from four dimensions: Assessment Objectives, Measurement Theory, Assessment Process, and Assessment Fairness. The instrument was first assessed qualitatively through a Focus Group Discussion to collect suggestions and input from various points of view. It was then assessed quantitatively by 3 experts on 6 aspects, namely the relevance of the item to the assessment literacy dimension, the relevance of the item to the indicators, the clarity of the main items, the logic of the answer options, the standardization of language, and the functioning of descriptions/cases/pictures/tables in the items. The collected data were analyzed according to their type: qualitative data were analyzed using a qualitative approach, while quantitative data were analyzed using the Content Validity Index (CVI). The results showed that each of the 25 items had a CVI value of more than 0.60, with an average of 0.96. It can therefore be concluded that the items and the assessment literacy instrument are valid in terms of content.


INTRODUCTION
Assessment is the process of collecting and processing information to measure the achievement of student learning outcomes. The ability to use knowledge when carrying out an assessment is called assessment literacy. Assessment literacy therefore involves both cognitive ability and skill. Cognitive ability plays a role in absorbing knowledge about the concept of assessment, which is obtained through reading. The knowledge obtained is then internalized into skilled action when carrying out the assessment. This is in accordance with Brookhart (2011) and Popham (2013), who state that assessment literacy is the skills and knowledge teachers need to measure and support student learning through the implementation of assessments.
Assessment literacy involves the ability to construct reliable assessments and then administer and score these assessments to facilitate valid instructional decisions (Popham, 2013; Stiggins, 2004). Assessment literacy describes the teacher's ability to plan for, administer, understand, and apply the outcomes of assessments accurately and efficiently (Boyle, 2005; Stiggins, 1999; Stoynoff and Chapelle, 2005). In addition, assessment literacy can empower teachers to use data collected from various assessment methods, interpret them properly, and improve their instruction (Gotch, 2012).
A teacher must have good assessment literacy, because good assessment helps teachers overcome learning problems, measure the effectiveness of the learning process, and monitor the achievement of learning objectives. This is in accordance with Brookhart (2011) and Popham (2013), who state that assessment literacy can measure and support student learning. Furthermore, Shams & Iqbal (2019) state that assessment literacy develops teachers' professional competence, improves the welfare of teachers, students, and institutions, helps teachers overcome learning problems in class, and can improve the quality of learning. In addition, assessment literacy aims to develop teacher professional competence, determine the success of student achievement, develop teachers' ability to apply assessment methods suited to student character, and develop students' academic abilities and potential (Johnston & Costello, 2005; Zolfaghari & Ahmadi, 2016). All these benefits greatly affect the quality of education.
Despite the importance of assessment literacy described above, teachers face various problems. DeLuca et al. (2016) stated that teachers' low understanding and application of assessment was caused by a lack of competency in interpreting, integrating, and communicating assessment results. The same is true of teachers in Indonesia. Many teachers do not understand the concept of assessment and its function, so its application is not optimal. When teachers conduct an assessment, they limit it to making measurements, without a comprehensive interpretation and without communicating the results to related parties.
Based on interviews and observations in several schools, many teachers are not familiar with the various methods of learning assessment. They only know that assessing means giving test questions to students and then analyzing the responses based on the correct answers. In addition, some teachers do not understand the steps in developing learning assessments, such as designing the assessment, developing items, and preparing scoring and assessment guidelines. The majority of them took questions directly from other sources without analyzing them against the assessment stages. Teachers are also still unsure how to analyze assessment results in accordance with the applicable assessment norms. When there are essay-type (description) questions, they give scores based on their own assumptions without a standard rubric. These problems contribute to the low quality of education. Van de Grift (2007) and Levi & Inbar-Lourie (2020) stated that low teacher assessment literacy reduces the quality of teaching, student activity, and learning achievement.
The problems described above must be corrected immediately so that the quality of education is maintained and can even increase. One solution is an assessment literacy diagnostic test, whose results can be used as a basis for policies that address weaknesses in assessment literacy. For the test results to describe the real situation, a valid and reliable instrument is needed to measure assessment literacy. Such an instrument can be produced through a development process based on the standards and dimensions of teacher assessment literacy.
Developing an assessment instrument must be based on the dimensions of assessment literacy (O'Loughlin, 2013; Taylor, 2009; Koh, 2011). Assessment literacy standards include choosing an assessment method, developing assessment methods, administering, assigning, and interpreting learning outcomes, using assessment outcomes in decision making, using assessment to determine levels of learning outcomes, communicating assessment outcomes, and knowing unethical practices (Yamtim & Wongwanich, 2014). In addition, DeLuca et al. (2015) explained that the assessment literacy indicators are: 1) assessment objectives, which is the process of selecting the appropriate form of assessment based on the instructional objectives of the assessment, for example diagnostic, formative, summative, and selection tests; 2) measurement theory, which is an understanding of the psychometric properties of assessment, for example validity and reliability; 3) the assessment process, which is the ability to compile, manage, evaluate, and interpret the results of the assessment, and which requires the ability to communicate the objectives, processes, and results of the assessment to stakeholders such as students, parents, and other parties; and 4) assessment fairness, which is the ability to assess students fairly based on the ethics of assessment. The ethics of assessment in question means disclosing accurate information on assessment results while protecting students' rights and privacy.
Research on assessment literacy has been carried out by DeLuca et al. (2015), who analyzed assessment literacy standards for schoolteachers and university lecturers. This was continued by DeLuca et al. (2016), who produced an instrument to support teacher assessment literacy with a classroom assessment inventory approach. Furthermore, Zolfaghari and Ahmadi (2017) analyzed teachers' assessment literacy skills through interviews, and Ashraf and Zolfaghari (2018) examined the relationship between EFL teachers' assessment literacy and their reflective teaching. Based on these previous studies, no research has yet produced an assessment literacy test instrument. The majority of related studies analyze assessment literacy using only a questionnaire instrument, which produces less than ideal data. Therefore, the purpose of this study is to produce an assessment literacy test instrument that is valid in terms of content.

METHOD
The purpose of this study was to determine the content validity of the teacher assessment literacy instrument obtained through the product development process. Therefore, the research methodology used is the Research and Development methodology, limited to certain steps. Borg and Gall (1984) state that the Research and Development (R and D) methodology is a process for developing and validating educational products. In addition, Richey and Klein (2007) state that R and D is a systematic study of developing and evaluating the products created. Thiagarajan (1974) states that R and D consists of four stages, namely define, design, develop, and disseminate. Cohen and Swerdlik (2010) state that the test development process consists of 5 steps, namely conceptualizing the test, constructing the test, trying out the test, conducting item analysis, and revising the test. Chadha (2009) states that there are five steps in constructing a test, namely planning the test, preparing the initial draft of the test, trying out the initial draft of the test, evaluating the test, and constructing the final draft of the test. In addition, Irwing and Hughes (2018) state that there are 10 steps in developing a test, namely (1) construct definition, test specification, and test structure; (2) overall planning; (3) item development, including construct definition, item creation, item review, and item testing; (4) scale construction; (5) reliability; (6) validation; (7) test scoring; (8) test specification; (9) implementation and testing; and (10) the technical manual.

Figure 1. Research Procedure
Based on the description above, this research procedure was designed based on R and D theory and instrument development theory. However, in this study, the R and D steps were limited to the develop stage, namely the analysis of the content validity of the assessment literacy instrument. The procedure steps are presented in Figure 1.
Based on Figure 1, the first step in this research is analyzing the problem. Based on interviews and observations of several teachers, it can be concluded that teacher assessment literacy is low, as shown by the evidence described in the introduction.
The second is determining the research variable. After the problem is known, the research variable is determined, namely assessment literacy. This variable is determined based on the problems found in the field.
The third is defining the concept of assessment literacy. After the variable is determined, its concept is defined. Conceptually, assessment literacy is the skills and knowledge of teachers to measure and support student learning through the implementation of assessments.
The fourth is defining assessment literacy operationally. The operational definition of assessment literacy is the teacher's skills and knowledge to measure and support student learning through the implementation of assessments, measured through a test whose dimensions are assessment objectives, measurement theory, assessment process, and assessment fairness.
The fifth is determining the purpose of the test. The purpose of the test is to diagnose teachers' assessment literacy skills, and the results are used as a basis for policies to address problems in teacher assessment literacy.
The sixth is designing the blueprint. Mardapi (2008) stated that the steps for making a blueprint include writing general objectives, making a list of topics, determining indicators, and determining the number of questions.
The seventh is determining the type and length of the test. The type of test used in this study is a multiple-choice test, because it can cover a wide range of material in a short processing time. The length of the test is obtained by analyzing the estimated time for each question. Nitko (1996) estimates that analysis-type questions take 2-5 minutes each.
The eighth is writing items. Mardapi (2008) states that the main guidelines for making multiple-choice tests include: the subject matter must be clear, the sentences used are appropriate to the developmental level of the test takers, the language used is standard, the location of the correct answer choice is determined randomly, all answer choices are logical, the answer choice sentences are of relatively the same length, and there are no clues to the correct answer.
The ninth is the focus group discussion. The Focus Group Discussion is a qualitative assessment of each item of the assessment literacy instrument. The qualitative assessment takes the form of revision notes on item construction, in accordance with the 6 aspects of assessment. Referring to these notes, the instrument was revised and given to experts to be assessed quantitatively.
The tenth is analyzing content validity. Content validity was determined through an expert assessment of the instrument. Goodwin and Leech (2003) stated that content-based validity is based on logical analysis and expert evaluation of the content of the measure, such as subject matter, item format, and constituent sentences. This assessment is quantitative: experts give a score ranging from 1 to 4 for each aspect of the assessment.
The resulting data were then analyzed according to data type. Qualitative data were analyzed using a qualitative approach, while quantitative data were analyzed using the Content Validity Index (CVI). The description of the data analysis is presented in Table 1.
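As an illustration of the quantitative analysis, the following minimal sketch computes an item-level CVI as the proportion of expert ratings equal to 3 or 4 on the 4-point scale, which is the usual item-level definition of the index. The ratings, the function name `item_cvi`, and the choice to pool the 3 experts and 6 aspects into one proportion are assumptions made for illustration; the exact computation used in this study is the one described in Table 1.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[Sequence[int]]) -> float:
    """Return the proportion of ratings that are 3 or 4.

    ratings[e][a] is the score (1-4) given by expert e on aspect a.
    Pooling experts and aspects is an assumption of this sketch.
    """
    flat = [score for expert in ratings for score in expert]
    if not flat:
        raise ValueError("no ratings supplied")
    if any(score not in (1, 2, 3, 4) for score in flat):
        raise ValueError("scores must be integers from 1 to 4")
    relevant = sum(1 for score in flat if score >= 3)
    return relevant / len(flat)


# Hypothetical ratings for one item: 3 experts x 6 aspects (not study data).
example_item = [
    [4, 4, 3, 4, 4, 3],
    [4, 3, 4, 4, 2, 4],
    [3, 4, 4, 4, 4, 4],
]
example_cvi = item_cvi(example_item)
print(round(example_cvi, 2))  # 17 of 18 ratings are 3 or 4 -> 0.94
```

Under this pooled definition a single low rating lowers the item's index only slightly, so a stricter analysis might instead compute the index per aspect.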

FINDINGS
The assessment literacy instrument developed in this study is used to diagnose teachers' initial assessment literacy in learning. The instrument was developed with a total of 25 items. These items were developed with reference to the dimensions of assessment literacy, namely assessment purpose, assessment process, assessment fairness, and measurement theory. The summary is presented in Table 2. Referring to Table 2, the type and length of the test were then determined. The type of test used in this study is a multiple-choice test, because it can measure a wide range of material in a short processing time. The length of the test was determined through an analysis of the estimated processing time for each question, presented in Table 3. Based on Table 3, 25 multiple-choice items can be completed within 50-125 minutes. On that basis, the length of the assessment literacy test was set at 60 minutes.
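The time range above follows from multiplying the number of items by the per-item estimate of 2 to 5 minutes. A minimal sketch of that arithmetic is given here; the function name `completion_time_range` is a hypothetical helper and is not part of the study.

```python
from typing import Tuple


def completion_time_range(
    n_items: int, min_per_item: float, max_per_item: float
) -> Tuple[float, float]:
    """Return the (shortest, longest) estimated completion time in minutes."""
    if n_items < 0 or min_per_item > max_per_item:
        raise ValueError("invalid item count or time estimates")
    return n_items * min_per_item, n_items * max_per_item


# Figures taken from the text: 25 items, 2-5 minutes per item, 60 minutes allotted.
shortest, longest = completion_time_range(25, 2, 5)
allotted = 60
within_range = shortest <= allotted <= longest
print(shortest, longest, within_range)  # 50 125 True
```

The allotted 60 minutes sits near the lower end of the estimated range, which corresponds to about 2.4 minutes per item.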
After the blueprint, the type of test, and the length of the test were determined, the next step was to write the instrument items. Writing the assessment literacy items refers to the following aspects: 1) the relevance of the item to the assessment literacy dimension, 2) the relevance of the item to the question indicators, 3) the clarity of the subject matter, 4) the logic of all answer options, 5) the standardization of the language used, and 6) the functioning of case descriptions/discourses/pictures/tables/graphics in the item. Writing these items produces a draft of the assessment literacy instrument.
The draft instrument was then assessed qualitatively through a Focus Group Discussion (FGD) involving teachers and experts. The purpose of the qualitative assessment is to collect suggestions and input from various points of view. The following is a description of the results of the assessment:

Suggestions and Improvements to Item Number 1
The results of the FGD concluded that item number 1 did not measure its indicator. The resulting recommendation was to replace the item with another item. Evidence of this assessment is presented in Figure 2.

Figure 2. Suggestions for Improving Item Number 1
Based on Figure 2, the item was then revised, with the result presented in Figure 3.

Suggestions and Improvements to Item Number 2
The results of the FGD concluded that there was no relationship between the question and the answer options. The question concerns types of assessment, while the answer options contain various forms of response. This situation will cause item bias. The resulting recommendation was to revise the question and adapt it to the answer options. Evidence of this recommendation is presented in Figure 4.
Cakrawala Pendidikan: Jurnal Ilmiah Pendidikan, Vol. 42 No. 2, June 2023, pp. 447-459

Suggestions and Improvements to Item Number 3
The results of the FGD concluded that the main item information was incomplete, causing item bias. The resulting recommendation was to complete the main item information so that it conforms with the answer options. Evidence of this recommendation is presented in Figure 6.

Figure 6. Suggestions for Improving Item Number 3
Based on Figure 6, the item was then revised, with the result presented in Figure 7.

Suggestions and Improvements to Item Number 5
The results of the FGD concluded that the symbol for inequality in answer options B and C needed to be replaced. Evidence of this recommendation is presented in Figure 8.

Suggestions and Improvements to Item Number 9
The results of the FGD concluded that answer options D and E in item number 9 were both correct. The resulting recommendation was to change one of these answer options so that it is incorrect, as shown in Figure 10.

Suggestions and Improvements to Item Number 10
The results of the FGD concluded that the information in the main item was incomplete, so that none of the answer options was correct. The resulting recommendation was to add the missing information to the subject matter, as shown in Figure 12. Based on Figure 12, the item was then revised, with the result presented in Figure 13.

Figure 13. Revision Result of Item Number 10
After the instrument had been assessed qualitatively and revised, the next step was to give the draft instrument to 3 experts to be assessed quantitatively. The purpose of this quantitative assessment is to determine the content validity of the teacher assessment literacy instrument. The quantitative assessment is carried out by giving a score from 1 to 4 on each aspect of the assessment, based on the question card. These aspects are the relevance of the item to the assessment literacy dimension, the relevance of the item to the indicators, the clarity of the main items, the logic of the answer options, the standardization of language, and the functioning of descriptions/cases/pictures/tables in the items. An example of the question card is presented in Figure 14.

Figure 14. Example of Question Cards
The results of the quantitative assessment are presented in Table 4. Based on Table 4, all items have a CVI value of more than 0.60, so it can be concluded that all items of the assessment literacy instrument are valid in terms of content. The average CVI value is 0.96, which also exceeds 0.60, so it can be concluded that the assessment literacy instrument as a whole is content valid. This means that the items developed in this study have a construct that is in accordance with the dimensions and indicators of assessment literacy.
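The decision rule applied to Table 4 can be sketched as follows. The CVI values in the example are hypothetical placeholders, not the study's data, and the function name `content_validity_decision` is an assumption; only the 0.60 threshold and the "more than" comparison come from the text.

```python
from typing import Any, Dict, Sequence


def content_validity_decision(
    item_cvis: Sequence[float], threshold: float = 0.60
) -> Dict[str, Any]:
    """Flag items at or below the threshold and judge the instrument by its mean CVI."""
    if not item_cvis:
        raise ValueError("at least one item CVI is required")
    # Item numbers start at 1, matching the numbering used for the items.
    failing = [
        number
        for number, value in enumerate(item_cvis, start=1)
        if value <= threshold
    ]
    average = sum(item_cvis) / len(item_cvis)
    return {
        "failing_items": failing,
        "mean_cvi": average,
        "instrument_valid": (not failing) and average > threshold,
    }


# Hypothetical item-level CVI values for a four-item excerpt.
decision = content_validity_decision([1.0, 0.94, 0.89, 1.0])
print(decision["failing_items"], round(decision["mean_cvi"], 2))
```

In this sketch the instrument is judged valid only when no item is flagged and the mean exceeds the threshold, mirroring the two conclusions drawn from Table 4.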

DISCUSSION
Based on the research findings, the assessment literacy instrument developed was of the multiple-choice type, with a total of 25 items and a test length of 60 minutes. These items were developed based on the construct of assessment literacy, which consists of 4 dimensions, namely the purpose of the assessment, the theory of measurement, the process of assessment, and the fairness of the assessment. In addition, the items were developed based on the measurement indicators for each dimension, the clarity of the main items, the logic of the answers, the standardization of language, and the functioning of item descriptions such as cases/discourses/images/tables/graphs. DeLuca et al. (2015) state that the dimensions of assessment literacy are assessment purpose, assessment process, assessment fairness, and measurement theory.
The multiple-choice test was selected because of the breadth of the assessment literacy variable, which consists of 4 dimensions and 8 item indicators, and because such a test can be completed in a relatively short time. This is in accordance with Cohen and Swerdlik (2010), who state that a multiple-choice test can measure a broad range of material with a relatively short processing time. The test length of 60 minutes is based on the time estimates by Nitko (1996), presented in Table 3.
The final instrument must have at least 12 items, or 3 items for each dimension, that meet valid and reliable criteria. This follows Neill (2011), who states that one domain should have at least 3 items. Therefore, about twice the targeted number of items was developed, namely 25 items. Sumintono and Widhiarso (2015) stated that the number of items developed by researchers should be two or three times the target number of items, so that reserve items remain if some items do not pass selection.
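The item counts in this paragraph can be checked with a short calculation. The sketch below is illustrative only; the variable names are assumptions, and the figures (4 dimensions, at least 3 items per dimension, a draft of two to three times the target, 25 items developed) are taken from the text.

```python
# Figures taken from the text; variable names are illustrative.
dimensions = 4
min_items_per_dimension = 3
developed_items = 25

# Minimum number of items the final instrument should retain.
target_items = dimensions * min_items_per_dimension

# Recommended draft size: two to three times the target.
draft_low, draft_high = 2 * target_items, 3 * target_items

within_recommendation = draft_low <= developed_items <= draft_high
print(target_items, draft_low, draft_high, within_recommendation)  # 12 24 36 True
```

With 25 draft items, up to 13 items could fail later selection before the instrument falls below the 12-item target, provided each dimension still keeps at least 3 items.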
Items were developed according to the assessment literacy dimensions and item structure, such as item clarity, logical answers, language standardization, and the functioning of question descriptions such as cases/discourses/pictures/tables/graphs. DeLuca et al. (2015) state that the indicators of assessment literacy are assessment purpose, assessment process, assessment fairness, and measurement theory. Mardapi (2008) states that the main guidelines for making multiple-choice tests include: the subject matter must be clear, the sentences used are appropriate to the developmental level of the test takers, the language used is standard, the location of the correct answer choice is determined randomly, all answer choices are logical, the answer choice sentences are of relatively the same length, and there are no clues to the correct answer.
The draft instrument was then analyzed for content validity by 3 experts. Goodwin and Leech (2003) state that content-based validity is based on logical analysis and expert evaluation of item constructs, such as item subject matter, item format, and constituent sentences. The item construct described above refers to Mardapi (2008: 93), who states that the main guidelines for making multiple-choice tests include: the subject matter must be clear, the sentences used are appropriate to the developmental level of the test takers, the language used is standard, the location of the correct answer choice is determined randomly, all answer choices are logical, the answer choice sentences are of relatively the same length, and there are no clues to the correct answer. The number of experts is in accordance with Lynn (1986), who states that expert validation should involve at least 3 and no more than 10 experts.
The experts' assessment data were then analyzed using the Content Validity Index (CVI). All items had an I-CVI value ≥ 0.60, so all items are content valid. The overall CVI score is 0.96 ≥ 0.60, so the instrument is also content valid. Polit and Beck (2006) state that the CVI is the proportion of items that receive a score of 3 or 4 from experts. Rempusheski and O'Hara (2005) state that the recommended proportion of CVI ranges from 0.60 to 1.0.

CONCLUSION
Based on the research results, the Content Validity Index (CVI) value for each item of the assessment literacy instrument exceeds 0.60. The CVI value for the assessment literacy instrument as a whole is 0.96, which also exceeds 0.60. It can therefore be concluded that the items and the assessment literacy instrument that have been developed are valid in terms of content. This means that the constructs of the items and the instrument are in accordance with the theoretical constructs used.

Figure 3. Revision Result of Item Number 1

Figure 4. Suggestions for Improving Item Number 2
Based on Figure 4, the item was then revised, with the result presented in Figure 5.

Figure 5. Revision Result of Item Number 2

Figure 7. Revision Result of Item Number 3

Figure 10. Suggestions for Improving Item Number 9

Figure 11. Revision Result of Item Number 9

Figure 12. Suggestions for Improving Item Number 10