Development of an integrated ability assessment instrument in reaction rate material

Abstract: This research aimed to develop a valid and reliable integrated ability assessment instrument (IAAI) to measure the multiple representations and chemical literacy of senior high school students in relation to reaction rates. The development framework followed the 4-D model (defining, designing, developing, and disseminating). The integrated assessment consisted of five essay items. The preliminary version of the IAAI was first reviewed by experts, and the paper-and-pencil test was then given to a group of students. The obtained data were analyzed using Item Response Theory (IRT). In addition, the Generalized Partial Credit Model (GPCM) was used to estimate the item parameters and examine their quality. The content validity, construct validity, and item quality suggest that the integrated ability assessment instrument has high validity and reliability. The analysis of the instrument trial data indicates that the integrated abilities of the students are in the medium category.


INTRODUCTION
The rapid development of science and technology requires an appropriate learning process to support students in exploring their abilities (Plush & Kehrwald, 2014, p. 2). One step that can be taken to prepare students to face global challenges is to improve the quality of the learning process. The Indonesian National Education System Law (Law Number 20 of 2003 concerning National Education System, p. 3) states that learning is a process of interaction between students, educators, and learning resources in a learning environment. The learning process is divided into three sequential stages: planning, implementing, and evaluating. Assessment is the last stage and the spearhead of the learning process: it provides information about what students already know and what they still need to learn, and it helps teachers shape and direct students in the learning process (Giles & Earl, 2011, p. 25). The assessment instrument is an integral part of the assessment process in learning, which includes tests and an assessment system (Juita, Zulva, & Edial, 2019, p. 101). However, the assessment used in some schools in Indonesia still measures memorization, so the development of students' competencies during learning is not adequately monitored. Therefore, an instrument is needed to assess and train students' skills in the field of science, especially chemistry. Chemical concepts that are presented in various levels of ... The quality parameter of a nation's progress is greatly influenced by the level of literacy of its students and human resources. One supporting factor is chemical literacy.
Chemical literacy has become an important issue in recent decades, and making students aware of the benefits of literacy has become a major goal for educators, scientists, and curriculum policy makers (Martinez-Hernandez, Ikpeze, & Kimaru, 2015, p. 8). The National Institute for Literacy (1992) defines literacy as the ability of individuals to read, write, speak, count, and solve problems at the level of expertise required in work, family, and society. Schwartz, Ben-Zvi, and Hofstein (2006, p. 206) suggested that chemical literacy has four aspects: knowledge of chemical materials and scientific ideas, chemistry in context, higher-order learning skills, and affective aspects. Furthermore, strengthening students' literacy has been shown to strengthen their understanding of concepts, so that the concepts can be applied more easily in their lives (Avikasari, Rukayah, & Indriayu, 2018, p. 231).
Item response theory (IRT) is a study of test items based on the item parameters and the ability of the test takers (Rudner, 2001, p. 231). IRT has three measurement models: the one-parameter logistic model, the two-parameter logistic model, and the three-parameter logistic model. IRT expresses the relationship between the probability of answering an item correctly and the underlying ability of the test taker as a mathematical function. The concept of item response theory is based on two postulates. First, a test taker's performance on an item can be predicted by a set of factors called latent traits or abilities. Second, the relationship between the test taker's performance on an item and the underlying ability is described by a monotonically increasing function called the item characteristic function or item characteristic curve (ICC). Three assumptions have to be fulfilled before modeling with item response theory: uni-dimensionality, local independence, and parameter invariance. Item response models for polytomous items that are often used by experts include the Graded Response Model (GRM), the Partial Credit Model (PCM), and the Generalized Partial Credit Model (GPCM).
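As an illustration of the logistic models named above, the following Python sketch evaluates the item characteristic function. It is a minimal example under stated assumptions: the function name icc, the parameter names theta, b, a, and c, and the example values are introduced here for illustration and do not come from the study, and the optional scaling constant D is omitted.

```python
import math


def icc(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 1PL/2PL/3PL logistic models.

    theta: ability of the test taker (latent trait)
    b: item difficulty
    a: item discrimination (kept at 1.0 for the one-parameter model)
    c: pseudo-guessing lower asymptote (0.0 for the 1PL and 2PL models)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical ability values; the curve should rise monotonically with ability.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs_2pl = [icc(t, b=0.0, a=1.2) for t in abilities]
```

Setting a = 1 and c = 0 gives the one-parameter model, freeing a gives the two-parameter model, and freeing c as well gives the three-parameter model.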
The GPCM is a generalization of the PCM that uses a 2-PL approach for scoring polytomous data. Various studies have been conducted to determine the accuracy of these models. One of them is Dodd (1991, p. 6), who analyzed the GRM and GPCM models for attitude measurement (de Ayala, 1993, p. 233). These models were used to analyze teacher attitudes towards communication skills and non-verbal reasoning abilities. In addition, the GPCM suits the way essay-type test items are scored in the field, where the item score is given based on the number of steps answered correctly, regardless of the sequence of steps.
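The way the GPCM assigns probabilities to each score category can be sketched in Python as follows. This is a hedged illustration rather than the estimation procedure used in the study: the function name gpcm_probs, the argument names, and the item values are assumptions introduced here.

```python
import math


def gpcm_probs(theta, a, thresholds):
    """Category probabilities P(X = 0), ..., P(X = m) for one GPCM item.

    theta: ability of the test taker
    a: item discrimination (slope); a = 1 reduces the model to the PCM
    thresholds: step parameters b_1, ..., b_m, one per score step
    """
    # Cumulative sums of a * (theta - b_v); score 0 has an empty sum (0.0).
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(z - top) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical essay item scored 0-3 (illustrative values, not from the study).
probs = gpcm_probs(theta=0.5, a=0.6, thresholds=[-1.0, 0.0, 1.2])
expected_score = sum(k * p for k, p in enumerate(probs))
```

With a single step parameter the function reduces to the two-parameter logistic model, which is one quick way to check the implementation.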

METHOD
The participants in this study were class XI IPA senior high school students (for the measurement phase) and class XII students (for the instrument trial phase). The trial phase of the integrated assessment instrument was conducted on 370 class XII IPA students in Pekanbaru City, Riau, Indonesia, from schools that had the highest, medium, and lowest national exam scores: SMAN 8, SMAN 9, SMAN 10, SMAN 6, and SMAN 15. A trial sample of 370 test takers was chosen because a relatively small sample can produce inaccurate parameter estimates, and with a sample that is too small the parameters cannot be estimated at all.
The items were constructed based on the content of the 2013 Chemistry Curriculum applied in Indonesia. To elicit the desired multiple representations and chemical literacy from the students, each item indicator was designed to provoke multiple representations and chemical literacy together, as described in Table 1.
Data collection for the development of the integrated assessment instrument followed the predetermined product preparation procedures, based on reviews and recommendations from expert judgment. Data on the characteristics of the instrument were collected through an instrument trial involving 370 class XII IPA students in five senior high schools in Pekanbaru City, Riau Province, Indonesia. The data collection techniques were tests and non-tests, which included questionnaires (item validation sheets), interviews, and written tests.
The data were analyzed both qualitatively and quantitatively. Qualitative analysis was used to describe the product development process, while quantitative analysis was used to determine the characteristics of the integrated instrument produced, based on the instrument validation and trial data. The data on the process of developing the integrated assessment instrument were descriptive and were described according to the product development stages undertaken. Qualitative analysis included the experts' consideration of content and construct validity. Quantitative analysis covered the instrument construct validation results, the instrument trial results (the tests of the item response theory assumptions and the item characteristics), and the integrated ability measurement results. The validation and trial data were analyzed using the Winstep application and SPSS (for testing the assumptions) and the R program (for the item parameters), while the integrated ability measurement results were analyzed using Microsoft Excel.

FINDINGS AND DISCUSSION
The expert judgment stage aimed to produce an instrument that is valid in terms of content, so that it can be used to measure students' chemical literacy and multiple chemical representations. A qualitative review involving expert lecturers and high school chemistry teachers was conducted before the instrument trial and the measurement. The aspects assessed at this stage were the substance, construction, language, and appearance of the initial product of the integrated assessment instrument. The validation results from the experts were described qualitatively and used as the basis for improvement in the next stage. Each item was revised according to the experts' suggestions to obtain a valid initial integrated assessment product. Based on the item evaluation, 8 items were judged feasible and were tested in a small-scale trial with a time allocation of 2 lesson hours (90 minutes). Meanwhile, five valid items were used in the measurement test phase.
The assumptions of item response theory were tested before the analysis using GPCM modeling with the 2-PL approach. In item response theory, the probability that a subject answers an item correctly depends on the subject's ability and the item characteristics. This means that students with high ability have a greater probability of answering correctly than students with lower ability. Three main assumptions underlie item response theory and have to be met: uni-dimensionality, local independence, and parameter invariance (Hambleton & Swaminathan, 1985; Hambleton, Swaminathan, & Rogers, 1991).
The uni-dimensionality assumption was tested by exploratory factor analysis using the Statistical Package for the Social Sciences (SPSS) Version 17.0, taking into account the eigenvalues and the scree plot in the SPSS output. If the difference in eigenvalues between components 1 and 2 is 4 or 5, the test instrument can be said to measure a single ability, that is, to be uni-dimensional. Meanwhile, the number of components measured by the test instrument is seen from the components with an eigenvalue of at least 1. The results of the factor analysis indicate construct validity, or empirical validity. However, before a factor analysis is conducted, sample adequacy has to be tested using the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO-MSA) and the Bartlett test. This study used the exploratory factor analysis approach, starting from the KMO value.
If the KMO value is lower than 0.50, exploratory factor analysis cannot be conducted (Yilmaz, Altinkurt, & Cokluk, 2011, p. 347). Meanwhile, the Bartlett test determines whether the variables of the instrument being developed are correlated. The KMO-MSA value obtained was 0.735 and the significance value of the Bartlett test was < 0.05. This indicates that the sample was sufficient and that exploratory factor analysis could be conducted for the next stage, the construct validity or uni-dimensionality test. The uni-dimensionality assumption was then tested using factor analysis, which identifies the relationships between variables by looking at the eigenvalues of the variance-covariance matrix. The results of the uni-dimensionality test are supported by the scree plot shown in Figure 1.
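The two sampling adequacy checks can be illustrated with a small Python sketch. The study obtained these values from SPSS; the code below is only an assumed re-implementation of the standard formulas (the overall KMO from correlations and partial correlations, and the Bartlett sphericity chi-square from the determinant of the correlation matrix). The function names and the example correlation matrix are hypothetical, and the p-value lookup is left out.

```python
import math


def invert_and_logdet(matrix):
    """Gauss-Jordan inverse and log-determinant of a positive-definite matrix."""
    p = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        logdet += math.log(abs(pivot))
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug], logdet


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv, _ = invert_and_logdet(corr)
    p = len(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of items i and j given all other items.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)


def bartlett_sphericity(corr, n):
    """Bartlett chi-square statistic and degrees of freedom for n respondents."""
    p = len(corr)
    _, logdet = invert_and_logdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return chi_square, p * (p - 1) // 2


# Hypothetical 3-item correlation matrix (not the study data); n = 370 as in the trial.
example_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
example_kmo = kmo(example_corr)
example_chi_square, example_df = bartlett_sphericity(example_corr, n=370)
```

A KMO value of at least 0.50 and a Bartlett chi-square that exceeds the critical value for the given degrees of freedom would support proceeding with the factor analysis.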
The scree plot visualizes the eigenvalues against the number of components retained as factors. Figure 1 shows the steepness of the drop in eigenvalues. According to the figure, the uni-dimensionality assumption is fulfilled. The second assumption that has to be fulfilled is local independence. Local independence can be examined using the variance-covariance matrix of the person measures, based on the students' ability on each of the test instruments tested (Greiff et al., 2013, p. 368). The assumption is considered fulfilled if the values below the diagonal of the variance-covariance matrix are zero or close to zero. A zero value indicates that the students' skill in answering one item does not affect their skill in answering another item. In other words, each item is independent. The third assumption is parameter invariance, which covers the invariance of the item parameters and of the ability parameters. From the analysis results, it can be concluded that the students' abilities are invariant.
Item analysis in this study used the GPCM (PCM with two logistic parameters, PCM-2PL). The model was used to estimate the item parameters of the instrument, which include item fit, item difficulty level, the test information function, and the test reliability of the developed product. Item analysis was conducted to determine whether the items fit the response patterns of the test participants under the GPCM. Eight items were developed, consisting of multiple representation and chemical literacy indicators. The item fit index determines whether an item functions properly and meets the requirements of a good measurement tool. An item is said to fit the model if it meets the criteria for the outfit mean-square, outfit z-standard, and point measure correlation values (Bond & Fox, 2015, p. 213).
Jurnal Kependidikan, 6(1), 1-13

Figure 1. Scree plot of the uni-dimensional test analysis

Quantitative analysis was conducted through GPCM modeling using the Winstep program. The items were calibrated to determine whether they fit the GPCM or PCM-2PL model. According to Sumintono and Widhiarso (2015), if a test item meets at least two of the three acceptance criteria for the MNSQ, ZSTD, or Pt Mean Corr (point measure correlation) limits, the item can be considered to fit the model and to be suitable for further analysis. The accepted outfit MNSQ (mean square) values range from 0.5 to 1.5, the accepted outfit ZSTD values range from -2.0 to +2.0, and the accepted Pt Mean Corr values range from 0.4 to 0.85. The fit of the items to the partial credit model with two logistic parameters is presented in Table 2. According to Table 2, each item fulfills two of the three acceptance criteria. It can therefore be concluded that each item fits the model, and all items can be regarded as the final product of the integrated ability assessment instrument (IAAI), which is empirically valid.
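The two-out-of-three acceptance rule described above can be written as a short Python check. This is a minimal sketch: the function name item_fits and the example statistics are hypothetical and are not the values reported in Table 2; only the three acceptance ranges are taken from the text.

```python
def item_fits(outfit_mnsq, outfit_zstd, pt_measure_corr):
    """Return True when at least two of the three fit criteria are met."""
    checks = [
        0.5 <= outfit_mnsq <= 1.5,       # outfit mean-square range
        -2.0 <= outfit_zstd <= 2.0,      # outfit z-standard range
        0.4 <= pt_measure_corr <= 0.85,  # point measure correlation range
    ]
    return sum(checks) >= 2


# Hypothetical statistics for illustration only (not the values in Table 2).
example_items = {
    "item_A": (1.10, 0.8, 0.55),  # meets all three criteria
    "item_B": (1.70, 2.6, 0.45),  # meets only the correlation criterion
}
fit_flags = {name: item_fits(*stats) for name, stats in example_items.items()}
```

Under this rule an item that misses a single criterion, for example a point measure correlation slightly below 0.4, is still retained for further analysis.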
The item characteristics analyzed next are the difficulty level and the discriminating power of the items. The difficulty parameter describes the estimated difficulty of an item, expressed in logits. Item difficulty is one of the factors that affect the probability of a student's response to a given item. Items with a high difficulty level require high ability to be answered correctly, while easy items have a low difficulty level and require only low ability. The analysis was conducted through GPCM modeling using the Winstep program. The item difficulty levels and discriminating power under the GPCM (partial credit model with two logistic parameters) are presented in Table 3.
According to the item analysis using the Winstep program, the items were feasible for use in the measurement test phase because all items have a good difficulty level, ranging from -2.0 to +2.0 logit. The difficulty levels obtained from the R program output are shown in Table 3. According to Hambleton and Swaminathan (1985) and Baker (2001), an item is considered good if its difficulty index ranges from -2.0 to +2.0 logit. This indicates that the items in the integrated assessment instrument tend to be of medium difficulty. In addition, the discrimination indexes in Table 3 (symbolized by a) lie between 0.323 and 0.692 overall, which indicates that all items discriminate well. This refers to the criteria for good items of Ebel and Frisbie (1991), which state that items with a discrimination parameter (a) ranging from 0.39 to 0.40, or > 0.25, have very good discriminating power. The item with the highest discriminating power is item 2 and the one with the lowest is item 7. The results of the item difficulty analysis are shown in Table 4. Therefore, based on the R program output for the difficulty and discrimination parameters, all items in the integrated assessment instrument fall into the good category.
The item characteristics were also examined through the test information function. The test information function describes the accuracy of the items of the test instrument and shows how much each item contributes to revealing the latent ability (latent trait) of students measured with the developed instrument (Retnawati, 2014, p. 32). Latent abilities are hidden abilities that cannot be observed directly but have the potential to emerge. Gruijter and van der Camp (2014) state that the value of the test information function depends on the latent ability. Therefore, the value of the information function varies with students' ability. The test information function is the sum of the item information functions in the test (Hambleton & Swaminathan, 1985, p. 335). Linn and Gronlund (Rosaroso, 2015, p. 372) state that reliability is generally defined as measurement consistency. Reliability provides the consistency that makes the validity of an instrument possible. In this study, reliability was estimated with an internal consistency approach, in which one test is given once to a group of subjects at the trial or empirical validation stage. The advantage of this approach is that it is practical and efficient, because the test is administered only once. The reliability estimates were obtained directly from the Winstep program output. The item reliability was 0.95 (high) and the person reliability was 0.69 (medium).
The instrument tested on students in the instrument trial phase is empirically valid and reliable, and its characteristics are known from the previous stages. The next stage was to assemble the instrument into a final product ready to be used at the measurement stage, whose subjects were class XI IPA students in five senior high schools in Pekanbaru in the 2019 academic year. At this stage, the measurement data, obtained from students' responses to the items of the integrated assessment instrument, were analyzed according to the assessment rubric that had been prepared.
The answers were analyzed by scoring each answer written by the students on the answer sheet provided. The measurement results were then analyzed descriptively to find out the students' level of achievement in the integrated chemistry skills. The scores from each school were analyzed descriptively and converted into an achievement category for integrated skills mastery in the reaction rate material, as well as into the percentage of students who mastered each competency indicator reflected in each item. The percentage of integrated skills achievement differs between schools, but when the scores are categorized all schools fall into the same achievement category, the medium category. In addition, Figure 2 outlines the level of achievement of integrated skills for each indicator at each school level.
Overall, in schools with low, medium, and high levels of learner ability, the lowest integrated skills achievement occurs on almost the same competency indicators, namely indicators 1 and 2. The schools are listed in Figure 2 in the following order: SMAN 6, SMAN 8, SMAN 15, SMAN 10, and SMAN 9. The two lowest indicators are explaining real phenomena that can be captured by the five senses using the concept of collision theory (indicator 1) and explaining real phenomena that can be captured by the five senses using the concept of reaction rates (indicator 2). These two indicators are extensions of collision theory and the reaction rate concept in the reaction rate chapter. Averaged over the schools, the two indicators reach 65.85% and 69.4% respectively, which is in the medium category. For the first indicator, students are required to analyze the cause of phenomena in daily life using collision theory by linking it to a representation image. The question presents two different types of collision, and students are asked to identify which collision is effective and the factors that cause the collisions in the two illustrations. For question number one, most students made errors in stating the factors that affect collisions, as shown in Figure 3. According to Figure 3, the students' answers at points b and c contain errors. At point b, the students responded to the prompt about the factors that affect collisions with the factors that affect the reaction rate, which shows a lack of mastery of literacy aspect 1 and of the macroscopic representation aspect.
In addition, at point c the students did not correctly determine the concentrations of NO and O2 after reacting, which indicates a lack of mastery of the mathematical representation aspect, an aspect that students need to master in most chemistry material. This is supported by Tima and Sutrisno (2018, p. 11), who state that the study of chemistry must involve macroscopic, submicroscopic, and symbolic representations, so that students connect the three representations when they learn. Fahriyah and Wiyarsi (2017, p. 199) also state that the macroscopic, microscopic, and symbolic aspects of multiple chemical representations have to be supported by mathematical aspects so that students can understand chemistry more thoroughly. In addition, Figure 3a shows that the students correctly answered the four prompts contained in question number 1. Unlike the answers in Figure 3a, the students whose answers are shown in Figure 3b correctly answered items b and c: they stated that collisions are influenced by energy, direction, and frequency of collisions, but these answers were not accompanied by a more complete explanation. Likewise, the students answered item c correctly, namely that the NO and O2 concentrations increase to 2 and 3 times their initial values. This indicates that the students have mastered the mathematical and macroscopic aspects and literacy aspect 1, which are integrated in question number 1.
In line with this, integrating several aspects of chemical literacy and multiple representations in one question instrument is important for training students' chemical literacy and multiple representation abilities (Odja & Payu, 2014, p. 46), and it can increase students' understanding, especially in chemistry materials that are rather difficult to understand. Furthermore, a broader context allows students to make better connections between chemistry concepts and their daily life, which gives them an opportunity to investigate their own hypotheses (Wiyarsi, Damanhuri, & Fitriyana, 2020, p. 192). This is in line with Ainsworth (2004, p. 191), who states that chemistry should be taught by linking all levels of multiple representations so that students can comprehend chemistry as a whole. The chemistry that students learn then becomes more meaningful and can be applied more easily in daily life, keeping pace with the rapid progress of science and technology.

CONCLUSION
The importance of training students' multiple representation and chemical literacy abilities, especially on certain chemistry materials, calls for a valid and reliable assessment instrument. The integrated assessment instrument to identify the chemical literacy and multiple chemical representation abilities of class XI IPA senior high school students went through an expert judgment process, and the set of instruments used in this study was deemed feasible by all three expert lecturers, with various suggestions for improvement. The quality of the integrated assessment instrument was reviewed from the construct validity results of the uni-dimensionality assumption, followed by the two other assumptions, local independence and parameter invariance, as conditions for the use of IRT. The IRT model used in this study is the GPCM with a 2-PL approach. According to the lowest and highest limits of the outfit mean-square (MNSQ), 5 test items were proven to fit the GPCM (2-PL) model. In addition, the parameter analysis shows that the item difficulty levels are good because they lie in the range of -2.0 to +2.0 logit and are spread evenly across the items. The discrimination indexes lie between 0.323 and 0.692 overall, indicating that all items discriminate well because the values are > 0.25. The measurement of the chemical literacy and multiple chemical representation abilities of class XI senior high school students using the integrated assessment instrument showed that the students' skills were moderate, with a percentage of 69.28%.