Developing a dynamic assessment instrument to assess reasoning skills about bacteria

This study aimed to develop a dynamic assessment instrument for assessing reasoning skills, to be used as a formative assessment that improves students' reasoning skills about bacteria. The cake format and the graduated prompting method were used. The instruments were based on the Facts and Proofs Diagnostic (FPD) test and the Structural Communication Grid (SCG) and focused on correlational and combinatorial reasoning. The instrument was developed by compiling test items according to the basic competence on bacteria at the senior high school level, trialing the test with 93 high school students, and analyzing the trial results. The analysis covered validity, reliability, level of difficulty, distinguishing power, and distracter effectiveness. The results showed that 53 of the 67 items were valid; each item was equipped with prompts in the form of guiding questions.


INTRODUCTION
Education in the 21st century faces the challenge of teaching and training key skills that will help students in real-life competition. These skills include creative thinking, critical thinking, communication, and collaboration. Reasoning skills are part of critical thinking skills (Perta et al., 2017). Critical thinking requires reasoning because, through good reasoning, students can express opinions logically and answer teachers' questions correctly. Reasoning skills are essential for choosing among options, distinguishing positive from negative situations, making moral decisions about which actions to take, and dealing with problems. A deficient grasp of reasoning impedes the integration of preconceptions and new knowledge into a unified understanding, and impedes the application of experience and knowledge in new conditions. Thus, the development of students' reasoning skills deserves attention, for example through teacher training and through exploring students' reasoning skills. Reasoning reflects student performance both now and in the future because it can predict their conceptual understanding (Bhat, 2016; Nambikkai & Veliappan, 2016). Students' conceptual understanding is related to reasoning skills, since those with good reasoning skills can understand concepts more easily.
Reasoning skills are the ability to manage new information and combine them with the existing information or knowledge, draw logical and precise conclusions, make the decisions, and solve problems (Remigio et al., 2014). Reasoning skills are needed for students so they can receive, process, and explore information and use their prior knowledge to find solutions to problems. Reasoning skills are activities of thinking by producing new statements, building connections between facts or concepts, and making the conclusions to solve problems (Rosita, Assessment is the activity to gather, analyze, and determine the information about students' learning constraints, and their success in achieving basic competencies. It is carried out to determine the quality of the learning process and its results (Amelia et al., 2015). Assessment is useful for gathering, processing, analyzing, describing, and summarizing learning outcomes based on facts in the expected competency mastery process. It can be used as a reference in determining the next learning design. Several types of assessments are commonly used in education, divided into two types: summative assessment or Assessment of Learning (AoL) and formative assessment or Assessment for Learning (AfL). Summative assessment is oriented towards learning outcomes and reflects the achievement of learning that has been done. The formative assessment is oriented towards the learning process and provides information on the constraints experienced during learning and can be a source of information to revise and improve the learning process later on (Ali & Iqbal, 2013;Fornaguera, 2014). The formative assessment facilitates teachers to collect information from various sources such as observations, tests, notes, and classroom discussions to test the validity of the learning instruments, monitor student characteristics and achievements, and design the next step in learning (Clark, 2015). 
Formative assessment is carried out during the learning process and it aims to evaluate the development of performance and obstacles faced by students, ensure learning objectives are achieved, and make improvements. Weaknesses or obstacles encountered during learning can be used as a guide in improving further learning so learning objectives can be achieved.
Effective assessments must be able to show students' thinking skills and how they change or adapt. An effective assessment can take the form of a diagnostic assessment or a formative assessment. Formative assessment is intended to help students practice and apply their understanding, concepts, problem-solving skills, attitudes, and social abilities in their daily lives (Mohamed & Lebar, 2017). Teachers can use formative assessment to measure the extent to which the competencies being taught have been mastered, estimate the results of summative assessments, and improve the learning process.
Various studies on AfL have been conducted to address students' misconceptions and to improve the argumentation related to their reasoning skills. Most of them used the Facts and Proofs Diagnostic (FPD) test and the Structural Communication Grid (SCG) test on three topics: Bacteria, Protist, and Virus. They aimed to assess students' misconceptions and argumentation skills (Novitasari et al., 2018; Raharjo et al., 2018; Setyaningrum et al., 2018). Misconceptions and argumentation skills can be diagnosed using the FPD and SCG tests because these instruments are familiar to students, discourage guessing by asking students to write both their answer choice and their reasons, and help students construct their knowledge.
Dynamic assessment is a form of formative assessment, or AfL, and is oriented towards the learning process. Unlike static assessment, it allows interaction between teachers and students, so the teacher can identify students' difficulties and help them by providing feedback (Hessamy & Ghaderi, 2014). The intervention in dynamic assessment helps students perform better, and the thinking process that students follow to reach the correct answer can be observed. Mediation or other interventions can be given to help students think in line with the concepts and reach correct answers (Cotrus & Stanciu, 2014). Students who understand the material and reason well can reach correct answers on their own; students who encounter problems need help, which they can get through mediation in dynamic assessment.
Dynamic assessment emphasizes the relationship between the mediator (teacher) and the subject being assisted (student): the teacher investigates the impediments that students experience and then designs ways to help them reach their optimum potential. Dynamic assessment is based on Vygotsky's theory of the Zone of Proximal Development (ZPD), which states that students who cannot complete an assignment on their own need help from adults or more experienced persons to complete their tasks and reach their potential abilities. Providing assistance is therefore in line with dynamic assessment, which supports students through mediation between teachers and students in the form of hints, clues, or prompts (Poehner & Infante, 2017). Dynamic assessment becomes a tool to show students' skills, identify their potential levels, and guide their thinking process. Assistance in dynamic assessment enables teachers and students to interact with each other and enables teachers to gain more information about students' problems and abilities.
Dynamic assessment enables teachers to interact with students through active feedback during the learning process. Its interventions aim to help students gather and combine information and develop skills and understanding. Feedback can be the same for all students or different for each student, and it can take the form of instructions, clues, hints, or prompts. Dynamic assessment provides more detailed information about students' potential level than static assessment (Resing, 2013). It can be used to help students solve problems and reach their potential abilities. Feedback in dynamic assessment should be matched to the students' needs and the purpose of the lesson.
Feedback or instruction is a key element in implementing dynamic assessment. It is given according to the needs and context of the assessment. Interventions are permitted in dynamic assessment, while in static assessment they are not (Khaghaninejad, 2015). The dynamic assessment approach is divided into two types: the sandwich format and the cake format. The sandwich format consists of three stages: pretest, mediation (instruction), and posttest.
Interventions are given at the stage between the pretest and the posttest, either individually or in groups. In the cake format, the intervention or feedback is given to students during the test itself (Hessamy & Ghaderi, 2014). The FPD and SCG tests are static assessments. Novitasari et al. (2018) reported unsatisfactory results, so the instruments need to be developed into a dynamic assessment to assess students' reasoning skills.

RESEARCH METHOD
The instruments were adapted from Novitasari et al. (2018) and converted from static assessment into dynamic assessment. Bacteria was chosen as the topic. The instruments were developed based on Basic Competency (BC) 3.5: Identifying the structure, life, reproduction, and roles of bacteria, which is listed as a biology topic in the Indonesian National Curriculum of 2013 (Regulation of the Minister of Education and Culture No. 24 of 2016). First, BC 3.5 was broken down into several indicators. These indicators were used as references for constructing the test items, so that each item could be expected to evaluate students' reasoning skills about bacteria. The items focused on assessing correlational and combinatorial reasoning skills and were arranged in the format of the FPD and SCG tests.
The cake format was used as the design for the dynamic assessment developed in this study. The dynamic assessment was applied using prompts while students took the test. Each item was equipped with a prompt, given as a directing question that served as feedback on the student's answer. The prompts appeared when students gave a wrong answer and helped them reconstruct their knowledge so that they could find the correct answer (Shabani, 2016). The test consisted of 67 items, each equipped with prompts. The concepts tested in the instrument can be seen in Table 4. The type of instrument validity used was empirical validity, while construct and content validity were checked by expert judgment. Trial testing was conducted before the instruments were used in data collection. The instruments were tested on 93 students from three different state senior high schools.
The trial aimed to evaluate the instruments' validity, reliability, and item quality (level of difficulty, distinguishing power, and distracter effectiveness). The validity test was carried out by correlating the item scores of the dynamic assessment instrument with the item scores of a standard instrument (the final semester assessment). This research also used expert judgement validity. The reliability test was carried out by correlating the trial scores with the average of the midterm and final semester assessment scores, using the PEARSON function in Microsoft Excel 2010. The instruments were declared valid and reliable, with good item quality. The research was conducted using control and experimental classes. The control class took the pre-test and post-test without prompts, while the experimental class took the pre-test and post-test with prompts.
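The correlation computed by a spreadsheet PEARSON function can be reproduced outside a spreadsheet. The following is a minimal sketch of the Pearson product-moment correlation in Python. The function name `pearson_r` and the score lists are hypothetical illustrations, not code or data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five students (illustration only, not study data)
trial_scores = [60, 72, 55, 80, 68]
reference_scores = [65, 70, 50, 85, 66]
r_xy = pearson_r(trial_scores, reference_scores)
```

The resulting coefficient always lies between -1 and 1, and it can then be compared with whatever validity or reliability ranges the analysis adopts.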

FINDINGS AND DISCUSSION
The instruments were trial-tested on 93 respondents. The data were analyzed to estimate and improve the instruments' validity, reliability, level of difficulty, distinguishing power, and distracter effectiveness. An instrument is considered valid if its correlation coefficient falls within the validity range of 0.21-1.00. The validity test gave an r_xy value of 0.89, which lies within this range, so the instrument was declared valid with a high validity level and can be used to assess students' reasoning skills accurately. The reliability test gave an r_xy value of 0.25. This value lies between 0.21 and 0.40, so the instrument was declared reliable with a low reliability level; it can be used repeatedly and is expected to generate consistent results. The data were also used for item analysis, which covered the level of difficulty and distinguishing power. The results of the item analysis are shown in Table 1, Table 2, and Figure 1.
Item difficulty, based on Arifin (2009), is divided into three levels: easy, medium, and hard. Table 1 shows that 76.47% of the items have medium difficulty and 23.53% are easy. Table 2 shows that 0% of the items have bad distinguishing power (DP), 15% have adequate DP, 30% have good DP, and 55% have very good DP. Figure 1 shows that 32% of the items have effective distracters and 68% have ineffective distracters. The validity test results showed that the instruments accommodate the specified concept, bacteria, with a high validity value of 0.89, so the instrument can be used to improve students' reasoning skills. An instrument's validity can be influenced by both internal and external factors. An internal factor is the time allotted for the test; an external factor is student behavior, such as whether students cheat (Sukardi, 2009). The time allotted must be appropriate so that students neither rush through the instrument nor work on it too slowly. Cheating affects the validity of the questions because students are influenced by their friends' answers and the test no longer shows their actual results.
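The two item statistics discussed here can be illustrated with a short sketch. It assumes the conventional definitions: the difficulty index is the proportion of students who answer an item correctly, and the distinguishing power is the difference in that proportion between the highest-scoring and lowest-scoring groups. The cut-offs (0.30 and 0.70) and the 27% group fraction are assumed conventional values and may differ from those used in Arifin (2009). All names and data are hypothetical.

```python
def difficulty_index(item_responses):
    """Proportion of students who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def classify_difficulty(p, easy_above=0.70, hard_below=0.30):
    """Label an item as easy, medium, or hard; the cut-offs are assumptions."""
    if p > easy_above:
        return "easy"
    if p < hard_below:
        return "hard"
    return "medium"


def discrimination_index(item_responses, total_scores, fraction=0.27):
    """Proportion correct in the high-scoring group minus that in the low-scoring group."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each comparison group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_responses[i] for i in high) / k
    p_low = sum(item_responses[i] for i in low) / k
    return p_high - p_low


# Hypothetical responses of ten students to one item, with their total test scores
item = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
totals = [9, 8, 7, 2, 3, 6, 1, 8, 7, 4]
p = difficulty_index(item)
label = classify_difficulty(p)
dp = discrimination_index(item, totals)
```

In this made-up example the three highest scorers all answer correctly and the three lowest scorers all answer incorrectly, so the item separates the two groups as sharply as possible.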
The reliability test showed that the instruments were reliable, so they can be used repeatedly at different times and occasions and produce consistent results. The reliability value, however, is low at 0.25. This can be influenced by several factors, including the number of test questions and the items' levels of difficulty. A large number of questions can reduce students' tendency to guess the answers. Instruments with an ideal level of difficulty can produce a score distribution that is close to the normal curve (Arifin, 2009). The level of difficulty of the questions affects the characteristics of the distribution of student scores, so the test results can be used to evaluate both the quality of the instrument and the students' abilities. Table 1 shows that the instruments contain only easy and medium items: about 77% of the questions have medium difficulty and 23% are easy. Easy items can result from answer options that are easy to guess or from ineffective distracters (Sudjiono, 2012). Table 2 shows that 0% of the items have poor or bad distinguishing power (DP), 15% have adequate DP, 30% have good DP, and 55% have very good DP. It can be concluded that the instruments can distinguish between students who have not understood the concept (low ability, LA) and students who have understood it (high ability, HA). Figure 1 shows that 32% of the items have effective distracters and 68% have ineffective distracters. A distracter was considered effective if it was chosen by at least 5% of the students and by more students from the LA group than from the HA group. A well-functioning distracter misleads students who have not mastered the concept into choosing it, because it appears to them to be the correct answer. The more LA students choose a distracter, the more effective it is (Sudjiono, 2012).
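The distracter criterion can be sketched as a simple check per answer option. The sketch assumes one reading of the rule: a distracter is effective when at least 5% of all test takers choose it and more LA than HA students choose it. The function name, the option labels, and the responses are hypothetical.

```python
from collections import Counter


def distracter_report(answers, groups, key, options="abcd", min_share=0.05):
    """Map each distracter to True (effective) or False (ineffective).

    answers: option chosen by each student
    groups:  "LA" or "HA" for each student, in the same order
    key:     the correct option, which is skipped
    """
    n = len(answers)
    overall = Counter(answers)
    la = Counter(a for a, g in zip(answers, groups) if g == "LA")
    ha = Counter(a for a, g in zip(answers, groups) if g == "HA")
    report = {}
    for opt in options:
        if opt == key:
            continue
        chosen_enough = overall[opt] / n >= min_share
        attracts_low_group = la[opt] > ha[opt]
        report[opt] = chosen_enough and attracts_low_group
    return report


# Hypothetical responses of ten students to one item whose key is "c"
answers = ["c", "c", "c", "c", "a", "a", "b", "c", "c", "c"]
groups = ["HA"] * 5 + ["LA"] * 5
report = distracter_report(answers, groups, key="c")
```

Here option "b" attracts only an LA student and counts as effective, option "a" attracts the two groups equally, and option "d" is never chosen, so both of the latter count as ineffective.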
The instruments for assessing students' reasoning skills with a dynamic assessment have been developed. They were constructed from the modified Facts and Proofs Diagnostic (FPD) test and Structural Communication Grid (SCG) test of Novitasari et al. (2018). The instrument comprises 53 items, each with prompts in the form of directing questions that help students reach the correct answer.

Table 3. Distribution of the instruments based on basic competencies about bacteria (recovered rows: number, concept, number of questions)
2. Life and reproduction methods of bacteria, and factors that affect their growth: 3
3. Basis in classifying bacteria: 3
4. Roles of bacteria: 2

Table 3 shows the distribution of instruments based on basic competencies about bacteria. Q1, Q2, Q3, Q4, Q5, and Q6 were arranged to find out the students' reasoning skills on the concept of bacteria as organisms. Q7, Q8, and Q9 were arranged to find out the students' reasoning skills on the concept of how bacteria live, how they reproduce, and the factors that influence their growth. Q10, Q12, and Q13 were arranged to find out the students' reasoning skills on the concept of bacterial classification. Q14 and Q15 were arranged to find out the students' reasoning skills on the concept of the roles of bacteria.
As shown in Table 4, each question is arranged based on BC 3.5 about bacteria. Thus, each question is expected to assess students' reasoning skills about bacteria according to the demands of BC 3.5. Modifications of the Facts and Proofs Diagnostic (FPD) test into dynamic assessment are shown in Figure 2.

1. A bacterium is an organism and not a particle. As living organisms, bacteria have several characteristics. In your opinion, which characteristic is the main reason for categorizing bacteria as living things?
a. Bacteria can multiply, because the main characteristic of living things is that they can multiply
b. Bacteria can move, because all living things must move to find food
c. Bacteria are composed of a cell, because the cell is the smallest building block of living things
d. Bacteria can produce or obtain food, because all living things need food

10. Bacteria can be divided into Archaebacteria and Eubacteria. Which of the following statements correctly describes the characteristics of Archaebacteria?
a. Archaebacteria are smaller than Eubacteria, because they appeared earlier than Eubacteria
b. No pathogenic species of Archaebacteria have been found, because Archaebacteria can adapt to extreme environments and are called ancient bacteria
c. Archaebacteria have cell walls that are not made of peptidoglycan, because they are not pathogenic
d. Archaebacteria have a simple RNA polymerase, because they are ancient bacteria

12. Bacteria have been identified as very diverse. Experts use certain aspects to classify bacteria. The following are the bases of bacterial classification. Click on the appropriate option to show the relationship between the types of bacteria and the basis of their classification!

15. A bacterium can be used to produce nata de coco from coconut water as the basic ingredient. Which of the following statements are true of the bacterium used in the production of nata de coco?
a. Bacteria can produce biosurfactants, bacteria can degrade hydrocarbons
b. Bacteria can produce CO2 and alcohol in the fermentation process, bacteria can break down lactose
c. Bacteria can produce cellulose through the fermentation process, bacteria can break down sucrose
d. Bacteria can produce carbonic acid, bacteria can convert glucose into alcohol and CO2

The prompts for Figure 5 are as follows.
(1) What is the meaning of vegetative, asexual, and sexual reproduction? (2) What is the meaning of binary fission? What is the meaning of budding? What is the meaning of fragmentation? What is the meaning of transformation? What is the meaning of conjugation? What is the meaning of transduction?
The instrument, based on the modified Facts and Proofs Diagnostic (FPD) test and Structural Communication Grid (SCG) test from Novitasari et al. (2018), has been revised into a dynamic assessment. The FPD test consisted of questions, claims in the form of multiple-choice answers, and warrants in the form of essays that support the claims. The students were asked to answer the questions and to write reasoned essays as warrants. The SCG test consisted of one question with 10 columns containing 6 correct concepts and 4 wrong concepts. The students were asked to choose the correct concepts. Both tests were revised and developed into the dynamic assessment, so prompts were given to help students answer the questions correctly.
The dynamic assessment was applied during the pre-test and post-test to assess students' reasoning skills about bacteria. The test instrument consisted of 53 valid items and was administered in a 45-minute period. During the pre-test, both classes answered the test using Google Forms. Next, the experimental class students were asked to answer the same test. Students who had answered all questions could then click the view score button to see feedback on each question, whether answered correctly or incorrectly. For correct answers, the feedback appeared as a statement reinforcing the answer. For incorrect answers, the feedback was given as directing questions that guide students to the correct answer.
All students were given the same prompts to help them answer each question correctly. If students gave the correct answer, the feedback appeared as the statement "Your answer is correct". If they gave a wrong answer, the feedback appeared in the form of directing questions about the concept being asked. In the post-test, both classes answered the post-test questions; the experimental class was provided with prompting, while the control class was not. The pre-test and post-test used the same items. Pre-test results from the experimental class were used to categorize the students' reasoning skills.
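The feedback rule described above can be expressed as a small function. This is only an illustrative sketch: the function name, the answer key, and the data layout are hypothetical, and the actual feedback in the study was configured in the online form rather than in code. The two example prompts are taken from the prompts listed for Figure 5.

```python
CORRECT_MESSAGE = "Your answer is correct"


def feedback_for(chosen, key, directing_questions):
    """Return the feedback lines shown for one item after the score is viewed."""
    if chosen == key:
        # A correct answer is reinforced with a confirming statement
        return [CORRECT_MESSAGE]
    # A wrong answer is followed by directing questions about the concept
    return list(directing_questions)


prompts = [
    "What is the meaning of vegetative, asexual, and sexual reproduction?",
    "What is the meaning of binary fission?",
]

shown_when_right = feedback_for("c", "c", prompts)
shown_when_wrong = feedback_for("a", "c", prompts)
```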
Dynamic assessment can be implemented using four different methods: testing the limits, pretest-train-posttest, clinical interview, and graduated prompting (Kovalčíková, 2015). In testing the limits, dynamic assessment is implemented by modifying standard test procedures: students are given clues, the time limit is removed, or they are helped to identify and correct their mistakes. The test results are not scores but information about students' abilities (Gonzalez et al., 1996).
In the pretest-train-posttest method, dynamic assessment starts with a pretest; in the next stage the teacher gives an intervention to students during learning, and in the final stage the teacher gives a posttest. Teachers must not provide interventions during the pretest and posttest (Khaghaninejad, 2015). In the clinical interview method, dynamic assessment is implemented by interviewing students using prepared questions. The interview results are not used to determine a level or score but only to reveal students' thinking during problem-solving (Gunhan, 2014).
Among the four methods (Kovalčíková, 2015), the graduated prompting method was used in this research. In this method, the interventions in dynamic assessment are carried out through prompts. Graduated prompts were given to help the students answer the test. Each item was equipped with prompts ranging from general to more specific (Wang, 2010). The prompts provided guiding questions that help students build and reconstruct their knowledge, so that they were expected to give the correct answers. Prompts in dynamic assessment must be effective and must help students reach their potential abilities (Navarro & Mourgues-Codern, 2018). Successful prompts guide students to the proper answers (Wang, 2010).
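The idea of moving from general to more specific prompts can be sketched as a loop that reveals one prompt after each wrong answer. This is a hypothetical illustration of graduated prompting in general, not the procedure implemented in this study; the function name, the callback, and the simulated student are all assumptions.

```python
def run_graduated_prompting(key, prompts, get_answer):
    """Reveal prompts one at a time, from general to specific, until the answer is correct.

    key:        the correct option
    prompts:    prompts ordered from most general to most specific
    get_answer: callable that receives the latest prompt (or None) and returns an answer
    Returns (answered_correctly, number_of_prompts_used).
    """
    answer = get_answer(None)  # first attempt without any prompt
    used = 0
    while answer != key and used < len(prompts):
        answer = get_answer(prompts[used])
        used += 1
    return answer == key, used


# Simulated student who needs two prompts before choosing the key "c"
attempts = iter(["a", "b", "c"])
result = run_graduated_prompting(
    key="c",
    prompts=["general prompt", "more specific prompt", "most specific prompt"],
    get_answer=lambda prompt: next(attempts),
)
```

The number of prompts a student needs before answering correctly is itself informative, since it indicates how much help separates current performance from potential performance.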
Items Q1, Q2, Q3a-j, Q4, Q5a-f, and Q6a-f are accompanied by prompts about the characteristics of bacteria as living things, such as cell structure; the characteristics of bacteria as prokaryotic organisms; the differences between bacteria, viruses, and protists; and the structures of bacteria and their functions. These items test students' reasoning and conceptual understanding of bacteria as organisms. Q7a-l, Q8a-g, and Q9 are accompanied by prompts about the characteristics of heterotrophic and autotrophic bacteria, methods of bacterial reproduction, and factors that support bacterial growth. These questions test students' reasoning and conceptual understanding of the life and reproduction of bacteria. Q10, Q11, Q12a-g, and Q13a-k are accompanied by prompts about the characteristics of Archaebacteria and Eubacteria and the basis for bacterial classification. These four questions test students' reasoning and conceptual understanding of the basis of bacterial classification. Q14 and Q15 are accompanied by prompts about the roles of bacteria. Both questions test students' reasoning and understanding of the roles of bacteria. The instruments consisted of 67 questions constructed based on the indicators of correlational reasoning and combinatorial reasoning.
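The total of 67 items can be checked against the item labels listed above. The sketch below is a hypothetical tally. It assumes that a label such as Q3a-j denotes sub-items a through j, each counted as one item, and that the label for Q7 denotes sub-items a through l.

```python
import string


def count_items(spec):
    """Count items given a map from question number to last sub-item letter (None = single item)."""
    total = 0
    for last_letter in spec.values():
        if last_letter is None:
            total += 1
        else:
            # "j" is the 10th letter, so a-j contributes 10 items
            total += string.ascii_lowercase.index(last_letter) + 1
    return total


item_spec = {
    1: None, 2: None, 3: "j", 4: None, 5: "f", 6: "f",   # bacteria as organisms
    7: "l", 8: "g", 9: None,                              # life and reproduction
    10: None, 11: None, 12: "g", 13: "k",                 # classification
    14: None, 15: None,                                   # roles of bacteria
}
total_items = count_items(item_spec)
```

Under these assumptions the four groups contribute 25, 20, 20, and 2 items, which agrees with the stated total of 67.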
Prompts were expected to help students give the correct answer, so effective prompts were needed. Changes in students' answers (for example, from a wrong answer at the pre-test to a right answer at the post-test) indicated the effectiveness of the prompts and showed that the prompts helped students reach their potential abilities (Navarro & Mourgues-Codern, 2018). Schworm and Renkl (2007) stated that prompts can improve students' argumentation abilities. Prompts actively stimulate students' learning processes and focus students on the core aspects of the material being taught, so students can be trained to reach their potential ability to convey their arguments declaratively in learning. Students' potential abilities can be explored through meaningful learning using prompts. Prompts optimize students' learning processes and improve their learning outcomes because prompts can act as activators of strategies in the learning process (Berthold et al., 2007). Teachers can use prompts to identify the obstacles students experience in learning and to help students overcome them. Prompts provide many benefits, but they must be adjusted to students' initial knowledge so that they function effectively and benefit students.

CONCLUSION
The dynamic assessment instrument for assessing reasoning skills has been developed. It was modified from the Facts and Proofs Diagnostic (FPD) test and the Structural Communication Grid (SCG). The instrument focused on assessing correlational and combinatorial reasoning. The results showed that 53 of the 67 items were valid; each item was equipped with prompts in the form of guiding questions. The dynamic assessment instrument with prompts is expected to help students reach their potential abilities and give correct answers.