Developing psychomotor evaluation instrument of biochemistry practicum for university students of biology education

Practicum is one of the important aspects of biology learning, yet no valid and reliable psychomotor evaluation instrument is available for it. This study is aimed at developing a valid and reliable psychomotor evaluation instrument for biochemistry practicum. The study is developmental research using the 4-D model of ‘define, design, develop, and disseminate’. Instrument validation was carried out through construct validation. The findings show that the developed instrument has a high level of construct validity, although its reliability has not been well estimated. The instrument is constructed of four factors (perception, set, guided response, and mechanism) developed into 80 statement items.


Introduction
Science learning (particularly biology) involves practicum. A practicum, in biology learning, is an activity of exploration and experimentation in the laboratory or in the open field that gives students direct experience. Since science covers three aspects, namely product, process, and scientific attitudes (Tursinawati, 2016), it is often stated that practicum is an inseparable part of science.
One of the strengths of practicum as a learning method is that it gives students the chance to test, find, and elucidate theories (Suryaningsih, 2017); develop the basic skills of experimentation; foster enthusiasm for knowledge; improve problem-solving skills; and make use of facilities for scientific investigation. Practicum activities can also improve students' science process skills and concept mastery (Lestari & Diana, 2018; Suardana, Liliasari, & Ismunandar, 2013).
Unfortunately, evaluation of practicum activities in the laboratory has so far emphasized the cognitive aspects, while psychomotor skills evaluation receives little attention (Hamid et al., 2012). This can be seen from the high proportion of cognitive evaluation in the form of the pre-test, post-test, and final assignment, which are usually written. Meanwhile, according to Osman, Hiong, and Vebrianto (2013), for students to acquire the skills needed for the 21st century, biology learning must involve a lot of inquiry skills. Inquiry skills include (1) formulating problems, (2) proposing a hypothesis, (3) designing experiments to test hypotheses, (4) analyzing data and drawing conclusions, and (5) writing a report. In addition, students are also expected to be able to operate experiment tools in the laboratory. The essential laboratory skills have been mapped into 14 categories as follows: (1) observing, (2) calculating, (3) measuring, (4) classifying, (5) finding space/time relations, (6) formulating hypotheses, (7) designing an experiment, (8) controlling variables, (9) interpreting data, (10) making inferences, (11) predicting, (12) concluding, (13) applying, and (14) communicating. However, the same study finds that mastery of the essential laboratory skills among biology teacher-candidate students in the ecology practicum is still low.
Likewise, Kasilingam, Ramalingam, and Chinnavan (2014) divide psychomotor skills into seven levels, namely: (1) perception, (2) set, (3) guided response, (4) mechanism, (5) complex overt response, (6) adaptation, and (7) origination. Verbs related to the perception level include selecting, choosing, isolating, and identifying. Verbs that represent the set level include showing, starting, explaining, etc. At the guided response level, verbs that can be used include imitating, following steps, making, etc. At the mechanism level, relevant verbs include calibrating, measuring, mixing, organizing, heating, manipulating, etc. Maknun, Surtikanti, Munandar, and Subahar (2012) categorize the psychomotor skills of the practicum class in the ecology subject matter as setting up the tools in line with the practicum plan, calibrating and maintaining the laboratory tools, operating pipettes, operating microscopes, taking notes, and working safely in accordance with occupational health and safety rules. The results of their study show that the psychomotor skills of teacher-candidate students in biology practicum are still low.
The biochemistry practicum in the study program of Biology Education, Universitas Ahmad Dahlan (UAD) has included cognitive, psychomotor, and affective aspects; however, evaluation of the cognitive aspects dominates the process (80%) while the psychomotor and affective aspects take the rest (20%). Besides, no standard and valid instrument has been developed for the evaluation of the psychomotor aspects of learning in the biochemistry practicum. As a result, evaluation of the practicum is highly subjective.
To date, many studies have been conducted on the development of evaluation instruments. One example is the study done by Ridlo (2012), but this study focuses on the knowledge aspects of biology practicum. Instrument development in the psychomotor domain of biology practicum has been done, among others, by Yunita, Agung, and Nuraeni (2016), with good validation in the aspects of material, construction, and language. Yulianti, Andriani, and Taufiq (2014) have developed a psychomotor evaluation instrument in the temperature and calorie topic. Another study was done by Hazarianti, Masriani, and Hadi (2016) on a psychomotor evaluation rubric in the practicum of the distribution coefficient sub-material. This rubric, however, is used in classes other than biology.
Psychomotor evaluation instruments have so far been developed mostly for high school students; very little has been done for university students. Besides, learning evaluation has so far emphasized cognitive skills, even in practicum classes, which actually require psychomotor skills. It is therefore important that evaluation instruments be developed in biology education, especially in the biochemistry topic, because biochemistry is one of the basic materials in biology along with physiology, genetics, microbiology, and others.

Method
The 'define' phase includes four steps, namely: (a) initial analysis, (b) curriculum review, (c) content review, and (d) learner analysis. The 'design' phase includes four steps, namely: (a) selection of assessment scales, (b) development of the instrument draft, (c) instrument validation, and (d) a limited readability test with practicum assistants. The 'develop' phase consists of two steps, namely: (a) product evaluation by experts and (b) small-group and large-group try-outs.
The study used two questionnaires as the instruments for data collection. The first questionnaire, using the Likert scale, consisted of statements concerning the feasibility of the instrument and was given to material experts and evaluation experts. The second questionnaire, using the Guttman (Yes/No) scale, was given to practicum assistants to test readability. All instruments were first validated by the material and evaluation experts.
Data were analyzed by a combination of descriptive and qualitative techniques. Responses on the Likert scale were scored from 4 to 1 and categorized as very good, good, poor, and very poor. The Guttman scale had two ratings, with a maximum score of 15. For item validity, exploratory factor analysis (EFA) was used with four indicators: the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO-MSA), Bartlett's test of sphericity, anti-image correlation, and factor loading. As general criteria, the sample data are feasible for analysis if Bartlett's test of sphericity is significant at p < 0.05, the KMO-MSA value is > 0.5, and the anti-image correlation is > 0.5. The quantitative data from the experts and assistants were analyzed for feasibility by categorizing them into four interpretation criteria using the formula proposed by Mardapi (2008).
The research product is regarded as feasible if the results of the analyses fall at least in the 'good' category. The criteria include content material, construction, language, objectivity, and utility.
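The feasibility criteria above can be sketched in code. The following Python example is a minimal illustration and not the analysis procedure used in the study: it computes the overall KMO from a correlation matrix using the standard definition (squared correlations weighed against squared partial correlations) and then checks the three thresholds. The function names, the small equicorrelation matrix, and the Bartlett p-value and anti-image value passed to the check are hypothetical; only the KMO of 0.787 is reported in this study. The conventional 0.05 significance level for Bartlett's test is assumed.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0  # squared off-diagonal correlations
    sum_q2 = 0.0  # squared off-diagonal partial correlations
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_q2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_q2)


def meets_efa_criteria(bartlett_p, kmo_value, min_anti_image):
    """Check the three feasibility thresholds (0.05 level assumed for Bartlett)."""
    return bartlett_p < 0.05 and kmo_value > 0.5 and min_anti_image > 0.5


# Hypothetical 3-variable correlation matrix, for illustration only.
example_corr = [[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]]
print(round(kmo(example_corr), 4))

# KMO of 0.787 is the reported value; the other two inputs are hypothetical.
print(meets_efa_criteria(bartlett_p=0.001, kmo_value=0.787, min_anti_image=0.6))
```

In practice these statistics would normally be read from the output of a statistical package; the sketch only makes the decision rule explicit.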

Findings
The study is research and development in three phases, namely: (a) define, (b) design, and (c) develop. In the 'define' phase, analyses are conducted of the initial situation, curriculum content, subject material, and learner characteristics. The initial situation is analyzed through discussions with biochemistry practicum coordinators and assistants concerning the running and evaluation of the biochemistry practicum. From this activity, it is known that practicum evaluation is still dominated by the cognitive domain, which accounts for approximately 80% of the whole process. Psychomotor skills account for only about 10%.
The curriculum analyses are done on the practicum lesson plans, learning outcomes, and practicum guidebooks. Concerning the learning outcomes, among others, students are able to practice making pH solutions of various concentrations, making buffer solutions, and measuring pH solutions. These abilities in making and measuring pH solutions will become the bases for doing other practicum activities.
Analyses of the content material look at the basic materials that are given before the practicum class. The content material for the pH practicum is an advanced topic. Making and measuring pH is the fifth item in the whole syllabus of the biochemistry practicum. The preceding classes contain practicum activities concerning the accuracy and correctness of experiments. In these preceding practicums, students practice liquefying, measuring, and using the right tools. It is expected that, by the fifth practicum, students are already familiar with the initial and basic steps of experimentation.
Learner analyses are directed to look at the characteristics of the students who take the biochemistry practicum in semester 2. The biochemistry practicum is the first practicum the students have in their program. There is no practicum in semester 1. The practicum uses four of the six levels of the psychomotor domain (Hamid et al., 2012) namely: level 1 (perceiving), level 2 (being ready for active participation), level 3 (integrative responding), and level 4 (showing work performances to become habitual). In the complete scheme, level 5 is complex overt responding and level 6 is adapting. These are not yet included in the items for the learning evaluation.
In the 'design' phase, the following steps were carried out: selecting evaluation scales, developing the instrument draft, validating, and readability testing. The evaluation scales were selected by reviewing the instrument draft design. Initially, the evaluation scale was of the check-list type with Yes/No responses. According to Ibezim and Igwe (2016), the check-list instrument is more objective in measuring psychomotor skills than rating scales. However, on the recommendation of the experts and assistants, Likert-type rating scales were used for the developed instrument. It is expected that, by using the Likert scale, differences in the students' performances can be detected more clearly. A three-point scale is used: 1 for inadequate, 2 for good, and 3 for very good.
The product draft consists of an instrument for readability and instrument for expert validation. The draft is formatted in the following aspects: (1) title of the experiment, (2) objectives to be achieved, (3) psychomotor evaluation aspects, (4) levels of the psychomotor skills, (5) indicators for the psychomotor skills, (6) descriptors representing the indicators, (7) evaluation scales, (8) evaluation rubrics, and (9) scoring guides.
The experiment title is related to the experiment of making and measuring pH. The learning objective to be achieved is for students to be able to make solutions of various concentrations, make buffer solutions, and measure the pH of solutions. The aspects that will be observed in the activities consist of preparation for the practicum, running of the practicum, and reporting.
The product instrument was evaluated by validators before it was subjected to the try-outs. This evaluation consists of readability checks by practicum assistants and evaluation of the instrument by evaluation and subject-matter experts.
The instrument validity was obtained from the wider-scale try-out using exploratory factor analysis (EFA). The results of the EFA analyses show that the Kaiser Meyer Olkin Measure of Sampling Adequacy (KMO) is 0.787 which means that the factor analysis can be continued. Looking at the number of factors that have an eigenvalue of more than 1, four levels of the psychomotor domain can be obtained: level 1 for perception, level 2 for set (readiness for active participation), level 3 for guided response (integrative responses), and level 4 for mechanism (showing performance as a habit).
The indicators for the psychomotor skills in the developed instrument cover the following details: being able to set the tools and materials for the experiment, writing up the steps of the work, making HCl solution using various concentrations, measuring the pH of the HCl solution, making 2% NaOH solution, making 100 ml of 0.2M CH3COONa solution, making 1% gelatin solution, making 0.2M NaH2PO4·H2O solution, making 0.2M pH 5 acetate buffer solution, writing out practicum objectives, writing out observation results, comparing observation results with the theory, writing out the discussion of results, making conclusions, and writing up the practicum report. These are presented in Table 1. Each indicator is operationalized into descriptors. There are 80 descriptor statements. The three-point Likert criteria are 1 for inadequate, 2 for good, and 3 for very good. The total score made by a student is the sum of all the scores obtained for each indicator. The maximum score is 240 and the minimum score is 80. A student's score can be obtained by the following formula:

$$\text{Student's score} = \frac{\text{Score gained by student}}{240} \times 100$$

The instrument that has been construct-validated was subjected to readability checks by the practicum assistants. The results of the readability test show that the instrument readability can be categorized as very good (93.83%). The readability checks include language, ease, objectivity, and utility.
The 'develop' phase consists of three activities: (a) expert evaluation, (b) small-group try-out, and (c) large-group try-out. Based on the results of the evaluation by the subject-matter and evaluation experts, the instrument is categorized as very good (91.67). The evaluation includes language, construct, content, objectivity, and utility. A minor revision is suggested, however, by the subject-matter experts on the choice of vocabulary and the simplification of the descriptors. The final version of the instrument draft ends up with 80 statement items. Some indicators and descriptors of the final draft are presented in Table 2, Table 3, Table 4, and Table 5.
Table 2 presents the indicators and descriptors used for the perception level. There are two indicators at this level, namely selecting tools and materials and formulating practicum objectives. These indicators are supported by Kasilingam et al. (2014), according to whom the perception level can be operationalized by choosing, selecting, describing, etc. Table 3 shows indicators and descriptors for the psychomotor set level. The set level is operationalized as the mental, emotional, and physical readiness of the student to work. At this level, the indicators are writing down the work procedure and writing up observation results. These indicators are chosen because students' readiness to do the practicum can be seen from their understanding of the sequence of steps in the practicum class, which is represented by their ability to write down the steps in accordance with the guidebook. In the same way, students' readiness to communicate the results and write a report is shown by their ability to write down the results of the practicum and any important phenomenon in the form of a report draft. At the guided response level, there are 50 items to be tested. These items are written out in accordance with the practicum guidebook (Kasilingam et al., 2014). Some of these instrument items are shown in Table 4.
The mechanism level has skill categories with which students are familiar. It includes calculating solution volume, weighing materials, observing solution volume through the glass tube, heating solution, measuring pH solution, etc. These descriptors use operational mechanisms like measuring, organizing, heating, etc. (Kasilingam et al., 2014).
After being evaluated by experts, the instrument was tried out with a small group of 20 students. The average student score was 128, which corresponds to 53.33 on the 100-point scale. This score belongs to the low category. The instrument was finally tried out with a larger group of 45 students. The average student score was 148.8, which corresponds to 62 on the 100-point scale. This score also belongs to the low category.
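The conversion of raw scores to the 100-point scale can be reproduced with a few lines of Python. This is a minimal sketch under the scoring rules stated in this section (80 items, each scored 1 to 3, raw score divided by the maximum of 240 and multiplied by 100); the function and constant names are hypothetical.

```python
N_ITEMS = 80                  # descriptor statements in the instrument
MIN_POINT, MAX_POINT = 1, 3   # three-point Likert scale


def score_range(n_items=N_ITEMS):
    """Return the (minimum, maximum) raw score a student can obtain."""
    return n_items * MIN_POINT, n_items * MAX_POINT


def to_hundred_scale(raw_score, n_items=N_ITEMS):
    """Convert a raw total score to the 100-point scale."""
    min_score, max_score = score_range(n_items)
    if not min_score <= raw_score <= max_score:
        raise ValueError("raw score is outside the possible range")
    return raw_score / max_score * 100


print(score_range())                       # possible raw score range
print(round(to_hundred_scale(128), 2))     # small-group average
print(round(to_hundred_scale(148.8), 2))   # large-group average
```

Because the lowest possible raw score is 80 rather than 0, the converted score under this formula cannot fall below about 33.33, which is worth keeping in mind when interpreting the category boundaries.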

Discussions
The research findings show that the developed instrument has high construct validity; however, the results of the small-group and large-group try-outs are not satisfactory. This may be because the results of the practicum experiment are shared among the students in a group, so that not every student carries out all of the assessed aspects of the practicum.
The large-group try-out, like the small-group try-out, yields a score in the low category. This may be because the practicum is carried out through task assignments, an arrangement adopted because the practicum material is extensive while the time is limited to two hours. As a result, students are not able to conduct all the activities in the practicum, so the observed psychomotor scores are partial.
The low try-out results may also be related to the fact that the reliability of the instrument has not been well defined or estimated. According to Lee, Brennan, and Kolen (2000), when the reliability measure is low, the standard error of measurement (SEM) is high, with the consequence that the measurement results cannot be considered valid. On the other hand, when the reliability measure is high and the SEM is low, there is validity in the results of the measurement. Nevertheless, a high reliability measure does not by itself guarantee the presence of validity (Azwar, 2008). Consequently, reliability estimation is an important part of instrument development.
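The relation between reliability and SEM can be made concrete with the classical test theory formula SEM = SD × sqrt(1 − r), where SD is the standard deviation of the observed scores and r is the reliability coefficient. The Python sketch below uses this standard formula; the function name, the standard deviation of 10, and the two reliability values are hypothetical, since no reliability coefficient was estimated in this study.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical values for illustration only.
hypothetical_sd = 10.0
for reliability in (0.36, 0.91):
    sem = standard_error_of_measurement(hypothetical_sd, reliability)
    print(reliability, round(sem, 2))
```

With the same score spread, the lower reliability yields the larger SEM, so observed scores locate a student's true score less precisely.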

Conclusion
Based on the research findings, it can be concluded that the developed instrument is feasible to be used. The instrument has high construct validity, although its reliability has not been well estimated. Instrument reliability can be raised in two ways, i.e. by adding items that have high internal consistency or by removing those with low internal consistency. The instrument is constructed of four psychomotor aspects (perception, set, guided response, and mechanism) distributed into 80 statement items.
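The effect of removing an inconsistent item can be illustrated with Cronbach's alpha, a widely used internal-consistency coefficient. The study does not state which reliability coefficient would be used, so the choice of alpha is an assumption, and the function name and the small data set below (four students, items scored 1 to 3) are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists,
    each holding one score per student in the same student order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]  # total score per student
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: three items that rank students identically,
# plus one item that runs against them.
consistent_items = [[1, 2, 3, 3], [1, 2, 3, 3], [1, 2, 3, 3]]
inconsistent_item = [3, 1, 2, 1]

alpha_with = cronbach_alpha(consistent_items + [inconsistent_item])
alpha_without = cronbach_alpha(consistent_items)
print(round(alpha_with, 3), round(alpha_without, 3))
```

In this toy data set, dropping the inconsistent item raises alpha, which mirrors the second of the two strategies named above.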

Suggestions
The reliability of the developed psychomotor evaluation instrument has not been well estimated. It is suggested that other studies intended to develop an evaluation instrument carry out reliability estimation. The techniques can be suited to the objectives and types of data of the study.