Assistant Professor, Center of English Language, Isfahan University of Technology, Isfahan, Iran
Background
Standardized multiple-choice (MC) tests have the potential to provide sufficient stimuli to unleash human capabilities and reduce unfair educational decisions (Armstrong, 1993; Ennis, 1993; Haataja et al., 2023; Lau et al., 2011; Phelps, 2009; Zaidi et al., 2018). Hence, despite a lack of consensus about the most efficient strategies for answering discrete-point items and achieving valid results from knowledge assessment, MC tests still serve as an efficient means of large-scale language testing (Haataja et al., 2023; Sellah et al., 2018). However, it remains uncertain whether test-taking strategies are as effective as the depth and breadth of content knowledge in achieving success on traditional and online language tests (Aryadoust, 2019; Collier, Pillai, & Fazio, 2023).
MC tests may create the conditions for using answer-changing strategies and engaging in related cognitive activities. In response to multiple-choice items, test takers may show various dispositions. They may have full knowledge of the assessed content and mark the correct option with near certainty. When there is a knowledge deficit, they may eliminate options, merge their disparate estimates, or follow their hunches (Herzog & Hertwig, 2014). The latter test-taking strategies have prompted studies on answer-changing behaviors.
Previous studies have yielded conflicting findings on whether answer-switching behaviors contribute to score gains on MC tests. A considerable number of studies have credited second thoughts with better test scores in traditional pen-and-paper (e.g., Merry et al., 2021; Stylianou-Georgiou & Papanastasiou, 2017) and computer-based (e.g., Liu et al., 2015; Vispoel, 1998) tests. However, considering the dynamic and context-dependent nature of the strategic choices required for achieving and showcasing language skills, it seems unrealistic to prescribe a uniform policy for everyone (Chen et al., 2014; Morrison, 1988). Hence, language testing and assessment scholars have started focusing on how test takers' increased awareness of their unique cognitive abilities (e.g., cognitive styles) affects their performance across various assessment platforms (Couchman et al., 2016; Herzog & Hertwig, 2014; Stylianou-Georgiou & Papanastasiou, 2017).
As a reflection of Gestalt and Piagetian theories of cognitive development, cognitive styles embody a set of mental operations such as decision-making, reasoning, judgment, and problem-solving (Pitcher, 2002; Riding & Rayner, 1998), each regulating reception of stimuli and transformation of knowledge into quick solutions to challenging tasks (Zhang, 2023). Awareness of the potential workings of cognitive styles on test-taking performance can direct stakeholders to make appropriate instructional decisions, pursue suitable assessment policies (Griffiths, 2012), and predict the chances for success in cognitive and learning tasks (Parry, 1984). Also, strong familiarity with various mental operations and preferences of language test takers can maintain harmony between instructional and assessment practices and maximize their effectiveness (Cohen & Weaver, 2006; Griffiths, 2012; Zhang, 2023). By implication, test takers and stakeholders can align their time and efforts with the difficulty levels of tasks (Efklides, 2012).
Accordingly, a dearth of studies on answer changing in online testing platforms, conflicting views on the possible effects of answer-switching, and individual differences in score gains/losses in pen-and-paper tests call for further investigations into strategies for taking MC tests (Dodeen, 2008; Peng et al., 2014; Geiger, 1991; Papanastasiou & Reckase, 2008). To address some of these gaps, this study investigates the predictive power of cognitive styles in determining EFL learners’ answer changes on online and traditional forms of teacher-developed English achievement tests. In so doing, the following research questions are examined:
Method
Participants
Using convenience sampling of readily accessible volunteers, this study recruited 50 male and 68 female university students aged 19 to 25, with upper-intermediate levels of general English proficiency, who pursued diverse engineering disciplines. The participants had not pursued any additional English education or qualifications outside the English courses offered in the mainstream school system before university. They reported mild or almost no anxiety before and during the test and had spent 2.5 weeks on average preparing for the test.
Instruments
The Oxford Quick Placement Test (OPT) was the first instrument, used to check and control for the participants' proficiency levels. Already identified as a valid test (Geranpayeh, 2003), and with a high Kuder-Richardson 21 reliability index of .87 in the present study, the OPT has proved successful in controlling the effects of proficiency levels on other variables (e.g., Abdulhussien, 2023; Ashraf Nia et al., 2023; Azizi & Nemati, 2022). The first 40 OPT items are appropriate for learners with different proficiency levels, whereas the last 20 questions suit learners with upper-intermediate and advanced proficiency levels. The study sample comprised participants with OPT scores between 40 and 47 (i.e., B2 or upper-intermediate level).
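For readers who wish to see how such an index is obtained, the Kuder-Richardson 21 coefficient can be computed directly from total scores. The sketch below is only illustrative, not the study's actual procedure, and the array of scores is hypothetical.

```python
import numpy as np

def kr21(total_scores: np.ndarray, k: int) -> float:
    """Kuder-Richardson 21 reliability from total scores on k dichotomous items."""
    m = total_scores.mean()            # mean total score
    var = total_scores.var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Hypothetical OPT totals for a few test takers (the full OPT has 60 items)
opt_totals = np.array([22, 35, 41, 47, 18, 52, 39, 44, 30, 48])
print(round(kr21(opt_totals, k=60), 2))   # about .91 for this toy sample
```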
Secondly, a validated teacher-developed English achievement test was used to capture the tendencies to change answers on MC items. The initial version of the test consisted of 60 items (i.e., 15 grammar and 45 reading comprehension items) based on the language content taught during a 4-month academic semester. Four university professors with 5 to 20 years of experience in Teaching English as a Foreign Language commented on the content and face validity of the 60-item test. Several rounds of factor analysis using SPSS 22 supported the construct validity of the items. Accordingly, with 15 items left out, the validated test consisted of 45 items loading on the underlying components (i.e., 30 reading comprehension and 15 grammar items). A satisfactory reliability index of .84 also confirmed the internal consistency of the 45-item test (Pallant, 2005).
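The item-screening step can be illustrated with an exploratory factor analysis along the following lines. This is a rough analogue rather than the actual SPSS 22 procedure; the randomly generated response matrix and the .40 loading cutoff are assumptions made only for the sake of the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Hypothetical scored responses (0/1): 118 test takers x 60 draft items.
rng = np.random.default_rng(0)
responses = pd.DataFrame(rng.integers(0, 2, size=(118, 60)),
                         columns=[f"item_{i}" for i in range(1, 61)])

fa = FactorAnalysis(n_components=2, random_state=0)   # e.g., reading and grammar components
fa.fit(responses)

# Loadings of each item on the two extracted components.
loadings = pd.DataFrame(fa.components_.T,
                        index=responses.columns,
                        columns=["component_1", "component_2"])

# Retain items loading at least .40 on either component; in the study,
# 15 weakly loading items were dropped, leaving the 45-item validated test.
retained = loadings[(loadings.abs() >= .40).any(axis=1)].index.tolist()
print(len(retained), "items retained")
```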
The third instrument was the Persian version of the cognitive style questionnaire (Ehrman & Leaver, 2003) validated by Maftoon and Rezaie (2013) for the Iranian context. The questionnaire consisted of 30 nine-point items targeting field-dependent/field-independent (items 1, 11, and 21), sharpener/leveler (items 3, 13, and 23), impulsive/reflective (items 5, 15, and 25), global/particular (items 4, 14, and 24), analog/digital (items 7, 17, and 27), concrete/abstract (items 8, 18, and 28), field-sensitive/field-insensitive (items 2, 12, and 22), synthetic/analytic (items 6, 16, and 26), random/sequential (items 9, 19, and 29), and inductive/deductive (items 10, 20, and 30) styles.
Procedures
Administration of the Oxford Quick Placement Test, which took about 30 minutes, identified 118 EFL learners with upper-intermediate levels of English proficiency as the study participants. The participants then took the teacher-developed English achievement test on grammar and reading comprehension. The next phase of the study involved checking the fully completed answer sheets with no missing items, together with the test takers' recorded think-aloud videos, and collecting the overall and section-by-section records of answer changes. Because there were no penalties for incorrect responses and the participants could draw on their prior knowledge in cases of uncertainty, few items were left unanswered, although blind guessing was probably frequent. However, even the participants' blind guesses could not seriously damage the collected data, since the participants could not reconsider their choices without content knowledge or mental operations (Geiger, 1991).
The collected data comprised right-to-wrong and wrong-to-right answer changes based on the observable marks left on the answer sheets. To reduce the likelihood of unrecorded mental answer changes, the course instructor provided general guidelines asking the pen-and-paper test takers to make their changes visible: each test taker recorded their choices with a fountain pen and crossed out previously selected options whenever they changed an answer. Training the students during the course sessions to mark their mental answer changes could remedy a methodological deficiency of previous studies, namely their complete reliance on ex post facto analyses of erasure behaviors in answer sheets. Besides, checking the reliability index and performing several rounds of factor analysis to confirm item appropriateness reduced the chances of extremely easy or difficult test items; hence, the frequencies of the students' back-and-forth work in marking options could be considered reasonable. The traditional test takers were also given a time limit for transferring their responses to the answer sheets. For the online version of the same test, the participants installed screen-recorder software and thought aloud while taking the test. They then shared the recorded files for further analysis.
Before proceeding with the main study, four experts checked the questionnaire items and helped make minor modifications to their formatting. Then, to ensure the appropriateness of the scale for the study purposes, decide on the average completion time, and identify and resolve potential issues, a pilot study was conducted in which a representative sample of 33 participants from the target population completed the questionnaire. The results indicated that neither discarding nor modifying any of the items would increase the reliability index of the scale (Dornyei, 2010), and, on average, each participant needed about 12 minutes to fill out the questionnaire. The Cronbach's alpha for the internal consistency of the questionnaire items was .81, which was highly satisfactory (Pallant, 2005).
During the data analysis phase, Fisher's exact tests compared answer switching across various cognitive styles. Subsequently, multiple linear regression analyses provided the indices needed to examine the predictive power of cognitive styles in determining the types and frequencies of answer-changing behaviors.
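As a rough illustration of these two analyses (the study itself reports SPSS output), the sketch below applies a Fisher's exact test to a 2 x 2 table for one style pair and fits a multiple regression of change counts on the ten style scores. The contingency counts mirror Table 1, but the generated style scores and change counts are hypothetical, and scipy's implementation handles only 2 x 2 tables, which may differ from the exact tests reported here.

```python
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact
import statsmodels.api as sm

# Fisher's exact test on a 2 x 2 table: rows = impulsive vs. reflective test takers,
# columns = wrong-to-right vs. right-to-wrong changes (counts taken from Table 1).
table = [[66, 58],
         [83, 64]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

# Multiple linear regression: ten cognitive style scores predicting total changes.
# Hypothetical data: 59 test takers, style scores on 9-point scales.
rng = np.random.default_rng(0)
styles = pd.DataFrame(rng.integers(1, 10, size=(59, 10)),
                      columns=[f"style_{i}" for i in range(1, 11)])
total_changes = rng.poisson(4, size=59)          # hypothetical change counts

X = sm.add_constant(styles)
model = sm.OLS(total_changes, X).fit()
print(model.summary())   # ANOVA F test, R-squared, and coefficient (Beta) table
```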
Results
Answer Changing in the Online Test
Fisher’s Exact Test helped compare the frequencies of total, wrong-to-right, and right-to-wrong changes across online test takers with different cognitive styles (Table 1). The results indicated that reflective and digital test-takers tended to change their answers more frequently than their impulsive and analog counterparts.
Table 1. Comparison of Answer Changing across Different Cognitive Styles in the Online Test
| Styles | N | Total changes F | Fisher's exact value (Sig.*) | Changes to right F | Fisher's exact value (Sig.*) | Changes to wrong F | Fisher's exact value (Sig.*) |
|---|---|---|---|---|---|---|---|
| Field-dependent | 35 | 152 | 8.87 (.56) | 91 | 5.72 (.76) | 61 | 11.01 (.06) |
| Field-independent | 24 | 119 | | 58 | | 61 | |
| Leveler | 29 | 122 | 10.47 (.36) | 73 | 9.87 (.21) | 49 | 8.61 (.16) |
| Sharpener | 30 | 149 | | 76 | | 73 | |
| Impulsive | 28 | 124 | 15.70 (.05) | 66 | 7.12 (.55) | 58 | 6.87 (.31) |
| Reflective | 31 | 147 | | 83 | | 64 | |
| Global | 14 | 67 | 4.43 (.98) | 32 | 9.44 (.24) | 35 | 5.77 (.45) |
| Particular | 45 | 204 | | 117 | | 87 | |
| Analog | 30 | 123 | 15.88 (.05) | 69 | 7.35 (.52) | 54 | 5.66 (.48) |
| Digital | 29 | 148 | | 80 | | 68 | |
| Concrete | 39 | 181 | 9.89 (.43) | 101 | 5.72 (.73) | 80 | 5.08 (.55) |
| Abstract | 20 | 90 | | 48 | | 42 | |
| Field-sensitive | 38 | 174 | 9.51 (.48) | 94 | 7.71 (.45) | 80 | 8.64 (.16) |
| Field-insensitive | 21 | 97 | | 55 | | 42 | |
| Synthetic | 13 | 57 | 10.35 (.36) | 25 | 5.52 (.72) | 32 | 5.25 (.52) |
| Analytic | 46 | 214 | | 124 | | 90 | |
| Random | 33 | 142 | 11.06 (.30) | 80 | 6.48 (.64) | 62 | 8.16 (.20) |
| Sequential | 26 | 129 | | 69 | | 60 | |
| Inductive | 33 | 139 | 12.28 (.20) | 76 | 8.68 (.33) | 63 | 7.95 (.21) |
| Deductive | 26 | 132 | | 73 | | 59 | |

*: 2-sided; N: number of participants; F: frequency of changes
The Predictive Power of Cognitive Styles on Answer Changing in the Online Test
Multiple linear regression analyses were used to determine whether cognitive styles could predict the answer-changing patterns in the online test. As Table 2 suggests, the F values for none of the change types were significant (p > .05), indicating the inability of the regression model to explain the test takers' answer-changing practices.
Table 2. The Fitness of the Regression Model for the Online Test Data
| Models | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression: Change to wrong | 27.732 | 10 | 2.773 | 1.331 | .242 |
| Regression: Change to right | 16.342 | 10 | 1.634 | .455 | .910 |
| Regression: Total | 29.113 | 10 | 2.911 | .548 | .847 |
| Residual: Change to wrong | 99.997 | 48 | 2.083 | | |
| Residual: Change to right | 172.370 | 48 | 3.591 | | |
| Residual: Total | 255.124 | 48 | 5.315 | | |
| Total: Change to wrong | 127.729 | 58 | | | |
| Total: Change to right | 188.712 | 58 | | | |
| Total: Total | 284.237 | 58 | | | |
Answer Changing in the Traditional Test
The Fisher’s Exact Test analyses of answer-switching behaviors in the traditional pen-and-paper test suggested significantly more frequent total answer changing by field-dependent, leveler, and impulsive test takers, significantly more frequent wrong-to-right changes by field-dependent and impulsive test takers, and significantly more frequent right-to-wrong changes by impulsive, analog and concrete test takers compared with their counterparts who stood at the opposite ends of cognitive style continua (Table 3).
Table 3. Comparison of Answer Changing across Different Cognitive Styles in the Traditional Test
| Styles | N | Total changes F | Fisher's exact value (Sig.*) | Changes to right F | Fisher's exact value (Sig.*) | Changes to wrong F | Fisher's exact value (Sig.*) |
|---|---|---|---|---|---|---|---|
| Field-dependent | 34 | 242 | 25 (.01) | 189 | 20.81 (.02) | 53 | 7.83 (.19) |
| Field-independent | 25 | 78 | | 52 | | 26 | |
| Leveler | 27 | 169 | 20.80 (.04) | 128 | 15.05 (.23) | 41 | 4.04 (.75) |
| Sharpener | 32 | 151 | | 113 | | 38 | |
| Impulsive | 29 | 259 | 52.55 (.00) | 190 | 36.26 (.00) | 69 | 38.76 (.00) |
| Reflective | 30 | 61 | | 51 | | 10 | |
| Global | 19 | 108 | 12.74 (.53) | 87 | 11.59 (.57) | 21 | 4.40 (.69) |
| Particular | 40 | 212 | | 154 | | 58 | |
| Analog | 29 | 198 | 17.91 (.14) | 143 | 17.59 (.09) | 55 | 11.94 (.03) |
| Digital | 30 | 122 | | 98 | | 24 | |
| Concrete | 30 | 174 | 15.95 (.26) | 132 | 11.33 (.63) | 42 | 12.77 (.02) |
| Abstract | 29 | 146 | | 109 | | 37 | |
| Field-sensitive | 38 | 214 | 19.22 (.07) | 164 | 17.29 (.09) | 50 | 6.38 (.35) |
| Field-insensitive | 21 | 106 | | 77 | | 29 | |
| Synthetic | 30 | 132 | 15.50 (.31) | 101 | 9.40 (.84) | 31 | 6.83 (.29) |
| Analytic | 29 | 188 | | 140 | | 48 | |
| Random | 26 | 127 | 8.17 (.96) | 90 | 11.29 (.62) | 37 | 6.86 (.30) |
| Sequential | 33 | 193 | | 151 | | 42 | |
| Inductive | 31 | 193 | 16.66 (.21) | 147 | 14.45 (.28) | 46 | 3.11 (.91) |
| Deductive | 28 | 127 | | 94 | | 33 | |

*: 2-sided; N: number of participants; F: frequency of changes
The Predictive Power of Cognitive Styles on Answer Changing in the Traditional Test
The ANOVA results from the multiple linear regression analyses showed significant F values for wrong-to-right, right-to-wrong, and total changes (p < .05), indicating that the regression model fit the data and could explain the answer-changing practices (Table 4).
Table 4. The Fitness of the Regression Model for the Traditional Test Data
| Models | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Regression: Change to wrong | 77.56 | 10 | 7.76 | 5.67 | .00 |
| Regression: Change to right | 391.78 | 10 | 39.18 | 5.94 | .00 |
| Regression: Total | 752.39 | 10 | 75.24 | 15.43 | .00 |
| Residual: Change to wrong | 65.67 | 48 | 1.368 | | |
| Residual: Change to right | 316.80 | 48 | 6.60 | | |
| Residual: Total | 234.02 | 48 | 4.88 | | |
| Total: Change to wrong | 143.22 | 58 | | | |
| Total: Change to right | 708.58 | 58 | | | |
| Total: Total | 986.41 | 58 | | | |
As the independent variables (i.e., cognitive styles) did not correlate strongly and all Tolerance and VIF values were above .10 and below 10, respectively, the possibility of collinearity was ruled out (an illustrative check is sketched after Table 5). Ultimately, as removing none of the independent variables changed the predictive power of the model, all of them were retained. According to the significance and Beta values (Appendices A, B, and C), the impulsive/reflective styles could significantly predict right-to-wrong (76% of the variance), wrong-to-right (58% of the variance), and total (78% of the variance) changes, and the inductive/deductive styles could significantly predict total answer changes (18% of the variance) (p < .05). Altogether, the regression model of the study explained 54% of the right-to-wrong, 55% of the wrong-to-right, and 76% of the total answer-changing variance (Table 5).
Table 5. Regression Model Summary for the Traditional Test
| Models | R | R² | Adjusted R² | Std. error of the estimate | R² change | F change | df1 | df2 | Sig. F change |
|---|---|---|---|---|---|---|---|---|---|
| R/W | .74 | .54 | .45 | 1.17 | .54 | 5.67 | 10 | 48 | .00 |
| W/R | .74 | .55 | .46 | 2.57 | .55 | 5.94 | 10 | 48 | .00 |
| Total | .87 | .76 | .71 | 2.21 | .76 | 15.43 | 10 | 48 | .00 |
Predictors (constant): style 1, style 2, style 3, style 4, style 5, style 6, style 7, style 8, style 9, style 10
Dependent variable: right-to-wrong (model 1); wrong-to-right (model 2); and total (model 3) answer changes
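The collinearity screening mentioned above (Tolerance above .10, VIF below 10) can be reproduced along the following lines. This is only an illustrative sketch, not the study's SPSS output; the predictor matrix is randomly generated and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor matrix: ten cognitive style scores for 59 test takers.
rng = np.random.default_rng(0)
styles = pd.DataFrame(rng.integers(1, 10, size=(59, 10)),
                      columns=[f"style_{i}" for i in range(1, 11)])

X = sm.add_constant(styles)
diagnostics = pd.DataFrame({
    "predictor": X.columns,
    "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
})
diagnostics["Tolerance"] = 1 / diagnostics["VIF"]

# Predictors are considered acceptable when Tolerance > .10 and VIF < 10.
print(diagnostics[diagnostics["predictor"] != "const"])
```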
Comparison of Answer Changing in Online and Traditional Tests
In the final phase of the analysis, the test takers' answer-changing behaviors across the online and traditional pen-and-paper platforms were compared. The results indicated that traditional test takers were more likely to change their responses overall. However, right-to-wrong changes by online test takers were significantly more frequent.
Table 6. Comparison of Answer Changing across Online and Traditional Tests
| Test formats | N | Total changes F | Fisher's exact value (Sig.*) | Changes to right F | Fisher's exact value (Sig.*) | Changes to wrong F | Fisher's exact value (Sig.*) |
|---|---|---|---|---|---|---|---|
| Online | 59 | 271 | 28 (.01) | 149 | 14.54 (.28) | 122 | 16.23 (.01) |
| Traditional | 59 | 320 | | 241 | | 79 | |

*: 2-sided; N: number of participants; F: frequency of changes
Discussion
The descriptive and regression results suggested that, unlike on the online platform, cognitive styles could explain the participants' answer changes in the traditional pen-and-paper test. Impulsive/reflective modes of thinking could predict over half of the wrong-to-right, right-to-wrong, and total answer-changing practices on the pen-and-paper platform. With a much lower percentage, the inductive/deductive pair also proved useful in predicting the overall answer switches.
Answer Changing in the Online Test
In the online version of the test, reflective and digital test takers made significantly more total answer changes than their impulsive and analog counterparts. Also, cognitive styles could not predict online answer changes. Since individuals with a reflective cognitive style tend to be more analytic, hesitant, and accurate than their impulsive counterparts (Estaji & Safari, 2023; Rosencwajg & Corroyer, 2005), the pattern of overall changes confounded general expectations. However, as reflective learners may perform better on analytic items and face challenges with global ones (Zelniker & Jeffrey, 1976), a possible explanation lies in the greater share of reading comprehension items, which covered two-thirds of the total questions and were twice as frequent as the grammar items.
Also, because there were no significant differences between the wrong-to-right changes (i.e., score gains) made by reflective and impulsive individuals, the implication is that reflective learners' back-and-forth movements among analytic items made up for the score loss on global ones. Another factor worth considering was the online test duration: although it was similar to that of the traditional test, unstable internet connections or inconsistent internet speed could constrain reflective individuals, who typically need longer response latencies (Davoudi & Heydarnejad, 2020; Estaji & Safari, 2023).
In line with expectations, digital individuals, who are by definition less reflective, tended not to make inferences about the given information, mainly focused on surface forms, and changed their responses more frequently (Ehrman & Leaver, 2003). The similar behaviors of digital and reflective test takers, despite their rather opposite thinking modes, indicate that regardless of their dispositions to make accurate guesses about the correct item responses, test takers do not seem to benefit from answer changes on online platforms.
Comparison of Answer Changing in Online and Traditional Tests
Disregarding the cognitive style differences, the significantly more frequent right-to-wrong changes in the online exam reflect the online test takers' unwise decisions, suggesting that answer changing does not necessarily do test takers any good when they take online tests. Therefore, it would be inadvisable to change answers except in cases of misunderstood or misinterpreted item stems (Aryadoust, 2019; Ramsey et al., 1987). In the traditional test, however, total and wrong-to-right answer switching was significantly more frequent, supporting earlier claims about the beneficial effects of answer changing. Aligned with this finding, Geiger (1991) argued that second thoughts on item responses also have wrong-to-wrong and right-to-wrong manifestations, but the gains from wrong-to-right changes can make up for the lost points.
Answer Changing in the Traditional Pen-and-Paper Test
On the traditional platform, field-dependent, leveler, and impulsive learners tended to make more total changes; field-dependent and impulsive learners made more wrong-to-right changes; and those with analog, impulsive, and concrete cognitive styles made more right-to-wrong changes. The field-dependent learners' frequent wrong-to-right changes supported previous observations about teacher-developed achievement tests and more or less reflected field-dependent test takers' overreliance on contextual cues (Richards et al., 1992). Moreover, the finding of no significant differences in the answer-changing behaviors of field-sensitive and field-insensitive test takers could, contrary to general assumptions, support treating field-insensitivity and field independence as distinct but interrelated thinking styles, thereby signifying the combined effects of field dependence and field insensitivity on this group's frequent changes (Ehrman & Leaver, 2003). The results, however, could not support previous theoretical arguments on the maximum performance of learners with strong field-independent and field-sensitive tendencies (e.g., Angeli, 2013; Davis & Frank, 1979; Ehrman & Leaver, 2003).
One reason for the differences between impulsive and reflective test takers in answer-changing behaviors may be the different mechanisms and reasons underlying their revisiting of responses (Zhang, 2023). Previous studies have shown that misreading the questions and reconceptualizing item stems and the requested information were two main reasons for answer-changing practices (Kruger et al., 2005; Schwartz et al., 1991), an argument quite representative of impulsive individuals. Because of their quick reading habits, impulsive test takers may have needed to reread the item stems, especially the tricky ones, to ensure they had extracted the requested information, thereby increasing the probability of changing their responses. Reflective learners' successful changes, however, could have been due to the positive effects of their delayed decisions on the accuracy of their choices (Koriat, 2012). A justification for this second group's right-to-wrong changes can be the possible effects of the time they lost retrieving further information. Accordingly, being reflective does not guarantee that a test taker always makes successful judgments because other factors, such as item difficulty parameters, can affect the results. For example, with difficult questions, it is highly likely that even after reconsidering and reconceptualizing the items, test takers, regardless of their amount of care and attention, make incorrect decisions (Efklides, 2012).
In this study, wrong-to-right changes were compatible with the digital mode of thinking. Given that the texts consulted in the class and exam sessions did not include any literary content, getting the gist of the information did not require any creative strategies. Surface strategies such as memorizing grammatical points, literal meanings, and established reading techniques could serve as assets for finding the correct answers. This situation was probably the most satisfactory for digital individuals, with their tendency toward part-to-whole analyses of reading passages, sequential and logical approaches, and shallow surface structures (Ehrman & Leaver, 2003).
A common feature of field-dependent, impulsive, digital, and abstract individuals is that they are likely to process content holistically (Ehrman & Leaver, 2003; Rozencwajg & Corroyer, 2005). The results pointing to more frequent incorrect changes among analog and concrete individuals, as opposed to digital and abstract ones, together with more successful changes on the part of field-dependent people, may support the conclusion that holistic learners and those who go after the gist of issues can act more successfully in traditional answer-changing attempts (Richards et al., 1992). The literature also supports the view that an increased frequency of changes increases the chances of compensating for score loss due to wrong choices (Geiger, 1991). However, the present results do not support previous theoretical arguments (e.g., Angeli, 2013; Davis & Frank, 1979; Ehrman & Leaver, 2003) on the maximum performance of learners with strong field-independence tendencies.
Some studies on pen-and-paper tests have suggested that first hunches on MC items are generally closer to the correct answers (Pressley et al., 1990). However, this study showed that initial choices are not necessarily acceptable in all situations (Pressley & Ghatala, 1988). At odds with the online platform, the traditional test practices indicated a positive association between increased accuracy and test takers' attempts to monitor their responses (Efklides, 2012). Hence, a wise policy for deciding whether to change initial responses or keep them is to avoid both hypercorrection and overconfidence (Efklides, 2012; Metcalfe & Finn, 2012). Herzog and Hertwig (2014) believed that in the case of contradictory estimates, test takers should resort to dialectical bootstrapping and take advantage of averaging their rough guesses about the possible correct responses to an item. Based on this conceptualization, answer changing can be beneficial because test takers think over their choices and reconsider them to find logical and convincing reasons underlying the choice of options, analyzing the item from a different perspective each time; the result is a reasonable time delay, activation of already stored knowledge, and a lower possibility of errors (Vul & Pashler, 2008).
Conclusion
The present findings indicated that test takers benefit from reconsidering their choices upon doubt on traditional MC tests, thus implicitly showing a close relationship between test scores and the mere number of answer changes (e.g., Reiling & Taylor, 1972). However, given the poor answer-switching outcomes on the online platform, the findings can prompt test developers to consider the intervening variables (e.g., exam duration) involved in online exams (Vul & Pashler, 2008). Also, because of the possible adoption of unethical test-taking strategies such as cheating, any interpretation of the obtained results requires caution (Ellenburg, 1973). Hence, further studies can examine how to balance exam time so that extended durations do not invite examination offenses and limited durations do not cause incorrect conceptualizations. Altogether, as long as answer changes are not informed by educated guesses, they are less likely to improve exam results and thus are not recommended.
The results of this study can raise test takers' awareness of the test-taking strategies that suit their cognitive styles so that they perform more successfully on online and traditional tests. The results can further help test developers address the factors (e.g., individual traits) regulating test performance. They can utilize the present findings to ensure fair assessment of language skills and create more standardized tests. Instructors can also use the results to offer wiser advice to test takers and shift their focus from task characteristics alone toward combined learner and task features.
Future studies can address answer changes in light of the regulatory effects of other psychological factors, such as self-awareness and self-confidence. Previous studies have shown that, while revising their choices, people tend to rely more on themselves and to take advice that agrees with their initial decision (Bonaccio & Dalal, 2006; Herzog & Hertwig, 2014; Soll & Larrick, 2009; Yaniv & Milyavsky, 2007). Hence, it may not be easy to advise students to reconsider their choices during the exam. Other issues worth investigating are whether giving general guidelines before the exam or item-specific advice during the exam works better and whether the characteristics of feedback providers (e.g., self-confidence) affect answer changing and MC test scores. Given that low achievers are highly likely to show lower self-awareness, greater overconfidence, and more biased responses and tend not to change their options upon further reconsideration (Kruger & Dunning, 1999; Stylianou-Georgiou & Papanastasiou, 2017), they can be a potential target for future analyses. It should also be noted that gender was not included as a variable in this study. Therefore, it is recommended that other researchers take it into account in future investigations.
Acknowledgment
I would like to express my special thanks to the editorial board and anonymous reviewers of the Applied Research on English Language journal.