Test Security in the “Good Old Days”
I often hear people talk lovingly about the “good old days,” when children respected their parents, when we had more faith in government, and when things were simpler, less technology-oriented, and less expensive. I have been around long enough to have lived through some of those days, and they seem overrated to me, fogged over by the mists of time.
But what about the good old days of test security? How good were they in practice? I look at this question in three ways:
- Willingness to follow the rules
- Stakes associated with testing
- Tools available to help ensure the fairness and validity of test scores
Willingness to Follow the Rules
I took the SAT in the mid-1950s, and I never even heard of attempts to cheat on that test, the ACT Assessment, or other tests my friends were taking, such as the GED (High School Equivalency Exam). Over the last several decades, however, cheating has increased steadily and consistently, a finding substantiated by a number of studies. Students cheat more as they move up through the grades, and each year is a little worse than the year before. This behavior in the testing domain parallels and reflects a deterioration in public morality. Too often the message is that you should not cheat because you might get caught, not because it is morally wrong. The implicit rule becomes “Do anything you can to get ahead. You owe it to yourself.” To me that is flat-out disgusting, but it is what we too often face in testing environments.
Yet honor has not been lost completely, even now. If you remind test takers just before a test that their answers should be based only on their own knowledge, attempts to cheat go down. If it is very clear that testing is being closely monitored and that breaking the rules has serious consequences, fairness is much easier to maintain.
Stakes Associated with Testing
The task of preventing cheating and test theft has become much more challenging as the stakes associated with testing have gone up. When I took the SAT as a student thinking I might want to go to college, it was not that difficult for a fairly decent student to meet the testing requirements of even quite selective colleges. The number of applicants had not reached the avalanche that now descends on the most sought-after schools. Also, because the College Board wanted to keep students from obsessing about their SAT scores as much as possible, the scores were not reported to students. So I did not see my own scores from a 1955 administration until about ten years later, when I did a summer graduate student internship at ETS. By then, I found, scores had been available to test takers for a few years.
In the world of state assessments, it used to be very difficult to get the media to pay any attention to the results. Except in special cases such as the New York State Regents exams, state assessments were mostly ignored. Since then, significant consequences have been built into state testing programs. For students, these take the form of grade promotion or high school graduation requirements. For teachers and school administrators, evaluations of individuals, schools, and programs are now routinely tied to students’ performance on tests. In some instances, student results can trigger both monetary incentives and negative outcomes. Teachers have a very praiseworthy record of following testing rules and focusing on helping their students learn, even when the amount and nature of testing seem inappropriate to them. Resistance and resentment, however, have grown every time the stakes have been raised.
As has been the case throughout the 50+ years that I have worked in testing, prevention provides the greatest return: it minimizes attempts to undermine fairness through cheating and test theft. So careful attention to training, clear and explicit standards, monitoring in testing settings, and similar best practices need to be part of any high-stakes testing program.
In addition to solid prevention activities, high-stakes testing programs have, since their earliest days, addressed the detection of misbehavior on tests. The method most frequently written about, and most widely employed for decades in US high-stakes testing, has been the analysis of gain scores. Testing programs routinely compare results for students and schools that are tested on more than one occasion. Such analyses can reveal score improvements from one occasion to the next that far exceed what the overwhelming majority of students and schools experience. Where such extreme gains are observed, closer examination frequently follows. Are there unusual numbers of erasures on a paper-and-pencil test? Do erasures almost always change a wrong answer to a correct one? Are there extreme similarities in responses among pairs or groups of test takers, who not only miss the same questions but choose the same wrong answer time after time?
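The three screening questions above (extreme gains, wrong-to-right erasures, and shared wrong answers) can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own, not the procedure used by any particular testing program: the function names, the data layout, the sample data, the modified z-score rule with its 3.5 cutoff, and the pair threshold are all hypothetical choices. Operational data forensics relies on more careful statistical models, for example ones that account for how often two test takers would agree by chance.

```python
from itertools import combinations
from statistics import median


def flag_extreme_gains(prior, current, threshold=3.5):
    """Return ids (students or schools) whose score gain is an extreme outlier.

    prior, current: dicts mapping a hypothetical id to a score.
    Uses a modified z-score built on the median and the median absolute
    deviation (MAD), so the very outliers we are looking for do not inflate
    the estimate of typical spread. Only unusually large gains are flagged.
    """
    gains = {i: current[i] - prior[i] for i in prior if i in current}
    med = median(gains.values())
    mad = median(abs(g - med) for g in gains.values())
    if mad == 0:
        return []  # no spread to compare against; a different rule is needed
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # when gains are roughly normally distributed.
    return sorted(i for i, g in gains.items() if 0.6745 * (g - med) / mad > threshold)


def wrong_to_right_share(erasures, key):
    """Fraction of erasures that changed a wrong answer into the keyed answer.

    erasures: list of (item_index, erased_answer, final_answer) tuples.
    """
    if not erasures:
        return 0.0
    wtr = sum(1 for item, before, after in erasures
              if before != key[item] and after == key[item])
    return wtr / len(erasures)


def identical_incorrect(resp_a, resp_b, key):
    """Count items that both test takers missed by choosing the same wrong option."""
    return sum(1 for a, b, k in zip(resp_a, resp_b, key) if a == b and a != k)


def flag_similar_pairs(responses, key, min_shared_wrong=5):
    """Return (id_a, id_b, count) for pairs sharing many identical wrong answers."""
    flagged = []
    for (id_a, ra), (id_b, rb) in combinations(sorted(responses.items()), 2):
        n = identical_incorrect(ra, rb, key)
        if n >= min_shared_wrong:
            flagged.append((id_a, id_b, n))
    return flagged


# Made-up sample data for illustration only.
prior = {"A": 50, "B": 52, "C": 48, "D": 51, "E": 49, "F": 50, "G": 40}
current = {"A": 51, "B": 54, "C": 51, "D": 55, "E": 51, "F": 53, "G": 75}
key = ["B", "C", "A", "D", "A"]
erasures = [(0, "A", "B"), (1, "D", "C"), (2, "B", "A"), (3, "C", "D"), (4, "A", "C")]
responses = {
    "s1": ["A", "C", "B", "B", "C"],
    "s2": ["A", "C", "B", "B", "C"],
    "s3": ["B", "C", "A", "D", "A"],
}

if __name__ == "__main__":
    print(flag_extreme_gains(prior, current))          # ['G']
    print(wrong_to_right_share(erasures, key))          # 0.8
    print(flag_similar_pairs(responses, key, min_shared_wrong=3))  # [('s1', 's2', 4)]
```

The median-based spread is used here because a single school with an implausible jump would otherwise inflate an ordinary standard deviation and partly hide itself. As the text notes, a flag is only a reason for closer examination, not proof of misbehavior.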
When it comes to the availability of “tools” for the statistical detection of testing misbehavior, we are in much better shape now than we were in my early years in testing. Active research programs are developing and refining data forensics methods. Some state assessments and some certification and admissions testing programs routinely evaluate the results of each test administration for evidence of significant testing irregularities. Testing programs are also increasingly willing to take action when testing misbehavior is confirmed.
Even though the incidence of attempts to cheat or to steal tests is increasing, we have proven analytic tools at our disposal, and many programs have shown their willingness to use them.
I would welcome hearing from anyone who has reactions to my analysis in this blog. Do you see things as I do, or do you have a different perspective on how things compare to “the good old days”?