The case of the waylaid answer key

Recently there have been many reports of lost databases, stolen computers, and misplaced documents. Is it any wonder that tests and exams are also experiencing the same problems? For example, last November in New Zealand the home of an employee of the Qualification Authority was burglarized and a laptop containing math items for the National Certificate of Educational Achievement was stolen. Despite assurances of password protection, the Qualification Authority revised and reprinted 150,000 test booklets: http://www.stuff.co.nz/stuff/4331442a7694.html

As another example, the completed answer sheets from an exam for the Arkansas State Board of Cosmetology were lost or misaddressed in the FedEx shipment to the scoring agency. Ninety candidates will have to retake the exam: http://www.nwanews.com/adg/News/213242/

Two years ago Caveon’s assistance was sought in dealing with a similar situation. The car of an employee of a major test publisher was stolen. In the car were secured test materials, including an answer key to an upcoming state-wide public school examination. When the car was recovered the answer key was missing. There was not enough time to revise the test. The exam would be administered as scheduled. Our client wanted to know if the answer key was being distributed and if the integrity of the test administration had been compromised.

As we discussed the situation with the client, I was confident that we could detect a widespread breach. But could we detect a situation in which just a few students used the lost answer key? There was no doubt in my mind that, if the thief knew the market value of the answer key, it would be sold on the Internet. I knew this from first-hand experience. While I was teaching at the university, a dual-campus administration of a test, coupled with a time lag between administrations, led to the answer key being disclosed. Three of my students obtained the answer key to the exam through a Yahoo chat room. They scored 100% on every question except the essay question, which they refused to answer.

The client gave us the following details about the test. The exam contained 54 questions: 44 core items and 10 field test items. There were about two dozen different forms of the test. The forms all contained the same core items in the same locations; the forms differed only in their sets of field test items. Slowly an analysis plan began to emerge. Because the answer key for only one of the forms was lost, we could score the field test items on all the other forms against the waylaid answer key: an honest student would match the stolen key's field test answers only by chance, while a student copying from the key would match nearly all of them. Scores on the field test items would be the keystone of the analysis.
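
To make the scoring step concrete, here is a minimal sketch. The item positions, data layout, and function names are illustrative assumptions, not the actual exam's layout or Caveon's production code.

```python
# Illustrative sketch only: the field test item positions are hypothetical.
# Each student's field test responses are scored against the waylaid key.
# On forms other than the compromised one, the field test items differ,
# so an honest student matches the stolen key's answers only by chance,
# while a student copying from the key matches nearly all ten.

FIELD_TEST_POSITIONS = [3, 8, 14, 19, 25, 30, 36, 41, 47, 52]  # hypothetical

def field_test_match_score(responses: dict, waylaid_key: dict) -> int:
    """Count field test answers that agree with the waylaid answer key.

    responses, waylaid_key: dicts mapping item position -> answer choice.
    """
    return sum(
        1 for pos in FIELD_TEST_POSITIONS
        if responses.get(pos) == waylaid_key.get(pos)
    )
```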

We assumed that any student using the stolen answer key would not know which items were field test items and which were core items. We also assumed that such a student would answer all the items (with perhaps a few mistakes) using the stolen answer key. Determining that a widespread dissemination of the answer key had not occurred was easy. Following standard statistical methodology, we performed the tests assuming the null hypothesis (i.e., that the answer key was not in play) was true. Under this assumption we found that fewer than 2% of the tests had “high scores” (i.e., scores above the 95th percentile of the distribution), when 5% were expected. This was very good news: there had been no widespread dissemination of the answer key.
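
As a rough illustration of that aggregate check: under the null hypothesis, the number of tests scoring above the 95th percentile is approximately binomial. The counts below are hypothetical; only the percentages (2% observed versus 5% expected) come from the analysis described above.

```python
# Hypothetical counts for illustration; only the 2%-observed vs. 5%-expected
# figures come from the analysis. Under the null hypothesis, the number of
# tests above the 95th percentile is roughly Binomial(n, 0.05).

from scipy.stats import binom

n_tests = 100_000                # assumed number of scored tests
n_high = int(0.02 * n_tests)     # observed: fewer than 2% were "high scores"

# If the key were in wide use, we would see an *excess* of high scores.
# The chance of seeing at least the observed count under the null:
p_at_least = binom.sf(n_high - 1, n_tests, 0.05)
print(f"P(count >= {n_high} | null) = {p_at_least:.3f}")  # ~1.0: no excess
```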

Next, we hypothesized that a few teachers or school administrators might have received and used the stolen answer key. Using a probability inversion formula, we rank-ordered the schools by the proportion of tests with more than six answers on the field test items matching the stolen answer key. We found that the proportion of schools in the upper tail (above 10%) was less than 7%, when 10% were expected. This was good news: it meant that if the answer key had been disseminated, it was unlikely to have been through teachers or administrators. (We also visually inspected the 30 most extreme schools for “perfect” scores of 10 on the field test items on all the forms other than the one associated with the lost answer key. Nothing untoward was found in any of those schools.)
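
One plausible reading of that school-level screen is sketched below. The per-item chance-match rate and the school data are assumptions, and “probability inversion” is rendered here as an inverted binomial tail probability; the actual formula used in the engagement is not given in the article.

```python
# Hedged sketch of the school-level screen; all school data are made up.
# A test is "flagged" if more than six of its ten field test answers match
# the waylaid key. Schools are rank-ordered by the probability of seeing
# that many flags by chance alone.

from scipy.stats import binom

P_CHANCE_FLAG = binom.sf(6, 10, 0.25)  # P(>6 matches), assuming 4 options/item

def school_tail_prob(n_tests: int, n_flagged: int) -> float:
    """P(at least n_flagged flagged tests by chance) for one school."""
    return binom.sf(n_flagged - 1, n_tests, P_CHANCE_FLAG)

schools = {"A": (120, 2), "B": (85, 0), "C": (240, 5)}  # hypothetical
ranked = sorted(schools, key=lambda s: school_tail_prob(*schools[s]))
print(ranked)  # most suspicious school (smallest tail probability) first
```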

Finally, searching for the proverbial needle in the haystack, we hypothesized that a few isolated students might have received the answer key through personal contact with the thief on the Internet. To attack this problem we created a Bayesian probability model that estimated the probability that a particular student had used the stolen answer key, conditional upon the test score. Using this model, we inferred a 95% upper bound on the proportion of students who used the answer key of less than .09% (nine in ten thousand). The five most extreme tests were visually inspected, and not one of them had a “perfect” score on the field test items when scored with the lost answer key.
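
The article does not spell out the model, but a two-component mixture makes the idea concrete: the posterior probability of key use follows from Bayes’ rule applied to an “honest” score distribution and a “key user” distribution concentrated near a perfect field test match. The distributions and the prior below are assumptions for illustration only.

```python
# Simplified sketch of the Bayesian step, with assumed distributions.
# P(used key | score) is computed by Bayes' rule from a two-component
# mixture over field test match scores.

from scipy.stats import binom

P_HONEST_MATCH = 0.25   # per-item chance agreement (4 options, assumed)
P_USER_MATCH = 0.95     # key users copy the key, with a few slips (assumed)
N_FT = 10               # number of field test items

def posterior_key_use(score: int, prior: float = 1e-4) -> float:
    """P(student used the waylaid key | field test match score)."""
    like_honest = binom.pmf(score, N_FT, P_HONEST_MATCH)
    like_user = binom.pmf(score, N_FT, P_USER_MATCH)
    num = like_user * prior
    return num / (num + like_honest * (1 - prior))

for s in range(N_FT + 1):
    print(s, round(posterior_key_use(s), 6))  # near 1.0 only at score 10
```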

The results of the analysis gave our client sufficient confidence to trust the integrity of the test administration. To put these statistical estimates in perspective, we note that the estimated bound (i.e., .09%) on answer key compromise is much, much lower than the actual proportion of students who copy from one another in a normal test-taking situation. While we could not prove that the stolen answer key had not been used, we concluded the following:

If any students gained access to the answer key, the data indicate that it was not shared with friends. And if the answer key was used, its use was isolated.

With 95% confidence, no more than .09% of students used the compromised answer key. It is very likely, in fact, that no student actually used the compromised answer key.

The above situations illustrate the importance of properly securing test materials. They also show that, by using innovative and defensible statistical analyses, testing program administrators can measure the degree of security risk that is present. The analysis of the waylaid answer key demonstrates the power of data forensics in protecting and maintaining exam and test security.

Dennis Maynes

Chief Scientist, Caveon Test Security
