# Are identical answers to exam questions proof of cheating on tests?

When it comes to supporting an allegation of cheating on tests, there is rarely better statistical evidence than two (or more) tests with identical sets of responses. Having a great interest in this topic, I have carefully read the abstracts of Rice University Honor Council meetings, where these types of allegations are taken very seriously. In several instances of alleged academic fraud, the Honor Council has found the evidence of identical solutions and identical answers to be compelling.

“The Rice Honor System was created by students in 1916. That it has functioned so well for so long is a reflection of the trust and respect that Rice students show to one another and to the University. It is one of Rice’s most highly valued traditions and a vital part of your education–education in responsibility and integrity.” http://honor.rice.edu/

In one instance, the Council minutes read:

Witness 1, the professor for the class, stated that he believed the similarities between the True / False answers and the essay answers given by Student A and Student B to be strikingly similar. He … presented a statistical analysis of the probability of this occurring in certain situations.

In the above case, despite having a probability analysis, the Honor Council did not find that the honor code had been violated (i.e., cheating was not found).

In another instance, the Honor Council had a different finding:

Some members felt that the identical answers on some portions of the exam were beyond coincidence or having similar notes or studying together. Members were suspicious of the fact that these similarities would arise after the students used different sources of information when answering the questions. … Some members were not convinced by the explanations …

Despite denials of cheating in the above situation, both students were found in violation of the honor code.

Here’s a Google search link if you wish to read some of these abstracts.

It is evident from these two abstracts that the Honor Council attempts to find plausible explanations for identical answers and excessive similarities between test responses. It is also evident that the Honor Council may act without having definitive proof. As an example of the degree of “proof” or evidence that may be required to take action in a case of suspected cheating, consider this statement from the University of Western Ontario:

It is particularly important to understand that the conclusion that a student committed a scholastic offense does not have to be supported by evidence beyond a reasonable doubt. In an exam writing situation, that means that a decision maker may conclude that cheating took place, even if it is possible that two people got some identical answers by chance.

The observation that two tests have identical answers is very reliable evidence as defined by the criteria I proposed in my most recent post, because the observation is (1) factual, (2) objective, (3) credible, and (4) defensible. We require that the evidence have one additional attribute before believing that cheating probably occurred: the evidence must be strong.

To evaluate the strength of evidence from identical answers on tests, we need the probability of the observed responses. At Caveon, this probability is estimated using item response theory. We multiply together the probabilities of the selected responses (assuming the selected responses are conditionally independent) and then normalize the product by the marginal probability of the observed score. Formulas for computing exact probabilities are difficult to derive and program, which means that most practitioners who encounter these situations will rely on judgment and intuition, much as the Rice Honor Council does.
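The calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Caveon's implementation: the function names, the `option_probs` table, and the example numbers are all hypothetical. The option probabilities are supplied directly instead of coming from a fitted item response model, the score is taken to be the number of correct answers, and both examinees are assumed to share the same option probabilities.

```python
from math import prod

def score_distribution(p_correct):
    """P(number correct = s) for independent items, built up one item at a time."""
    dist = [1.0]  # with zero items, the score is 0 with probability 1
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)   # item answered incorrectly
            new[s + 1] += mass * p       # item answered correctly
        dist = new
    return dist

def pattern_probability_given_score(option_probs, keys, responses):
    """Probability of the exact response pattern, given its number-correct score.

    option_probs[i][k] is the (hypothetical) probability of choosing option k
    on item i; keys[i] is the correct option; responses[i] is the chosen option.
    """
    # Product of the selected-response probabilities (conditional independence).
    joint = prod(option_probs[i][r] for i, r in enumerate(responses))
    # Normalize by the marginal probability of the observed score.
    score = sum(r == k for r, k in zip(responses, keys))
    p_correct = [option_probs[i][k] for i, k in enumerate(keys)]
    return joint / score_distribution(p_correct)[score]

# Hypothetical two-item test with four options per item; option 0 is correct.
option_probs = [[0.7, 0.1, 0.1, 0.1],
                [0.6, 0.2, 0.1, 0.1]]
keys = [0, 0]
print(pattern_probability_given_score(option_probs, keys, [0, 0]))  # all correct
print(pattern_probability_given_score(option_probs, keys, [0, 1]))  # one wrong
```

With every item answered correctly, only one pattern can produce that score, so the conditional probability is one. Each wrong answer spreads the probability across the available distractors, which is why the value falls as the number of incorrect responses grows.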

I have pasted in a table of sampled probabilities for an 18-item test, below. The probabilities are calculated knowing the score that was obtained on the test. So, if we know a person answered all 18 items correctly, the probability that another person who answered all 18 items correctly would match is equal to one. If the answer was correct, it is highlighted in gold in the table.

Even though I routinely evaluate these types of probabilities, I have been surprised by some instances of identical response data. For example, the probability of an identical test when all items are answered correctly is 1 (as in the first row of the table). But the probability of an identical test when all but one or two questions are answered correctly may be as high as 0.10 or 0.25 (see the second and fourth rows of the table). On the other hand, if several questions are answered incorrectly, the probability of an identical test may be 1 in 100 million or even smaller. The wide variation in these probabilities is a function of the number of correctly answered test questions and the selected responses.

If the probabilities of some test response patterns are sufficiently high (because the tests are easy or the examinees are very proficient) and if we have a large enough group, we might expect to see many identical tests. Probability computations for the number of observed identical tests can be very difficult. This is an instance of the “birthday problem” with unequal probabilities.
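To give a feel for the “birthday problem” with unequal probabilities, here is a rough sketch. It does not attempt the exact computation. It uses two simpler stand-ins: the expected number of matching pairs, and a Monte Carlo estimate of the chance that at least two examinees in a group share a response pattern. The function names and the `pattern_probs` input (one probability per possible response pattern) are hypothetical, and examinees are assumed to respond independently with the same pattern probabilities.

```python
import random
from math import comb

def expected_matching_pairs(pattern_probs, n):
    """Expected number of examinee pairs with identical patterns among n examinees."""
    # Two independent examinees match with probability sum(q_j ** 2).
    pair_match = sum(q * q for q in pattern_probs)
    return comb(n, 2) * pair_match

def simulate_any_match(pattern_probs, n, trials=10_000, seed=0):
    """Monte Carlo estimate of P(at least two of n examinees share a pattern)."""
    rng = random.Random(seed)
    patterns = range(len(pattern_probs))
    hits = 0
    for _ in range(trials):
        draws = rng.choices(patterns, weights=pattern_probs, k=n)
        if len(set(draws)) < n:  # a repeated pattern means some pair matched
            hits += 1
    return hits / trials

# Hypothetical example: one very likely pattern (say, a perfect score on an
# easy test) plus 99 equally rare patterns, in a group of 10 examinees.
pattern_probs = [0.5] + [0.5 / 99] * 99
print(expected_matching_pairs(pattern_probs, 10))
print(simulate_any_match(pattern_probs, 10))
```

When one pattern dominates, as it would on an easy test taken by proficient examinees, matches become common even in small groups, so a matching pair by itself carries little weight in that setting.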

At the beginning of this discussion, it appeared that we had a relatively straightforward problem. As often happens in statistics, an apparently simple problem becomes very complex, very quickly, and the analysis of identical answers on two exams is one of those problems. The answer to the question with which we began the discussion must be: we cannot prove that cheating occurred when two tests have identical answers, but in many situations we can obtain very strong, reliable evidence that cheating occurred, and a conclusion based on that evidence would nearly always be right.