Last month, November 2014, Phil Dickison was a guest presenter for Caveon’s monthly webinar. I had the privilege of co-presenting. I believe we successfully explained that statistics should be used to validate trusted test scores *and* to invalidate test scores when cheating has potentially occurred. In other words, the measurement professional should question the *validity of the scores* and not the *behavior of the test takers*. You can listen to the webinar here: https://www.youtube.com/watch?v=5zmfjnzc9c8

I have been thinking about how statistics can or should be used to make inferences about test taker behavior. Specifically, how can statistics support investigations into cheating, in addition to being used to invalidate test scores? Until recently, most measurement professionals have been unwilling to make inferences about cheating using statistics. I think there are a few good reasons for this reluctance: (1) test questions are designed to measure a person’s strengths, not cheating behaviors; (2) a finding of cheating places a label upon a person; and (3) you can’t prove a person cheated “beyond any doubt” using statistics alone. I realize that “beyond any doubt” is an impossibly high standard to meet, but this appears to be the standard that has been suggested in the literature.

In the past few years, there has been some change in attitude and perspective about using statistics for this purpose. Indeed, some measurement professionals have stated that defensible statistical inferences about cheating and test taker behavior can be made, but that doing so requires Bayesian models^{1}. I believe that statistical inferences about the behavior of test takers can, and should, be made when appropriate models and data are used. As I have thought about how this should be done, I have realized that at least three things must be established to determine that a person cheated: (1) cheating occurred, (2) the person was responsible for the cheating that occurred, and (3) the person *intended* or *tried* to gain an unfair advantage.

**Cheating occurred.** Unlike crimes such as burglary or assault, cheating usually leaves no directly observable evidence that it happened. Even when a proctor observes suspicious behavior, those observations can rarely verify that cheating actually occurred. Using data anomalies, it’s possible to infer that cheating *may have* occurred. As far as I know, none of the models proposed in the literature to date has attempted to estimate a probability that cheating occurred.

**The person was responsible.** Responsibility is a rather amorphous concept when discussing cheating on tests. There will be no “smoking gun in the suspect’s possession.” At best, you might find a cell phone with incriminating text messages or crib notes with exam answers. Even if you find this evidence, you would need to show that the person used the crib notes or the text messages. It still can be difficult to show responsibility for cheating beyond any doubt. Again, I know of no models in the literature that estimate the probability that a specific individual was responsible for the cheating that probably occurred.

**The person intended to cheat.** I believe that cheating is essentially a fraudulent act, because test performance is falsified in order to deceive. In order to show fraud, you need to show intent. It is almost impossible to show intent without having a confession or statement from the suspected cheater. When confronted with an accusation of cheating, cheaters may reply, “Prove it!” Even a Bayesian analysis that computes high probabilities that cheating occurred, that the person was responsible, and that the person intended to cheat will not prove that cheating occurred beyond any doubt. In other words, the demand to “prove it” is the cheater’s trump card. How do you demonstrate a person’s intent using probability theory?

I believe properly formulated models can be used to make inferences about test taker behavior. In my opinion, the Bayesian models proposed in the literature to date have concentrated on data, not behavior. They have been overly simplistic, and they have not incorporated the chain of reasoning that is required to make inferences about test taker behavior. I believe that when models account for probable cheating, probable responsibility, and probable intent, they can support allegations and investigations of cheating.
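To make that chain of reasoning concrete, here is a minimal sketch of how the three elements might combine under the ordinary chain rule of probability. It is only an illustration, not a model from the literature: the function name `probability_person_cheated` and the input values are hypothetical, and each input is assumed to be already conditioned on the links before it.

```python
def probability_person_cheated(p_occurred, p_responsible, p_intent):
    """Combine the three links with the chain rule of probability.

    Illustrative assumptions (not taken from any published model):
      p_occurred    = P(cheating occurred | data)
      p_responsible = P(person responsible | cheating occurred, data)
      p_intent      = P(person intended it | responsible, occurred, data)
    """
    for p in (p_occurred, p_responsible, p_intent):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    # All three elements must hold, so the conditional probabilities multiply.
    return p_occurred * p_responsible * p_intent


# Made-up numbers, for illustration only.
joint = probability_person_cheated(0.95, 0.90, 0.80)
print(round(joint, 3))
```

With these made-up inputs the joint probability is about 0.68, lower than any single link. That is one reason a model has to treat all three elements explicitly rather than stop at a data anomaly.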

^{1}Some writers have described Bayesian inference as “backwards reasoning” (in honor of Sherlock Holmes) because the probability equations are inverted or reversed in order to compute probabilities of propositions given the data, not probabilities of the data given the proposition. Bayesian inference differs from traditional statistics in the way probabilities are computed and used. Bayesian inference relies upon probabilities of propositions, not data. Traditional statistics, on the other hand, computes likelihoods of data conditioned upon assumed propositions. In other words, Bayesian methods compare the likelihoods of different explanations of the data, not the likelihood of the data. They are well suited for making inferences that judge between alternative explanations, as long as the alternatives are well-defined, or simple. In the statistical literature, complex propositions such as “cheating” or “not cheating,” which are intrinsically composed of sub-propositions and networks of sub-propositions, are not easily cast into the Bayesian framework of inference.
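
For readers who want to see the inversion written out, this is the standard statement of Bayes’ theorem. The symbols are my own notation for this note: $H$ stands for a proposition (for example, “cheating occurred”) and $D$ for the observed data.

$$
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
$$

When two well-defined explanations $H_1$ and $H_0$ are compared, the same rule can be written as posterior odds, which is the sense in which Bayesian methods judge between alternative explanations:

$$
\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_0)} \cdot \frac{P(H_1)}{P(H_0)}
$$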