Patterns, Propositions and Noise
Written by Dennis Maynes, Chief Scientist
April 11, 2016
Two years ago, Chuck Cooper, Chairman of the IBM Certification Council, described an anomalous pattern and asked me, “What is the likelihood of this pattern appearing by chance?” I said, “Oh, less than one chance in a googol!” (A googol is the number written as a 1 followed by one hundred zeros.) With that assurance, IBM used the information in that pattern as a basis for making informed security decisions. The point of the anecdote is that some patterns are so extremely unlikely that it is virtually certain they did not happen by chance.
The pattern that Chuck described allowed IBM to infer exactly when, where and by whom their test questions were stolen and which individuals most likely used that stolen content. In other words, they found the source of a braindump, where the braindump was being sold, and who was buying the braindump, all as a result of recognizing and analyzing the right patterns.
I asked a friend, Katina, to make a graphic of these data for illustrative purposes. I wanted it to show that (1) using braindumps is really a self-defeating behavior, and (2) braindump usage can be detected.
In the above graphic, you cannot see the actual date when the test questions were stolen or actual administration volumes, but you can see that a significant number of people began to use the braindump content shortly after the theft occurred.
Most data forensics analyses are not as clear-cut and compelling as the above example. But they all rely on the ability to separate patterns from noise. Test takers who cheat do not wear a sign proclaiming that they are cheating. Indeed, most of them do not want to be detected. Because we don’t want cheaters to know how we detect them, we usually don’t disclose the exact patterns that we use. If we shared our patterns and methods, cheaters would probably modify their behavior, rendering our methods useless.
A few years ago, I realized that some patterns provide information about the validity of test scores, and other patterns provide information about the behavior of individuals. Both kinds of information are important for evaluating test security propositions, such as: (1) does the score accurately represent the test taker’s competence, or (2) do the data suggest that inappropriate behavior occurred?
The forensic scientist needs to be able to separate test security patterns from noise. This is often a challenging task. Sometimes it can be aided by building design elements, such as verification or security questions, into the test. It can be aided immensely by developing and using sensitive statistics that detect specific types of prohibited activity, such as collusion. Because individuals who seek an unfair advantage appear to be constantly probing the security measures that have been implemented, we keep searching for new and powerful ways to detect illicit behavior. Recently, we have been analyzing the time stamps that record when questions were answered and how answers were changed. An intriguing and not-yet-answered question is whether test takers have an individual pattern, like a digital signature, that is imprinted within the data and could be used for authentication and for detecting surrogate test takers.
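To make the question “What is the likelihood of this pattern appearing by chance?” concrete, here is a minimal, generic sketch of the kind of reasoning behind a collusion statistic. It is not the method used in the analyses described here (those are deliberately undisclosed). It assumes, hypothetically, that each item is an independent trial in which two honest test takers give the same answer with a fixed probability p_match, and it computes the binomial tail probability of seeing at least n_matches identical answers out of n_items. The function name and parameters are illustrative only; practical similarity indices also account for item difficulty and test-taker ability, which this toy model ignores.

```python
import math


def match_tail_probability(n_items: int, n_matches: int, p_match: float) -> float:
    """Return P(X >= n_matches) for X ~ Binomial(n_items, p_match).

    Hypothetical model: each item is an independent trial, and two honest
    test takers choose the same answer with probability p_match.
    """
    if not 0 <= n_matches <= n_items:
        raise ValueError("n_matches must be between 0 and n_items")
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be a probability")
    # Sum the binomial probabilities for every outcome at least as extreme.
    return sum(
        math.comb(n_items, k) * p_match**k * (1 - p_match) ** (n_items - k)
        for k in range(n_matches, n_items + 1)
    )


if __name__ == "__main__":
    # Two test takers matching on all 60 items of a four-option test,
    # under the (unrealistic) assumption that chance agreement is 1/4 per item.
    prob = match_tail_probability(60, 60, 0.25)
    # Express the result as "one chance in 10 to the power x".
    print(f"about one chance in 10^{-math.log10(prob):.1f}")
```

Under these assumptions the example prints `about one chance in 10^36.1`. Reporting the exponent rather than the raw probability keeps extremely small values readable, which is how a claim like “less than one chance in a googol” (10^100) can be compared against a computed result.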
The pursuit of continually stronger exam security leads us to explore more sophisticated methods for extracting patterns from the noise present in test result data. If you are interested in learning more about current research in this area, please join us at the Conference on Test Security. The next conference will be held in Cedar Rapids, Iowa, October 18 to 20, 2016.