Written by Dennis Maynes, Chief Scientist, Caveon Test Security
** Updated September 18 ** For more information, watch our Caveon Webinar Series session from this month, Improving Testing with Key Strength Analysis
The US Patent Office has declared that mathematical innovations and derivations are discoveries, not inventions. Whether invention or discovery, mathematical innovations do not come easily and usually take weeks, months, or even years to derive. This short essay describes how Key Strength Analysis was discovered over the course of seven years.
I worked on this problem because Caveon processes large amounts of data, and verifying within a short time that the data have been processed correctly is a daunting task. Hence, I needed powerful methods for determining whether we were handling answer keys properly.
Key Strength Analysis estimates the probability that a particular response choice to a multiple-choice item is the correct answer. By comparing probabilities for the distractors with the probability for the actual answer, the analyst can detect potentially mis-keyed items and items with weak or ambiguous answer keys.
My first experience with answer key validation came in late 2004. A client, fearing a security breach, provided test result data but withheld the answer key. Needing a key, I wrote an algorithm that used eigenvector iterations to estimate one from the test data. I chose eigenvector iteration because it is a quick way to estimate the factor loadings of the items onto the test's main construct. After seeing my determination and receiving an analysis based on a potentially inaccurate answer key, the client decided to provide answer keys in the future. The algorithm turned out to be about 90% accurate. This experience taught me that an answer key can be estimated accurately from the data alone.
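The idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the original algorithm: it one-hot encodes the responses, extracts the leading principal component by power iteration (one form of eigenvector iteration), and takes each item's estimated key to be the option that loads most strongly on that component. The orientation heuristic (keyed answers are usually the most popular responses) and all parameter choices are mine.

```python
import numpy as np

def estimate_key_power_iteration(responses, n_iter=200, seed=0):
    """Estimate an answer key from response data alone (hypothetical sketch).

    responses: (n_examinees, n_items) array of chosen options 0..k-1.
    Returns one estimated keyed option per item.
    """
    n, m = responses.shape
    k = int(responses.max()) + 1
    # One-hot encode: column j*k + a indicates "chose option a on item j".
    X = np.zeros((n, m * k))
    X[np.arange(n)[:, None], np.arange(m) * k + responses] = 1.0
    Xc = X - X.mean(axis=0)                      # center the columns
    # Power iteration for the leading eigenvector of Xc^T Xc,
    # i.e., the first principal component of the choice indicators.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(m * k)
    for _ in range(n_iter):
        v = Xc.T @ (Xc @ v)
        v /= np.linalg.norm(v)
    # Orient the component so it correlates positively with agreement
    # with each item's modal response (assumes keys are usually modal).
    modal = np.array([np.bincount(responses[:, j], minlength=k).argmax()
                      for j in range(m)])
    popularity = (responses == modal).sum(axis=1)
    if np.corrcoef(Xc @ v, popularity)[0, 1] < 0:
        v = -v
    # Per item, the estimated key is the option with the largest loading.
    return v.reshape(m, k).argmax(axis=1)
```

On simulated data with a single dominant ability dimension, this kind of sketch recovers most of the key, which is consistent with the roughly 90% accuracy mentioned above, though the original algorithm's details are not public.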
Later, in 2005, we implemented other answer key validation algorithms. The first estimated and compared item response probabilities for high-performing test takers. A subsequent algorithm, developed in 2007, used simple linear regression to predict inversions, where high-performing test takers choose the most appealing distractor more often than the keyed answer. Both efforts produced what appeared to be reasonable answers, but they were somewhat ad hoc, not being grounded in fundamental measurement principles. I learned that high-performing test takers will indicate when there is a problem with the answer key.
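The inversion idea is simple to illustrate. The sketch below is my own hedged rendering of it, not Caveon's production algorithm: take the top-scoring group of test takers and flag any item where some distractor is chosen more often than the keyed answer within that group. The function name and the quartile cutoff are assumptions for illustration.

```python
import numpy as np

def flag_key_inversions(responses, key, top_frac=0.25):
    """Flag items where top scorers prefer a distractor to the keyed answer.

    responses: (n_examinees, n_items) array of chosen options 0..k-1.
    key: (n_items,) keyed option per item.
    Returns a list of (item_index, preferred_distractor) pairs.
    """
    responses = np.asarray(responses)
    key = np.asarray(key)
    # Score each examinee against the provided key, keep the top group.
    scores = (responses == key).sum(axis=1)
    top = responses[scores >= np.quantile(scores, 1 - top_frac)]
    k = int(responses.max()) + 1
    flagged = []
    for j in range(responses.shape[1]):
        counts = np.bincount(top[:, j], minlength=k)
        keyed_count = counts[key[j]]
        counts[key[j]] = -1                 # exclude the key itself
        if counts.max() > keyed_count:      # inversion: distractor wins
            flagged.append((j, int(counts.argmax())))
    return flagged
```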
By 2009, I was asking the question: “Is there a statistical test for determining whether the point-biserial correlation coefficient is too low?” Everyone I asked replied, “No.” So I attempted to create one. I was not able to derive such a test, but I gained greater insight into the assumption of unidimensionality and into classical test theory. From analyzing the mathematical quantities in the equation for the point-biserial correlation coefficient, I learned that sampling theory is an effective framework for inferences concerning answer key validation.
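For readers who want the quantity in question made concrete: the point-biserial correlation is the Pearson correlation between a dichotomous (0/1) item score and the total test score, and it has a closed form in terms of the group means. A minimal computation:

```python
import numpy as np

def point_biserial(item_correct, total_score):
    """Point-biserial correlation between a 0/1 item score and total score.

    Uses the classical formula r = (M1 - M0) / s_y * sqrt(p * q), which is
    algebraically identical to the Pearson correlation of the two variables
    when s_y is the population standard deviation.
    """
    x = np.asarray(item_correct, dtype=float)
    y = np.asarray(total_score, dtype=float)
    p = x.mean()                    # proportion answering correctly
    m1 = y[x == 1].mean()           # mean total score, correct group
    m0 = y[x == 0].mean()           # mean total score, incorrect group
    return (m1 - m0) / y.std() * np.sqrt(p * (1 - p))
```

A low or negative value for the keyed option is the classical warning sign of a key problem; the open question above was whether "too low" could be given a formal significance test.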
In 2011, all of the above ideas coalesced as I was developing an item analysis tool that would be more user-friendly than my previous programs. The breakthrough came by answering the question, “What is the probability that a particular response is correct, given the data?” Bayesian inversion supplied the means and method for answering this question. There were residual mathematical nuances that had to be addressed, but at that point, the solution seemed right. For the problem of estimating answer keys, the method is 85 to 99 percent accurate, depending upon the way the test has been designed.
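The published details of the method are in the webinar, so the following is only a hedged toy sketch of what "Bayesian inversion" can look like in this setting, under assumptions that are entirely my own: an option is credible as the key to the extent that higher scorers choose it more often than lower scorers. With independent Beta(1, 1) priors on each option's choice rate in the low- and high-scoring halves, each option is scored by the posterior probability that its high-group rate exceeds its low-group rate, normalized across options.

```python
import numpy as np

def key_probabilities(responses, n_options=4, draws=4000, seed=0):
    """Estimate P(option a is the keyed answer | data) per item.

    A two-group Monte Carlo sketch (not Caveon's published method).
    responses: (n_examinees, n_items) array of chosen options 0..k-1.
    Returns an (n_items, n_options) array of normalized probabilities.
    """
    rng = np.random.default_rng(seed)
    n, m = responses.shape
    # Provisional ability: agreement with each item's modal response
    # (assumes the keyed answer is usually the most popular one).
    modal = np.array([np.bincount(responses[:, j]).argmax() for j in range(m)])
    scores = (responses == modal).sum(axis=1)
    high = scores >= np.median(scores)
    probs = np.zeros((m, n_options))
    for j in range(m):
        for a in range(n_options):
            hits_hi = (responses[high, j] == a).sum()
            hits_lo = (responses[~high, j] == a).sum()
            # Beta posteriors for the choice rate in each group.
            p_hi = rng.beta(1 + hits_hi, 1 + high.sum() - hits_hi, draws)
            p_lo = rng.beta(1 + hits_lo, 1 + (~high).sum() - hits_lo, draws)
            probs[j, a] = (p_hi > p_lo).mean()
        probs[j] /= probs[j].sum()
    return probs
```

Even this crude version recovers most keys on well-behaved simulated data, which makes the 85 to 99 percent accuracy figure for the full method plausible.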
Only time will tell whether this method will be useful to measurement professionals. I believe it reliably detects potentially mis-keyed items and items with weak or ambiguous answer keys, and I'm excited to receive feedback from the measurement community on how well it really works.
For those of you who are interested, we will present the method by webinar on September 18, 2013, and a recording of the webinar will be made available through the Caveon web site.
In the figure below, answer choice “B” is the keyed answer, but the probability that it is the correct answer given the observed data is less than 50%, while the probability that the incorrect answer choice “A” is the correct answer given the data is greater than 50%. Hence, these data indicate a potentially mis-keyed item.
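The decision rule the figure illustrates is straightforward to state in code. The probability values below are hypothetical numbers standing in for the figure's bars, not the actual data:

```python
def flag_miskey(posterior, keyed):
    """Return True when some non-keyed option is both more probable than
    50% and more probable than the keyed answer, given the data."""
    return any(p > 0.5 and p > posterior[keyed]
               for option, p in enumerate(posterior) if option != keyed)

# Hypothetical posteriors for options A..D mirroring the figure:
# "B" (index 1) is keyed but below 50%, while "A" (index 0) exceeds 50%.
print(flag_miskey([0.62, 0.30, 0.05, 0.03], keyed=1))  # → True
```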