Recent Innovations in Data Forensics - 2006
By Dennis Maynes, Chief Scientist, Caveon

"As the technological sophistication of cheaters increases, it must be met with an equal or greater improvement in our tools of detection. Better statistical measures of test aberrance will help make test security a reality." David Foster, Ph.D., CEO, Caveon

Overview
This document describes some of the innovations introduced in Caveon Data Forensics since October 1, 2005. Several technical innovations have been made to the models and algorithms, but two innovations are especially apparent to customers.

Technical Innovations
Caveon has extended its algorithms for estimating item response theory (IRT) models beyond the typical multiple-choice format. Through statistical pooling of response models, the models can handle hundreds of different responses to a single question. In practice, this means that open-ended items (such as gridded-in responses) can be processed using appropriate probability estimation techniques.
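The document does not specify which response model is used. As a minimal sketch only, assuming Bock's nominal response model for the probability of each response category, and assuming a hypothetical pooling rule in which rarely observed responses share one "OTHER" category, handling many distinct responses might look like this. The names `nominal_probs`, `pool_rare_responses`, and `min_count` are illustrative, not Caveon's.

```python
import math
from collections import Counter


def nominal_probs(theta, slopes, intercepts):
    """Nominal response model: P(k | theta) = exp(a_k*theta + c_k) / sum_j exp(a_j*theta + c_j)."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(z)  # subtract the max before exponentiating for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def pool_rare_responses(responses, min_count=5, other="OTHER"):
    """Hypothetical pooling rule: responses seen fewer than min_count times share one category."""
    counts = Counter(responses)
    return [r if counts[r] >= min_count else other for r in responses]
```

Pooling keeps the number of estimated category parameters manageable when a gridded-in item produces many answers that appear only once or twice.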

Caveon has also developed techniques for estimating item latency aberrance in extremely speeded testing situations. A speeded test inherently introduces aberrance into the responses, and the new techniques make it possible to detect outliers in this constrained environment.
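Caveon's speeded-test technique is not described here. As a hedged illustration of latency outlier detection in general, the sketch below swaps in a common robust approach: it scores each response time by its distance from the median log latency, scaled by the median absolute deviation (MAD), so that the compressed latencies typical of a speeded test do not inflate the spread estimate. The threshold of 3.5 and the function names are assumptions.

```python
import math
import statistics


def robust_latency_z(latencies_sec):
    """Robust z-scores of log response times, using the median and scaled MAD instead of mean/SD."""
    logs = [math.log(t) for t in latencies_sec]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    scale = 1.4826 * mad  # 1.4826 makes the MAD consistent with the SD for normal data
    if scale == 0:
        return [0.0 for _ in logs]  # no spread: nothing can be called an outlier
    return [(x - med) / scale for x in logs]


def flag_latency_outliers(latencies_sec, threshold=3.5):
    """Indices of responses whose robust |z| exceeds the (hypothetical) threshold."""
    return [i for i, z in enumerate(robust_latency_z(latencies_sec)) if abs(z) > threshold]
```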

Item Aberrance Analysis
Caveon has enhanced its Data Forensics test analysis to associate shifts in the p-values of individual items with aberrance. This allows item compromise rates to be estimated, and it allows lists of the items most likely to be compromised to be created. The new analysis has two components. The more important component is a global, overall view of the effects of aberrance on items, displayed as a scatter plot like those in Figure 1 and Figure 2.

The pink squares in Figures 1 and 2 show item p-values (the proportion of test takers answering the item correctly) for non-aberrant test takers. The blue diamonds and yellow triangles show the p-values of the same items for aberrant test takers. The blue diamonds mark items where the p-value for aberrant test takers is significantly higher than the p-value for non-aberrant test takers; consequently, these items may be compromised.
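The document does not say which significance test separates the blue diamonds from the yellow triangles. A minimal sketch, assuming a one-sided two-proportion z-test per item and a hypothetical cutoff `alpha=0.01`, could flag items whose aberrant-group p-value is significantly higher:

```python
import math


def one_sided_p_higher(correct_ab, n_ab, correct_non, n_non):
    """One-sided two-proportion z-test: is the aberrant group's item p-value higher?"""
    p_ab = correct_ab / n_ab
    p_non = correct_non / n_non
    pooled = (correct_ab + correct_non) / (n_ab + n_non)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ab + 1 / n_non))
    if se == 0:
        return 1.0  # everyone right or everyone wrong: no evidence of a shift
    z = (p_ab - p_non) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability


def flag_items(item_counts, alpha=0.01):
    """item_counts: (correct_ab, n_ab, correct_non, n_non) per item. Returns flagged item indices."""
    return [i for i, c in enumerate(item_counts) if one_sided_p_higher(*c) < alpha]
```

With many items tested at once, a real analysis would likely also adjust for multiple comparisons; that step is omitted here.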

The estimated compromise rate is derived from the number of items plotted as blue diamonds. Figure 2 shows a moderately strong aberrance relationship. The plots tend to be roughly sinusoidal in shape, with the blue diamonds concentrated in the lower p-value range (i.e., the more difficult items) and the yellow triangles concentrated in the upper p-value range (i.e., the easier items). This is consistent with the adage that “Aberrance is getting the hard items right, while missing the easy items.”
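The exact derivation of the compromise rate is not given. Assuming, for illustration only, that the rate is simply the share of flagged (blue-diamond) items, the sketch below also counts flagged items by non-aberrant p-value band, which makes it easy to check whether they concentrate among the difficult items. The band edges 0.4 and 0.7 are hypothetical.

```python
from bisect import bisect_right


def compromise_summary(non_aberrant_p, flagged, band_edges=(0.4, 0.7)):
    """Return (share of items flagged, flagged-item counts per p-value band).

    With the default hypothetical edges the bands are: hard (< 0.4), medium, easy (>= 0.7).
    """
    flagged_set = set(flagged)
    counts = [0] * (len(band_edges) + 1)
    for i in flagged_set:
        counts[bisect_right(band_edges, non_aberrant_p[i])] += 1
    return len(flagged_set) / len(non_aberrant_p), counts
```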

A secondary component of the item aberrance analysis is the availability of item trend data. These data are normally provided in the spreadsheets but are not plotted, although they are easy to plot. Temporal effects associated with individual items are uncommon, but they do occur. An example is shown in Figure 3.

The blue diamonds plot the item's actual overall p-values, calculated from weekly counts. One change was detected at the end of October 2005, when the item's overall p-value increased, and aberrance appears to be associated with this increase. The light-blue x’s represent the item's p-value for non-aberrant tests, and the yellow triangles represent its p-value for aberrant tests. The data indicate that this item may have been subjected to a security breach at the end of October: from that point on, aberrant test takers do much better on the item than non-aberrant test takers.
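The change-detection method behind Figure 3 is not described. As a swapped-in technique, a single shift in an item's weekly p-values can be located by trying every split week and keeping the one that maximizes the binomial log-likelihood ratio of "two p-values" versus "one p-value". The function names are hypothetical, and a real analysis would also need a significance threshold for the ratio before declaring a change.

```python
import math


def _loglik(correct, n):
    """Binomial log-likelihood at the MLE p = correct/n (constant terms dropped)."""
    if n == 0 or correct in (0, n):
        return 0.0  # uses the convention 0 * log(0) = 0
    p = correct / n
    return correct * math.log(p) + (n - correct) * math.log(1 - p)


def weekly_p_values(weeks):
    """weeks: list of (correct, attempts) per week, in time order."""
    return [c / n if n else float("nan") for c, n in weeks]


def best_change_point(weeks):
    """Return (index of the first week after the shift, log-likelihood ratio) for one shift in p."""
    total_c = sum(c for c, _ in weeks)
    total_n = sum(n for _, n in weeks)
    base = _loglik(total_c, total_n)  # single p-value for the whole period
    best = (None, 0.0)
    left_c = left_n = 0
    for k in range(1, len(weeks)):
        left_c += weeks[k - 1][0]
        left_n += weeks[k - 1][1]
        llr = _loglik(left_c, left_n) + _loglik(total_c - left_c, total_n - left_n) - base
        if llr > best[1]:
            best = (k, llr)
    return best
```

The same search could be run separately on the aberrant and non-aberrant groups to see which group drives the shift.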

Refinements in Collusion Analyses
Caveon has introduced a refinement in its collusion analysis. This is a reporting refinement intended to help explain the nature of collusion and why the tests are similar. The refinement is not offered as a standard component of Caveon Data Forensics. Instead, it is used while interpreting the data to help display some of the more egregious cases of collusion. However, displays of this kind can be produced for specific test patterns at a client's request. Such displays could be useful in adjudication proceedings when score invalidations are being considered.

A triplet of tests is shown in Table 1.

The analysis uses the concept of a dominant response. A response is dominant if more than half the test takers provided the response. In Table 1, correct dominant responses are highlighted using tan (or beige) and incorrect dominant responses are highlighted using gold. There are no non-dominant responses shown in Table 1. However, probability analysis indicates that at least 10 non-dominant responses were expected if the tests were answered independently.
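A minimal sketch of the dominant-response classification for a group of tests, assuming that "test takers" in the definition refers to the tests in the group being compared (so for a triplet, a response is dominant when at least two of the three tests gave it). The response strings and answer key are hypothetical inputs.

```python
from collections import Counter


def classify_items(responses_by_test, key):
    """Label each item as dominant_correct, dominant_incorrect, or non_dominant.

    responses_by_test: one response string per test, each the same length as key.
    A response is dominant if more than half of the tests in the group gave it.
    """
    n_tests = len(responses_by_test)
    labels = []
    for i, correct in enumerate(key):
        resp, count = Counter(r[i] for r in responses_by_test).most_common(1)[0]
        if count * 2 > n_tests:
            labels.append("dominant_correct" if resp == correct else "dominant_incorrect")
        else:
            labels.append("non_dominant")
    return labels
```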

Figure 4 provides a side-by-side illustration of the observed and expected agreement between these three tests.

The left panel of Figure 4 shows the observed number of responses in each category: dominant correct, dominant incorrect, and non-dominant. The right panel shows the expected number of responses in each category under the assumption that the tests were answered independently.
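The expected counts in the right panel depend on per-test response probabilities that the document does not give. Assuming such probabilities are available for each item and each of the three tests (for example, from a fitted response model), the expected number of items in each category under independence can be computed exactly by enumerating every combination of the three responses. The input format is hypothetical.

```python
from itertools import product


def expected_category_counts(item_probs, key):
    """Expected item counts per category for a triplet of independently answered tests.

    item_probs[i] is a list of three dicts (one per test) mapping option -> probability.
    key[i] is the correct option for item i.
    """
    exp = {"dominant_correct": 0.0, "dominant_incorrect": 0.0, "non_dominant": 0.0}
    for probs, correct in zip(item_probs, key):
        p1, p2, p3 = probs
        for a, b, c in product(p1, p2, p3):
            w = p1[a] * p2[b] * p3[c]  # probability of this response combination
            if a == b or a == c:
                dominant = a
            elif b == c:
                dominant = b
            else:
                exp["non_dominant"] += w  # all three responses differ
                continue
            exp["dominant_correct" if dominant == correct else "dominant_incorrect"] += w
    return exp
```

For one four-option item answered uniformly at random by all three tests, all three responses differ with probability 24/64, which is why non-dominant responses are expected fairly often when tests are truly independent.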

The probability of this level of agreement between the tests is 1 in 10^6, or 1 in 1 million.
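The document does not state how this probability was computed. One simplified way to obtain a tail probability of this kind, offered only as a sketch under stated assumptions, treats each item as independently non-dominant with some probability q[i] and computes the chance of observing at most the observed number of non-dominant items (a Poisson-binomial tail, computed by dynamic programming). Unlike the joint analysis shown in Figure 5, this looks at only one of the three categories; `prob_at_most` is a hypothetical name.

```python
def prob_at_most(q, x):
    """P(X <= x), where X counts non-dominant items and item i is non-dominant
    independently with probability q[i] (Poisson-binomial distribution)."""
    dist = [1.0]  # dist[k] = P(exactly k non-dominant items among those processed so far)
    for qi in q:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1 - qi)  # this item is dominant
            new[k + 1] += pk * qi    # this item is non-dominant
        dist = new
    return sum(dist[: x + 1])
```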

The probability space for the distribution of these tests is shown in Figure 5.

The blue square in Figure 5 represents the observed level of agreement, and the orange diamond represents the expected agreement. The contour lines are at increasing powers of 100. The first line, Bound 2, represents a probability level of .01 (10^-2). The second line, Bound 4, represents a probability level of .0001 (10^-4). And so forth, until the last line, Bound 12, represents a probability level of 1 in 10^12. The upper bound is the absolute limit of the counts: 54, the number of items on this exam.
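Because Bound k corresponds to a probability level of 10^-k, placing a probability among the contours only requires comparing it with successive powers of 100. A small sketch; the function name and the convention of returning 0 for probabilities above .01 are assumptions.

```python
def contour_bound(p, max_bound=12):
    """Highest even Bound k (2, 4, ..., max_bound) with p <= 10**-k, or 0 if p > 0.01.

    Successive bounds differ by a factor of 100 in probability.
    """
    bound = 0
    for k in range(2, max_bound + 1, 2):
        if p <= 10.0 ** -k:
            bound = k
        else:
            break  # thresholds only get smaller, so no later bound can be reached
    return bound
```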

© Caveon, LLC 2006
