September 11, 2015
Written by Kelli Foster, Ph.D., Director, Caveon Secure Exam Development and Support
As a child, I enjoyed looking for hidden pictures in children’s magazines (available in my dentist’s waiting room). Later, as a mom, I spent hours with my children finding Waldo or studying I Spy books. So now, blog reader, see if you can find the idioms contained in this blog post…
I have spent 20 years developing exam items, and I have noted a trend in the practice that concerns me. It is an increasing emphasis on form over substance. There seems to be some confusion over what constitutes a quality item and how quality can be judged.
What is a quality item?
The simple answer is pretty is as pretty does. A pretty item is an item that contributes to the validity of the scores of an exam. These “pretties” help measure the knowledge, skills, and abilities (KSAs) that the exam is intended to measure. These items help distinguish between test takers who possess these KSAs and those who do not. Therefore, the quality of a test item should be primarily judged by the way it behaves (performs), not by its appearance.
And how can item quality be determined?
We as exam developers are often guilty of making a mountain out of a molehill. We sweat over published typos in our items. We wring our hands when not all options begin with uppercase letters. We pass out when our options are not parallel. We dismiss item writers for submitting fill-in-the-blank stems and resign when test takers criticize our items. Stop! Enough! In the words of my granddaughter’s favorite movie heroine, Elsa from Frozen, “Let it go.”
Allow me to explain.
Not all item writing guidelines are equal. Primary considerations are an item’s Congruence (with objectives and training), Accuracy, Relevance, and Difficulty level (CARD). These considerations are best addressed by properly trained subject matter experts. Secondary considerations involve adherence to rules of English grammar, item writing guidelines, and program style guides. Do not misunderstand me: I am a psychometric reviewer, and I believe that I, along with my colleagues, can provide valuable feedback. But we let the tail wag the dog if these considerations take precedence over CARD issues.
Neither the CARD review nor the psychometric edit suffices to determine item quality. We need to consider an exam and its items “unfinished” until the data on item performance are analyzed and items are adjusted accordingly. We should also be teachable and allow the data we collect to shape our notions of “best practice.”
Surprising to all of us psychometric reviewers and testing professionals is the fact that some items we dub “ugly” are often among the best-performing items on an exam, while, conversely, some items with all their i’s dotted and t’s crossed fail to do their job.
Yup, in the world of test development, we need to spend more time focusing on validity and less time putting lipstick on pigs.
Key: (a) form over substance (b) pretty is as pretty does (c) make a mountain out of a molehill (d) let the tail wag the dog (e) i’s dotted and t’s crossed (f) putting lipstick on pigs. And no, “Let it go” is not an idiom.