Smart Items: A Girl Can Dream, Can’t She??

Written by: Jamie Mulkey, Vice President, Caveon Test Development

Recently, I heard a segment on NPR about SMART appliances. You know, those appliances in your home that you manage from a smartphone app? For example, setting your thermostat or viewing footage from a security camera in your house while you’re vacationing in Hawaii. Well, it seems that these home appliances are destined to get even SMARTER, eventually allowing you to talk to them directly – like telling your coffee pot to brew at 6 a.m. the following morning.

I started daydreaming about what a SMART item would be. What if we could talk to the SMART Item Appliance and ask it for the right assessments for any given set of tasks? What would these items look like, and how would we get to a point where such a database would exist?

Let’s work backwards from the end result. We would need to begin with a desired skill to be performed. Let’s say we want to determine if a medical professional has the skills needed to administer a flu shot. If this were a SMART item, the question we’d be asking is, “What are the best items for evaluating readiness to administer a flu shot?” The SMART Item Appliance would come back with a number of different ways this skill could be measured. We might have some multiple-choice items that deal with placement of the needle on the person’s arm. What are good injection areas? What places should be avoided? There may be items that test for the right syringe and sterilizing elements to use. We might also see video-based questions that depict various ways of administering a shot. We could gleefully point and click on the items we wanted to include in our assessment of administering a flu shot. Then we would administer our test, feeling confident that the right skills and cognitive abilities have been used to test for this performance.

We’d quickly realize that our confidence in the quality of the test would depend on the quality of the intelligence entered into the SMART Item Appliance. In order to have the right skills, job task analyses (JTAs) would need to be executed so that we truly understood the performances that needed to be tested. These performances, in turn, would generate the desired objectives that would subsequently produce these items.

This is where my dream turns into a nightmare. While JTAs are performed quite routinely today in many professions, we as practitioners often find it difficult to create objectives that truly capture the desired performance and that result in great test items. For example, we struggle to provide a context for the performance and evidence statements of what we expect to see as a result of the performance.

Maybe as we work our way to the SMART Item Appliance, we can work on our processes for developing objectives that truly measure, becoming SMARTER about capturing the knowledge, skills, and abilities that enable us to create better items more quickly.

By the way, do you think they will ever come up with a SMART appliance that will make the clothes scattered all over my daughter’s room put themselves away?



    Hi Jamie, Great article!
    In reality, many of our clients think they can refrain from doing a proper JTA. In this fast-paced work environment, clients are always looking for ways to bring costs down. Skimping on the JTA leads us further and further away from SMART items.
    Convincing clients of the "worth/ROI" in creating a great blueprint with very specific, measurable objectives IS its own reward. Not only does the client get tons of reality-based data about the job, but it also results in the most honest picture of what it takes to successfully perform the job. Once the objectives are considered “public”, clients will be able to use the data to INFORM other functions as well. After a certification exam was validated, one of my clients integrated the objectives into the official HR job descriptions. So, the objectives had a double benefit to the client.
    We have seen time and time again the positive outcomes of a good JTA as well as the negative impact of “vague” objectives. SMART test items definitely depend on the quality of the objectives.
    I agree wholeheartedly and encourage colleagues to sell the benefits of developing those measurable objectives that lead the way to SMART items. Your article will give all of us colleagues a new twist in selling the benefits of creating more meaningful objectives in our JTAs.

  • Hi Jamie,
    I am about to send a response to NCME on their brief on Test and Data Integrity. The gist is:
    1. Accuracy: Students' answers are mostly thoughtful, coming from three sources: 1) memorization, which overestimates proficiency; 2) moderate knowledge, which approximates proficiency provided there are no serious reading or personal impediments; and 3) profound understanding, which can lead to "wrong" answers that accommodate more depth than intended by the item writer, underestimating proficiency. These scores cannot be accurate.
    2. Fairness: Multiple-choice tests measure interpretation skills and test-taking motivation at the time of sitting. Personal variables like health, family problems, language insufficiencies, etc. contaminate results when "knowledge" is the variable we assume we are measuring. These scores cannot be fair.
    3. Utility: The psychological and interpretation aspects of responding are not routinely collected. In their absence, these scores lack usefulness.
    4. Interpretability: In the absence of the considerations in 2. and 3., these scores are not interpretable. They would become more interpretable if these aspects were collected and reported as part of the assessment.
    5. Comparability: Students with the same scores can have nCs possible combinations of designated answers where "n" is the number of items and "s" is the score. If we assume that responses are random, as the statistical procedures used require, test scores are not comparable.
    6. Test scores based upon the frequencies of designated answers are invalid measures of anything. They ignore the qualitative aspects of response selection and assume randomness when most selections are thoughtful. In addition, the multiplicity of options is reduced to two (right/wrong) to fit the mathematical models being employed for interpretation. Thus the frequency of designated answers is an invalid way, both mathematically and psychologically, of measuring anything.
    7. Given the consequences in high stakes settings of using invalid measurements, subterfuge is the natural survival response. Cheating is the product of inappropriate scoring procedures. Correcting this problem will require changing scoring procedures to Answer-Selection-Pattern-Analysis (ASPA). There is no other solution.
    Show this commentary to John Fremer. He will remember our discussions on these topics.

  • I'm dreaming of a SMART appliance that will give me live video footage when I administer an online MCQ test, so that I can check for cheating by examinees who are taking the test concurrently in more than 500 locations worldwide.
