April 2, 2013

Computerized Adaptive Testing


by David Cannon, Pharm.D., Clinical Instructor, University of Maryland School of Pharmacy

Unique assessment tools have always been fascinating to me.  Once, when I was taking a practice exam consisting of 25 questions on pharmacy law, the following message appeared:  “You scored a 56%, you passed!” How could that be, I thought?  Surely the minimum passing score for a state law exam could not be that low! But, as it turned out, this exam was an adaptive test. While the computer was reporting the percentage of questions I answered correctly, behind the scenes it was doing calculations based on the difficulty and weight of each question.  Once I began to peel back the surface of these complicated algorithms, I wanted to learn more.  But first, let’s review some basics about assessment …

The purpose of an exam, including high-stakes exams used to make state licensure decisions, is to use the assessment data (answers to test items) to make inferences about the learner.  Assessment is best approached by first considering what the end requirements of the learner are, then thinking about what actions, jobs, or thoughts would illustrate mastery of those requirements. Deciding up front what the goals of assessment are makes the process of actually creating it much easier.1

Evidence-Centered Design (ECD) utilizes a series of key questions to analyze the assessment design. Table 1 is a good example of a set of questions recommended by Mislevy et al.:1

Table 1:

a. Why are we assessing?
b. What will be said, done, or predicted on the basis of the assessment results?
c. What portions of a field of study or practice does the assessment serve?
d. Which knowledge and proficiencies are relevant to the field of study or practice?
e. Which knowledge or proficiencies will be assessed?
f. What behaviors would indicate levels of proficiency?
g. How can assessment tasks be contrived to elicit behavior that discriminates among levels of knowledge and proficiency?
h. How will the assessment be conducted and at what point will sufficient evidence be obtained?
i. What will the assessment look like?
j. How will the assessment be implemented?

Taken from Automated Scoring of Complex Tasks in Computer-based Testing1

ECD draws parallels to instructional design in that these questions do not necessarily need to be asked in order, the output of each question should be considered when examining the others, and the questions should be revisited as necessary.1 To understand how evidence-centered design is utilized in creating assessments, the assessment tool must be broken down into its individual components.

When designing assessments used for licensing examinations, many domains of knowledge are tested. A domain is a complex of knowledge or skills that is valued, with recognizable features of good performance, situations in which proficiency can be exhibited, and relationships between knowledge and performance.1   In a high-stakes examination, like a state board licensure exam, it is not sufficient for an examinee to be competent in only one domain but not the others. To test proficiency in each of the domains, smaller subunits of the assessment called “testlets” are used. Testlets typically contain a group of related assessment items designed to elicit the behaviors associated with a domain.3  It is vital to understand how these examinations are designed from an evidence-based perspective in order to evaluate the validity of computerized adaptive testing.
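As a concrete (and entirely hypothetical) illustration, an item bank for a pharmacy licensure exam might be organized something like this, with pre-calibrated items grouped into testlets under each domain. The domain names, testlet names, and difficulty values below are all made up for the sake of the sketch:

    # Hypothetical sketch of an item bank: domains contain testlets,
    # and each testlet groups related, pre-calibrated items.
    item_bank = {
        "pharmacy_law": [                          # domain
            {"testlet": "controlled_substances",   # a group of related items
             "items": [{"id": 101, "difficulty": -0.4},
                       {"id": 102, "difficulty": +0.7}]},
            {"testlet": "dispensing_records",
             "items": [{"id": 103, "difficulty": +0.1}]},
        ],
        "patient_counseling": [
            {"testlet": "drug_interactions",
             "items": [{"id": 201, "difficulty": -1.1}]},
        ],
    }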

So what is a computerized adaptive test, anyway?  CAT is an assessment tool that utilizes an iterative algorithm with the following steps (sketched in code after the list):2
1) Search the available items in the testlet domain for an optimal item based on the current estimate of the student’s ability
2) Present the chosen item to the student
3) Record whether the student answers the item correctly
4) Using this response as well as the responses to all prior items, compute an updated estimate of the student’s ability
5) Repeat steps 1-4 until a termination criterion is met
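To make the loop concrete, here is a minimal sketch in Python. It assumes a Rasch (one-parameter logistic) IRT model, a made-up ten-item bank, and a simulated examinee standing in for steps 2 and 3; a real CAT would draw from large calibrated testlets and use a statistical stopping rule rather than a fixed item count:

    import math
    import random

    # Hypothetical item bank: each item carries only a difficulty parameter,
    # as in the Rasch (one-parameter logistic) IRT model.
    ITEM_BANK = [{"id": i, "difficulty": d} for i, d in
                 enumerate([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])]

    def prob_correct(theta, difficulty):
        # Rasch model: chance an examinee of ability theta answers correctly.
        return 1.0 / (1.0 + math.exp(difficulty - theta))

    def pick_item(theta, remaining):
        # Step 1: under the Rasch model the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        return min(remaining, key=lambda item: abs(item["difficulty"] - theta))

    def update_theta(responses):
        # Step 4: re-estimate ability by maximum likelihood (Newton steps),
        # clamped so an all-right or all-wrong record cannot diverge.
        theta = 0.0
        for _ in range(20):
            grad = sum(r - prob_correct(theta, d) for d, r in responses)
            info = sum(p * (1 - p) for p in
                       (prob_correct(theta, d) for d, _ in responses))
            if info < 1e-6:
                break
            theta = max(-4.0, min(4.0, theta + grad / info))
        return theta

    def run_cat(true_theta, max_items=10):
        theta, remaining, responses = 0.0, list(ITEM_BANK), []
        while remaining and len(responses) < max_items:   # Step 5: stop rule
            item = pick_item(theta, remaining)            # Step 1
            remaining.remove(item)
            # Steps 2-3: "administer" the item; a simulated examinee answers.
            right = random.random() < prob_correct(true_theta, item["difficulty"])
            responses.append((item["difficulty"], int(right)))
            theta = update_theta(responses)               # Step 4
            print(f"item {item['id']:>2} (b={item['difficulty']:+.1f}): "
                  f"{'right' if right else 'wrong'} -> theta = {theta:+.2f}")
        return theta

    run_cat(true_theta=0.8)

Running the sketch, you can watch the ability estimate home in on the simulated examinee’s true ability as each answer is folded into the update, which is exactly the behavior the five steps above describe.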

CAT is utilized in many high-stakes licensing examinations such as the National Council Licensure Examination (NCLEX), which most states require nurses to pass before they can practice. In the case of the NCLEX, after the examinee answers each item, a calculation is done in the background that estimates the person’s competency based on the difficulty of the items answered. The computer then asks a slightly more difficult question (or an easier one, if the item was missed) and applies the algorithm again, creating a new estimate of the candidate’s competency. This is repeated until the computer reaches a predetermined cutoff (with a 95% confidence interval in the case of the NCLEX3) for minimum competency or until the number of test items has been exhausted. Put another way, the exam will cease when the algorithm has determined with 95% certainty that the student’s ability falls above or below a minimum competency standard. Check out this VIDEO of how the algorithm behind the NCLEX works.3
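To illustrate that stopping rule, here is a hedged sketch (not NCSBN’s actual implementation; the function name, cut score, and information values are all hypothetical). The standard error of a maximum-likelihood ability estimate is 1/sqrt(test information), so the exam can stop as soon as the 95% confidence interval around the estimate clears the passing standard on either side:

    import math

    def competency_decision(theta_hat, test_information, cut_score, z=1.96):
        # Sketch of an NCLEX-style stopping rule: stop once the 95%
        # confidence interval around the ability estimate falls entirely
        # above (pass) or below (fail) the passing standard.
        se = 1.0 / math.sqrt(test_information)   # SE of the ML estimate
        if theta_hat - z * se > cut_score:
            return "pass"
        if theta_hat + z * se < cut_score:
            return "fail"
        return "keep testing"   # interval still straddles the cut score

    # Hypothetical numbers: ability estimate +0.70 after accumulating
    # 9.5 units of test information, against a passing standard of 0.0.
    print(competency_decision(0.70, 9.5, cut_score=0.0))   # -> pass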

Now you might be wondering how you create an adaptive test.  It’s a pretty complicated process that involves breaking down the different subject areas you want to test into domains and then developing an item bank for each one. Content experts come together to decide which items should be included while evaluating their appropriateness and difficulty/weight.  A great free resource for creating your own adaptive test can be found here.4  The NCLEX is a great example of how computerized adaptive testing brings together the ideas of evidence-centered design and instructional design, helping educators assess their students with greater accuracy.

References:

1.  Mislevy RJ, Steinberg LS, Almond RG, Lukas JF.  Concepts, Terminology, and Basic Models of Evidence-Centered Design.  In: Williamson D, Mislevy R, Bejar I, eds. Automated Scoring of Complex Tasks in Computer-Based Testing. 1st ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2006:15-47.
2.  Thissen D, Mislevy RJ.  Testing Algorithms.  In: Wainer H, ed. Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
3.  Computerized Adaptive Testing (CAT). National Council of State Boards of Nursing. Accessed March 11, 2013.
4.  Software for developing computer-adaptive tests. Assessment Focus. Accessed March 26, 2013.
