by David Cannon, Pharm.D., Clinical Instructor, University
of Maryland School of Pharmacy
Unique assessment
tools have always been fascinating to me.
Once, when I was taking a practice exam consisting of 25 questions on
pharmacy law, the following message appeared: “You scored a 56%, you passed!” How could that
be, I thought? Surely the minimum passing
score for a state law exam could not be that low! But, as it turned out, this
exam was an adaptive test. While the computer was reporting the percentage of
questions I scored correctly, behind the scenes it was doing calculations based
on the difficulty and weight of the questions. Once I began to peel back the layers of these
complicated algorithms, I wanted to learn more.
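To see how a raw 56% might still mean a pass, consider a toy illustration of difficulty-weighted scoring; the weights and responses below are invented, not the practice exam's actual (undisclosed) algorithm:

```python
# Toy illustration only: invented weights and responses, not the
# practice exam's actual (undisclosed) scoring algorithm.
weights = [3] * 15 + [1] * 10          # 15 hard (weight 3) and 10 easy (weight 1) items
correct = [True] * 14 + [False] * 11   # 14 of 25 right: a raw score of 56%

raw_percent = 100 * sum(correct) / len(correct)
weighted_percent = 100 * sum(w for w, c in zip(weights, correct) if c) / sum(weights)

print(f"raw: {raw_percent:.0f}%")            # 56%
print(f"weighted: {weighted_percent:.0f}%")  # ~76%, since the hard items were answered
```

Because the correct answers fell on the heavily weighted items, the weighted score clears a passing bar that the raw 56% appears to miss. But first, let’s review some basics about assessment …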
The purpose of an exam, including high-stakes exams used
to make state licensure decisions, is to use the assessment data (answers to
test items) to make inferences about the learner. Assessment is best approached by first
considering what the end requirements of the learner are, then thinking about what
actions, jobs, or thoughts would illustrate mastery of the desired
requirements. Deciding what the goals of assessment are makes the process of
actually creating it much easier.1
Evidence-Centered Design (ECD) utilizes a series of key questions to analyze the assessment
design. Table 1 is a good example of a set of questions recommended by Mislevy
et al.:1
Table 1:
a. Why are we assessing?
b. What will be said, done, or predicted on the basis of the assessment results?
c. What portions of a field of study or practice does the assessment serve?
d. Which knowledge and proficiencies are relevant to the field of study or practice?
e. Which knowledge or proficiencies will be assessed?
f. What behaviors would indicate levels of proficiency?
g. How can assessment tasks be contrived to elicit behavior that discriminates among levels of knowledge and proficiency?
h. How will the assessment be conducted and at what point will sufficient evidence be obtained?
i. What will the assessment look like?
j. How will the assessment be implemented?
Taken from Automated Scoring of Complex Tasks in Computer-based Testing1
ECD draws parallels to instructional design in that
these questions do not necessarily need to be asked in order, the outputs of
each question should be considered when examining the others, and the
questions should be repeated as necessary.1 To understand how evidence-centered design is utilized in creating assessments, the assessment tool must
be broken down into its individual components.
When designing
assessments used for licensing examinations, many domains of knowledge are
tested. A domain is a complex of knowledge or skills that is valued, in which there are
features of good performance, situations during which proficiency can be
exhibited, and relationships between knowledge and performance.1 In a high-stakes
examination, like a state board licensure exam, it is not sufficient for an examinee
to be competent in one domain but not the others. To test proficiency in
each of the domains, smaller subunits of the assessment called “testlets” are
used. Testlets typically contain a group of related assessment items
that elicit the behaviors associated with the domain.3 It is vital to understand how these
examinations are designed from an evidence-based perspective in order to
evaluate the validity of computerized adaptive testing.
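To make the domain-and-testlet structure just described concrete, here is a minimal sketch of how an item bank might be organized; the domain names, testlet names, item stems, and difficulty values are all hypothetical:

```python
from dataclasses import dataclass

# Minimal sketch of an item bank organized into domains and testlets.
# Domain names, testlet names, item stems, and difficulties are all
# hypothetical placeholders.

@dataclass
class Item:
    stem: str          # the question presented to the examinee
    difficulty: float  # calibrated difficulty, e.g. on a logit scale

# Each domain holds testlets: groups of related items intended to
# elicit the behaviors associated with that domain.
item_bank = {
    "pharmacy_law": {
        "controlled_substances": [
            Item("Example hard law question ...", 1.2),
            Item("Example easy law question ...", -0.6),
        ],
    },
    "pharmacotherapy": {
        "anticoagulation": [
            Item("Example moderate therapy question ...", 0.3),
        ],
    },
}

# e.g. pull the testlet for one domain:
law_items = item_bank["pharmacy_law"]["controlled_substances"]
print(len(law_items), "items in the controlled-substances testlet")
```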
So what is a
computerized adaptive test, anyway? Computerized adaptive testing (CAT)
is an assessment tool that utilizes an iterative algorithm with the following
steps (sketched in code after the list):2
1) Search the available items in the testlet domain for an optimal item based on the student’s ability
2) Present the chosen item to the student
3) Score the student’s response as right or wrong
4) Use this response, along with the responses to all prior items, to compute an updated estimate of the student’s ability
5) Repeat steps 1-4 until some termination criterion is met
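Here is a minimal sketch of that loop, assuming a one-parameter (Rasch) response model, a simple shrinking-step ability update, and an invented item bank; real exams use more sophisticated item selection and estimation:

```python
import math
import random

# Minimal sketch of the five-step CAT loop under a one-parameter
# (Rasch) model. The item bank, step-size update, and fixed-length
# stopping rule are simplifying assumptions, not any real exam's
# algorithm.

def p_correct(theta, b):
    """Rasch probability that ability theta answers difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_cat(bank, true_theta, max_items=20):
    theta = 0.0          # start from a neutral ability estimate
    used = set()
    for n in range(1, max_items + 1):
        # Step 1: the most informative Rasch item is the one whose
        # difficulty is closest to the current ability estimate.
        item = min((b for b in bank if b not in used), key=lambda b: abs(b - theta))
        used.add(item)
        # Steps 2-3: present the item; here we simulate the response.
        right = random.random() < p_correct(true_theta, item)
        # Step 4: nudge the estimate up or down, with shrinking steps
        # as evidence accumulates (a crude Robbins-Monro update).
        theta += (1.0 if right else -1.0) / n
        # Step 5: repeat; a real CAT would also stop early once the
        # estimate is precise enough (see the NCLEX rule below).
    return theta

bank = [-3.0 + 0.25 * i for i in range(25)]   # difficulties from -3.0 to +3.0
print(round(run_cat(bank, true_theta=1.0), 2))
```

The choice to pick the item whose difficulty sits closest to the current estimate is what makes the test adaptive: each question is aimed at the region where the examinee’s ability is least certain.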
CAT is utilized in
many high-stakes licensing examinations such as the National Council Licensure
Examination (NCLEX), which is required by most states for nurses before they can
practice. In the case of the NCLEX, after each item is answered by the examinee, a
calculation is done in the background that determines an estimate of the
person’s competency based on the difficulty of the item answered. The computer
then asks a slightly more difficult question and applies the
algorithm again, creating a new estimate of the candidate’s competency. This is
repeated until the computer reaches a predetermined cutoff (with a 95% confidence
interval in the case of the NCLEX3) for minimum competency or until
the number of test items has been exhausted. Put another way, the exam will cease
when the algorithm has determined with 95% certainty that the student’s ability
falls above or below a minimum competency standard (sketched below). Check out this VIDEO of how the algorithm works behind
the NCLEX.3
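A sketch of that stopping rule, under the same illustrative Rasch assumptions as above (this is not the NCLEX’s actual implementation), might look like:

```python
import math

# Sketch of the stopping rule described above: end the exam once the
# 95% confidence interval around the ability estimate lies entirely
# above or below the passing standard. The Rasch information function
# and all numbers here are illustrative assumptions.

def item_information(theta, b):
    """Fisher information that an item of difficulty b provides at theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def decision(theta_hat, answered_difficulties, passing_standard):
    """Return 'pass', 'fail', or None if more items are still needed."""
    info = sum(item_information(theta_hat, b) for b in answered_difficulties)
    se = 1.0 / math.sqrt(info)     # standard error of the ability estimate
    lower = theta_hat - 1.96 * se  # 95% confidence interval
    upper = theta_hat + 1.96 * se
    if lower > passing_standard:
        return "pass"              # confidently above the standard
    if upper < passing_standard:
        return "fail"              # confidently below the standard
    return None                    # interval straddles the cut: keep testing

# Example: estimate 0.9 after 40 items targeted near the cut score of 0.0
answered = [0.05 * k for k in range(-20, 20)]
print(decision(0.9, answered, 0.0))   # -> pass
```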
Now you might be
wondering how you create an adaptive test. It’s a pretty complicated process that involves
breaking down the different subject areas you want to test into
different domains. Then you’d need to
develop an item bank for each domain. Content experts would come together and
decide which items should be included while at the same time evaluating their
appropriateness and difficulty/weight (a toy version of that calibration step is sketched below). A
great free resource for creating your own adaptive test can be found here.4 The NCLEX is a great example of how
computerized adaptive testing brings together the ideas of evidence-centered
design and instructional design by helping educators assess their students with
greater accuracy.
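As a closing illustration, here is a toy version of that difficulty-calibration step, assuming hypothetical pilot-test data; operational programs fit full item response theory models instead:

```python
import math

# Toy calibration pass, assuming hypothetical pilot-test data: turn each
# item's proportion-correct into a rough logit difficulty. Operational
# testing programs fit full IRT models; this is only a first approximation.
pilot_responses = {
    "LAW-001": [1, 1, 1, 0, 1, 1, 0, 1],   # answered correctly by 6 of 8 pilots
    "LAW-002": [0, 1, 0, 0, 1, 0, 0, 0],   # answered correctly by 2 of 8 pilots
}

for item_id, responses in pilot_responses.items():
    p = sum(responses) / len(responses)
    p = min(max(p, 0.05), 0.95)            # clamp so the logit stays finite
    difficulty = -math.log(p / (1 - p))    # lower proportion-correct => higher difficulty
    print(f"{item_id}: proportion correct {p:.2f}, difficulty {difficulty:+.2f}")
```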
References:
1. Mislevy RJ, Steinberg LS, Almond RG, Lukas JF. Concepts, Terminology, and Basic Models of Evidence-Centered Design. In: Williamson D, Mislevy R, Bejar I (eds). Automated Scoring of Complex Tasks in Computer-based Testing (1st ed). Mahwah, NJ: Lawrence Erlbaum Associates, 2006 (pp 15-47).
2. Thissen D, Mislevy RJ. Testing Algorithms. In: Wainer H (ed). Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates, 2000.
3. Computerized Adaptive Testing (CAT). National Council of State Boards of Nursing. Accessed on: March 11, 2013.
4. Software for developing computer-adaptive tests. Assessment Focus. Accessed on: March 26, 2013.