April 11, 2013

I, I and II, or I, II and III?

by Adrian Wong, Pharm.D., PGY Pharmacy Practice Resident, The Johns Hopkins Hospital

Recently graduated and staring at the computer screen in front of me, I once again repeated what I had done many times in pharmacy school – crammed.  I had received warnings from all my mentors and peers about how horrific the Multistate Pharmacy Jurisprudence Examination (MPJE) was.  I was truly dreading the outcome.  Examinations were never my strong suit, and I feared those multiple-multiple choice questions that seem to appear on these high-stakes exams all too frequently.  Regardless of the name they are given – K-type, complex multiple-choice (CMC), or complex-response questions – they all evoke the same feeling of dread.  If I need to jog your memory, an example is shown here:

Question:  Based on the best available evidence, which of the following is the most appropriate medication to initiate for management of this patient’s congestive heart failure?

I.   Metoprolol succinate
II.  Metoprolol tartrate
III. Atenolol

a.    I only
b.    III only
c.    I and II only
d.    II and III only
e.    I, II and III

In my experience with these questions, it always seems to come down to one of two answers.  Even with an educated guess, I never seem to get the “right” answer.  And as with multiple-choice questions in general, the answer is rarely all of the above.  So why did this format of question come to be?  Who came up with this traumatizing format?  What are the data behind this torture?

Based on my research, the complex multiple-choice (aka K-type) question was introduced by the Educational Testing Service in 1978.1  This question format was designed to accommodate situations in which there is more than one correct choice - much as in real life.  These questions also appear to be more difficult than comparable “traditional” multiple-choice questions.2  Therefore, in the world of health professionals, where multiple correct answers may exist, and in an attempt to increase the difficulty of board examination questions, the CMC format was adopted by many professional testing services and persists today.

This format has weaknesses.  Albanese evaluated the use of Type-K questions and identified several limitations, including:2

1.    Increased likelihood of “cluing” of secondary choices
2.    Lower test score reliability
3.    Greater difficulty constructing questions
4.    Extended time required to answer questions

“Cluing” results when a test-taker is able to narrow down choices based on the wording of the question or the available answer options.  For example, in the question above, I could narrow down the choices solely by looking at the question, or “stem.”  The stem asks for only one “most appropriate” answer, as denoted by “is” rather than “are” (assuming, of course, that the test-writer has written a grammatically correct statement).  Thus, as a savvy test-taker, I would gravitate toward choices “a” and “b.”  An additional clue is that there are two similar choices (metoprolol succinate vs. tartrate), one of which is likely to be the correct answer.  Thus, cluing may lower test score reliability, because the results may depend on how skilled a test-taker one is at exploiting such clues.2

Additional studies have further illustrated the limitations of this assessment format.  One study examined the amount of time needed to complete a CMC-based test compared to a multiple true-false (MTF) test.3   On average, it took 3.5 times as long to complete a CMC-based test as an MTF test.

However, after evaluating this literature, I will begrudgingly admit that CMC questions, under certain circumstances, could be effective despite their inherent weaknesses.  Researchers at one pharmacy school evaluated the use of CMC questions using a partial-credit scoring system and compared it to traditional dichotomous (right vs. wrong) scoring.4  The instructors designed a test to examine student knowledge regarding nonprescription drugs.  The test was administered to 150 student pharmacists in their second professional year.  The purpose of this study was to optimize the measurement of student pharmacist knowledge without penalty for guessing or incorrect responses.  Partial-credit scoring was accomplished by assigning a tiered score based on descending “best” answers.  Test items were sent to an external review panel to assess content validity.  Parameters evaluated in this study included item difficulty, item discrimination (i.e., the ability to distinguish low-ability from high-ability students), and the coefficient of effective length (CEL), a measure that determines how many more questions a test would need in order to produce the same reliability as another scoring method.  The authors found that with partial-credit scoring, the test was more reflective of actual student knowledge.  There were no statistically significant differences between the two methods with regard to item discrimination, but there was greater CEL with dichotomous scoring.  Indeed, the findings indicate that dichotomous scoring would require 26% more questions to achieve the same reliability in measuring students’ actual knowledge of the subject matter.  The authors recommend more studies regarding this partial-credit scoring method for CMC questions, including its ability to predict student achievement and its effect on student confidence.
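
To make the contrast between the two scoring methods concrete, here is a minimal Python sketch.  It is only an illustration, not the study's actual method: the tier weights, the option letters, and the function names are all hypothetical.  The second part uses the Spearman-Brown prophecy formula, a standard way to relate test length to reliability; whether the study computed its CEL exactly this way is an assumption on my part, and the reliability values are made up for the example.

```python
# Hypothetical tier weights for a single CMC item (NOT taken from the study):
# the "best" option earns full credit, and descending tiers earn less.
TIERED_KEY = {"a": 1.0, "c": 0.5, "e": 0.25, "b": 0.0, "d": 0.0}


def dichotomous_score(response, tiered_key):
    """Right vs. wrong: full credit only for the top-tier option."""
    best = max(tiered_key, key=tiered_key.get)
    return 1.0 if response == best else 0.0


def partial_credit_score(response, tiered_key):
    """Tiered credit: each option earns its assigned weight."""
    return tiered_key.get(response, 0.0)


def lengthening_factor(r_current, r_target):
    """Spearman-Brown: factor by which a test with reliability r_current
    must be lengthened to reach reliability r_target.
    Assumption: used here only as an analogue of the CEL idea."""
    return (r_target * (1 - r_current)) / (r_current * (1 - r_target))


# Four hypothetical students answering the same item.
responses = ["a", "c", "e", "b"]
dichotomous = [dichotomous_score(r, TIERED_KEY) for r in responses]
partial = [partial_credit_score(r, TIERED_KEY) for r in responses]

# Made-up reliabilities: 0.70 under one scoring method, 0.75 under the other.
factor = lengthening_factor(0.70, 0.75)

print(dichotomous)
print(partial)
print(round(factor, 2))
```

Under dichotomous scoring, the students who chose “c” and “e” look identical to the student who chose “b,” whereas the tiered weights separate them.  The lengthening factor shows how a gap in reliability translates into “this test would need X times as many questions,” which is the kind of statement behind the 26% figure above.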

Alternatives to traditional multiple-choice testing that have been evaluated in the literature include open-ended, uncued (UnQ) items, which allow the test-taker to select an answer from over 500 responses.  This type of test has been used for Family Practice board examinations.  One study conducted in over 7,000 family practice residents found the UnQ to be a more reliable method for determining a physician’s competence.

The best mode of assessment probably depends on the material being tested.  In my experience, the open-response format provides the best indicator of a student’s knowledge - but as with any test, the questions must be carefully worded.  The biggest weakness of open-response essay-type exams is the time required to grade them [Editor’s note:  As well as the inherent subjectivity required when judging the “correctness” of the student’s answers].   To my chagrin, the use of CMC questions will likely continue for licensing examinations for healthcare professionals.


1.  Haladyna TM. The effectiveness of several multiple-choice formats. Appl Measure Educ 1992;5:73-88.
2.  Albanese MA. Type K and other complex multiple-choice items: an analysis of research and item properties. Educ Measure Issues and Practice 1993;12:28-33.
3.  Frisbie D, Sweeney DC. The relative merits of multiple true-false achievement tests. Journal of Educational Measurement 1982;19:29-35.
