Case-specificity, i.e., variability of a subject’s performance across cases, has been a consistent finding in medical education.
It has important implications for assessment validity and reliability. Its root causes remain a matter of discussion. One
hypothesis, content-specificity, links variability of performance to variable levels of relevant knowledge. Extended-matching
items (EMIs) are an ideal format to test this hypothesis as items are grouped by topic. If differences pertaining to content
knowledge are the main cause of case-specificity, variability across topics should be high and variability across items within
the same topic low. We used generalisability analysis on results of a written test composed of 159 EMIs sat by two cohorts
of general practice trainees at one university. Two hundred and twenty-seven trainees took part. The variance component attributed
to subjects was small. Variance attributed to topics was smaller than variance attributed to items. The main source of error
was the interaction between subjects and items, accounting for two-thirds of error. The generalisability D study revealed that,
for the same total number of items, increasing the number of topics results in a higher G coefficient than increasing the
number of items per topic. Topical knowledge does not seem to explain case-specificity observed in our data. Structure of
knowledge and reasoning strategy may be more important, in particular pattern recognition, which EMIs were designed to elicit.
The causal explanations of case-specificity may be dependent on test format. Increasing the number of topics with fewer items
each would increase reliability but also testing time.
Keywords: Assessment; Case-specificity; Clinical reasoning; Content-specificity; Extended-matching items; Generalisability; Pattern recognition; Postgraduate general practice training; Written assessment
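
The D-study finding reported in the abstract can be illustrated with a minimal sketch. It assumes a random-effects design in which items are nested within topics and crossed with subjects, and it uses the standard relative G coefficient for that design. The function name and the variance components below are hypothetical, chosen only to illustrate the direction of the effect; they are not the values estimated in the study.

```python
def g_coefficient(var_p, var_pt, var_pit, n_topics, n_items_per_topic):
    """Relative G coefficient for a subjects x (items nested in topics) design.

    var_p   : variance component for subjects
    var_pt  : subject-by-topic interaction variance
    var_pit : subject-by-item (within topic) interaction variance,
              confounded with residual error
    """
    # Relative error variance: each interaction term is divided by the
    # number of conditions sampled for that facet.
    rel_error = var_pt / n_topics + var_pit / (n_topics * n_items_per_topic)
    return var_p / (var_p + rel_error)


# Hypothetical variance components (assumptions, NOT the study's estimates).
VAR_P, VAR_PT, VAR_PIT = 0.005, 0.005, 0.15

# Hold the total number of items at 160 and vary how it is split into topics.
for n_t, n_i in [(10, 16), (20, 8), (40, 4)]:
    g = g_coefficient(VAR_P, VAR_PT, VAR_PIT, n_t, n_i)
    print(f"{n_t} topics x {n_i} items per topic: G = {g:.3f}")
```

Under these assumptions, the subject-by-item term depends only on the total number of items, so it is unchanged across the three layouts. The subject-by-topic term shrinks as topics are added, which is why spreading the same number of items over more topics gives a higher G coefficient whenever that component is greater than zero.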