Establishing measurement equivalence is important because inaccurate assessment may lead to incorrect estimates of effects
in research, and to suboptimal decisions at the individual, clinical level. Examination of differential item functioning (DIF)
is a method for studying measurement equivalence. An item (i.e., one question in a longer scale) exhibits DIF if the probability
of a given response to the item differs across groups (e.g., by gender or race) after controlling for an estimate of the construct being measured. A distinction
between applications in health and those in other settings, such as educational and aptitude testing, is that there
are many health-related constructs and multiple measures of each, few of which have received much critical evaluation. This
article discusses several methods for detecting DIF, including non-parametric methods and parametric
methods such as logistic regression and those based on item response theory. Basic definitions and criteria for DIF detection
are provided, as are steps in performing the analyses. Recommendations are presented and future directions discussed.
Keywords: Differential item functioning · Measurement equivalence · Health
The opinions expressed in this article are those of the authors. No official endorsement by AHRQ or the Department of Health
and Human Services is intended or should be inferred.
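
As an illustration of the definition of DIF given in the abstract, the following minimal sketch compares the odds of endorsing a single item in two groups within strata of a matching score, which stands in for the estimate of the construct. It uses the Mantel-Haenszel common odds ratio, chosen here only as one familiar non-parametric approach. The function names, the group labels "ref" and "focal", and the toy counts are all hypothetical and are not taken from the article.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, groups, scores):
    """Common odds ratio of endorsing an item (reference vs. focal group),
    stratified by a matching score used as a proxy for the construct."""
    # Per stratum: [ref endorse, ref not, focal endorse, focal not]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, scores):
        idx = (0 if g == "ref" else 2) + (0 if y == 1 else 1)
        tables[s][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return num / den


def expand(counts):
    """Turn {(score, group): (n_endorse, n_not)} into parallel lists."""
    y, g, s = [], [], []
    for (score, group), (n1, n0) in counts.items():
        y += [1] * n1 + [0] * n0
        g += [group] * (n1 + n0)
        s += [score] * (n1 + n0)
    return y, g, s


# Hypothetical toy data: endorsement depends only on the matching score.
no_dif = {(0, "ref"): (10, 30), (0, "focal"): (5, 15),
          (1, "ref"): (30, 10), (1, "focal"): (15, 5)}
# Hypothetical toy data: at the same score, the focal group endorses less often.
with_dif = {(0, "ref"): (20, 20), (0, "focal"): (10, 30),
            (1, "ref"): (30, 10), (1, "focal"): (20, 20)}

or_no_dif = mantel_haenszel_odds_ratio(*expand(no_dif))
or_with_dif = mantel_haenszel_odds_ratio(*expand(with_dif))
print(or_no_dif, or_with_dif)
```

An odds ratio near 1 is consistent with no DIF for the item at matched score levels. A ratio well away from 1, as in the second toy data set, suggests that group membership is associated with the item response even after conditioning on the score. In practice a significance test and an effect-size criterion would accompany the point estimate.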