Estimating Inter-Rater Reliability Using Pooled Data Induces Paradoxical Confounding: An Example Involving Emergency Severity Index Triage Ratings

Paul R. Yarnold

Optimal Data Analysis, LLC

The inter-rater reliability of patient triage ratings made using the Emergency Severity Index is computed for data pooled within and across studies, and compared with reliability computed separately for pairs of independent raters. The pooled results exhibit confounding attributable to Simpson’s Paradox. These findings raise the concern that estimates of inter-rater (and parallel-forms) reliability based on pooled data, for any instrument discussed in the literature, are susceptible to paradoxical confounding and likely overestimate rating reliability.
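The pooling effect described above can be illustrated with a minimal sketch. It uses synthetic ratings constructed purely for illustration (not data from any study), and it assumes Cohen's kappa as the agreement index, which may differ from the reliability statistic used in the analysis itself. The helper names `cohen_kappa` and `balanced_pair` are hypothetical. Each rater pair agrees only at chance level within its own patient mix, yet pooling the two pairs yields a clearly positive kappa, because the pairs rated patients at different severity levels.

```python
from collections import Counter


def cohen_kappa(ratings):
    """Cohen's kappa for a list of (rater_1, rater_2) category pairs."""
    n = len(ratings)
    # Observed agreement: proportion of cases rated identically.
    p_observed = sum(a == b for a, b in ratings) / n
    # Chance agreement from each rater's marginal category frequencies.
    marg_1 = Counter(a for a, _ in ratings)
    marg_2 = Counter(b for _, b in ratings)
    p_chance = sum(marg_1[c] * marg_2[c] for c in marg_1) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


def balanced_pair(low, high, per_cell=25):
    """Synthetic 2x2 table with equal counts in every cell (chance agreement)."""
    return [(a, b)
            for a in (low, high)
            for b in (low, high)
            for _ in range(per_cell)]


# Hypothetical data: pair A saw high-acuity patients (levels 1-2),
# pair B saw low-acuity patients (levels 4-5).
pair_a = balanced_pair(1, 2)
pair_b = balanced_pair(4, 5)
pooled = pair_a + pair_b

kappa_a = cohen_kappa(pair_a)
kappa_b = cohen_kappa(pair_b)
kappa_pooled = cohen_kappa(pooled)

print(f"pair A: {kappa_a:.3f}")
print(f"pair B: {kappa_b:.3f}")
print(f"pooled: {kappa_pooled:.3f}")
```

In this constructed case both per-pair values are 0 while the pooled value is about 0.33: the pooled index credits the raters for separating high-acuity from low-acuity patients, a distinction that arises from the case mix each pair saw rather than from agreement within either pair.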

