ODA vs. π and κ: Paradoxes of Kappa

Paul R. Yarnold

Optimal Data Analysis, LLC

π and κ, widely used indexes of inter-rater or inter-method agreement, sometimes produce unexpected results known as the paradoxes of kappa. For example, prior research computed four legacy agreement statistics (κ, Scott’s π, the G-index, and Fleiss’s generalized π) for a 2×2 table in which two independent raters failed to jointly classify any observations into the “negative” rating-class category: two of the indexes reported greater than 88.8% overall agreement, while the other two reported less than -2.3% overall agreement. ODA sheds new light on this paradox by testing confirmatory and exploratory hypotheses for these data, separately modeling the ratings made by each rater, and separately maximizing model predictive accuracy normed for chance (ESS: 0 = inter-rater agreement expected by chance, 100 = perfect agreement) as well as overall model accuracy that is not normed for chance (PAC: 0 = no inter-rater agreement, 100 = perfect agreement).
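The divergence between chance-normed and raw agreement can be illustrated with a minimal sketch of Cohen's κ. The 2×2 table below is hypothetical, chosen only to reproduce the paradox's shape (an empty joint "negative" cell); it is not the table from the cited study, and the resulting values differ from the 88.8% and -2.3% figures reported there.

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 contingency table [[a, b], [c, d]],
    rows = rater A's ratings, columns = rater B's ratings."""
    n = sum(sum(row) for row in table)
    po = (table[0][0] + table[1][1]) / n              # observed agreement
    row = [sum(r) for r in table]                     # rater A marginals
    col = [sum(c) for c in zip(*table)]               # rater B marginals
    pe = sum(r * c for r, c in zip(row, col)) / n**2  # chance-expected agreement
    return (po - pe) / (1 - pe)

# Hypothetical table: zero observations jointly rated "negative" (d = 0)
table = [[90, 5], [5, 0]]

po = (table[0][0] + table[1][1]) / 100  # raw overall agreement = 0.90
k = cohens_kappa(table)                 # kappa is slightly negative
print(po, round(k, 3))
```

Because both raters use the "positive" category for 95% of cases, chance-expected agreement (0.905) exceeds the 90% raw agreement, driving κ below zero even though the raters agree on nine of every ten cases, which is the essence of the paradox the abstract describes.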
