Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia

Paul R. Yarnold

Optimal Data Analysis, LLC

Pruning to maximize model accuracy (requiring only simple hand computation) is applied to a classification tree model developed via S-PLUS to create propensity scores, which are intended to improve causal inference in comparing hospitalized vs. ambulatory patients with community-acquired pneumonia. The research reported herein constitutes a thought-provoking example of a striking misalliance between forward analytic thinking and vestigial statistical tools, a condition that dominates the empirical literature today. Modifications of ubiquitous methodological practices are suggested.

The Structure of Perfect Optimal Models with a Two-Category Class Variable and Four or Fewer Endpoints

Paul R. Yarnold

Optimal Data Analysis, LLC

An optimal model has a specific geometric configuration defined by the number of attributes (“independent variables”, schematically illustrated using circles) and endpoints (defined by the response on an attribute, indicated by rectangles). Branches connect attributes to endpoints via an if/then/else decision rule identified by the (ODA/CTA/novometric) algorithm and operationalized by numerical thresholds or categorical rosters which explicitly maximize (weighted) classification accuracy. To aid the visualization, pursuit, and discovery of perfectly accurate statistical classification models, this paper presents schematic diagrams corresponding to the combinations of numbers of attributes and endpoints that are possible for a range of commonly reported optimal models.
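
The if/then/else rule described above can be illustrated for the simplest configuration, one attribute and two endpoints. The following is a minimal sketch only, not the ODA, CTA, or novometric software: it assumes a single ordered attribute, a class variable coded 0/1, and unweighted overall accuracy as the criterion, and the function name `best_cutpoint` is hypothetical.

```python
from typing import List, Optional, Tuple


def best_cutpoint(
    values: List[float], labels: List[int]
) -> Optional[Tuple[float, int, float]]:
    """Brute-force search for the threshold rule with the highest accuracy.

    The rule has the form: if value <= cut then predict low_class,
    else predict the other class. Returns (cut, low_class, accuracy),
    or None when the attribute takes only one distinct value.
    Illustrative assumption: accuracy is unweighted (each observation
    counts equally); this is not the published ODA algorithm.
    """
    distinct = sorted(set(values))
    # Candidate cutpoints lie midway between successive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best: Optional[Tuple[float, int, float]] = None
    for cut in candidates:
        for low_class in (0, 1):  # try both directions of the rule
            high_class = 1 - low_class
            correct = sum(
                1
                for x, y in zip(values, labels)
                if (low_class if x <= cut else high_class) == y
            )
            accuracy = correct / len(values)
            # Keep the first rule found with strictly higher accuracy.
            if best is None or accuracy > best[2]:
                best = (cut, low_class, accuracy)
    return best


if __name__ == "__main__":
    attribute = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    class_variable = [0, 0, 0, 1, 1, 1]
    print(best_cutpoint(attribute, class_variable))
```

In this toy data the two classes are perfectly separable, so the search finds a rule with accuracy 1.0, which corresponds to the one-attribute, two-endpoint perfect model discussed in the abstract.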

Value-Added by ODA vs. Chi-Square

Paul R. Yarnold

Optimal Data Analysis, LLC

Beyond identifying the most accurate classification model that exists for the sample, and estimating cross-generalizability via jackknife, hold-out, and/or other validity methods, ODA provides the exact one- or two-tailed P-value, the sensitivity and predictive value for each category of the class variable, and the effect strength corrected for chance.
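
Several of the quantities listed above can be computed by hand from a two-by-two classification table. The sketch below is illustrative and is not output from ODA software: it assumes a two-category class variable, nonzero row and column totals, and that "effect strength corrected for chance" refers to the ESS index, taken here as mean sensitivity rescaled so that 0 represents chance (50% for two categories) and 100 represents perfect classification. The function name and argument names are hypothetical, and the exact P-value is not computed.

```python
from typing import Dict


def classification_summary(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Summarize a 2x2 classification table.

    tp, fn: class-1 observations predicted as class 1 / class 0.
    fp, tn: class-0 observations predicted as class 1 / class 0.
    Assumes every row and column total is greater than zero.
    """
    sens_class1 = tp / (tp + fn)  # sensitivity for class 1
    sens_class0 = tn / (tn + fp)  # sensitivity for class 0
    pv_class1 = tp / (tp + fp)    # predictive value for class 1
    pv_class0 = tn / (tn + fn)    # predictive value for class 0

    # Assumed ESS definition for two categories: chance mean sensitivity
    # is 0.5, so rescale (mean - 0.5) onto a 0-100 scale.
    mean_sensitivity = (sens_class1 + sens_class0) / 2
    ess = 100 * (mean_sensitivity - 0.5) / 0.5

    return {
        "sensitivity_class1": sens_class1,
        "sensitivity_class0": sens_class0,
        "predictive_value_class1": pv_class1,
        "predictive_value_class0": pv_class0,
        "ess": ess,
    }


if __name__ == "__main__":
    for name, value in classification_summary(tp=40, fn=10, fp=20, tn=30).items():
        print(f"{name}: {value:.3f}")
```

For the invented table in the usage example, the sensitivities are 0.8 and 0.6, so the mean sensitivity is 0.7 and the ESS under the assumed definition is about 40.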
