Paul R. Yarnold
Optimal Data Analysis, LLC
Numerous different hierarchically-optimal classification tree analysis (HO-CTA) models can usually be identified in applications that have an adequate sample size and involve multiple attributes. The models differ in complexity, defined as the number of endpoints representing distinct patient strata: the fewer the strata, the more parsimonious the model. The models also vary in normed predictive accuracy, defined as effect strength for sensitivity (ESS): 0 represents the predictive accuracy expected by chance for the application, and 100 represents errorless prediction. The distance of each model from a theoretically ideal solution for the application (a model having perfect accuracy and minimum complexity) is computed as a D statistic. The underlying descendant family of models, including the globally-optimal model with the lowest D attainable by an HO-CTA model for the application, is identified by first obtaining a model without specifying a minimum endpoint sample size and then applying the minimum denominator selection algorithm (MDSA). These methods are illustrated in an application that seeks to identify aspects of nursing care delivered to patients that predict satisfaction among 1,045 strongly satisfied and 671 moderately satisfied Emergency Department (ED) patients. HO-CTA models that explicitly maximize ESS are contrasted with models that maximize the overall percentage of accurate classification (PAC).
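The two indices named above can be sketched in a few lines of Python. This is a minimal illustration and not the author's software. It assumes that ESS is the mean per-class sensitivity rescaled so that chance equals 0 and errorless prediction equals 100, and that D is computed as 100 / (ESS / strata) - strata, a form used elsewhere in the ODA literature but not stated in this abstract. The function names and the example counts are hypothetical and are not taken from the ED study.

```python
def ess(confusion):
    """Effect strength for sensitivity (assumed form).

    confusion[i][j] = number of class-i observations predicted as class j.
    Returns 0 at chance-level accuracy and 100 for errorless prediction.
    """
    n_classes = len(confusion)
    # Sensitivity of each class, as a percentage of that class's observations.
    sensitivities = [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]
    mean_sensitivity = sum(sensitivities) / n_classes
    chance = 100.0 / n_classes  # mean sensitivity expected by chance
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)


def d_statistic(ess_value, strata):
    """Distance from the ideal model (assumed form): lower is better.

    A two-strata model with ESS = 100 yields D = 0.
    """
    return 100.0 / (ess_value / strata) - strata


# Hypothetical two-class confusion matrix, for illustration only.
example = [[80, 20],
           [30, 70]]
example_ess = ess(example)                # mean sensitivity 75 -> ESS 50
example_d = d_statistic(example_ess, 4)   # a model with 4 endpoint strata
```

Under these assumptions, a model gains ground on the ideal either by raising ESS or by reaching the same ESS with fewer strata, which is the accuracy-versus-parsimony trade-off that D is meant to summarize.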