How Many EO-CTA Models Exist in My Sample and Which is the Best Model?

Paul R. Yarnold

Optimal Data Analysis, LLC

For a given application, three outcomes are possible concerning statistically reliable enumerated-optimal classification tree analysis (EO-CTA) models: no EO-CTA model exists; exactly one model exists; or a descendant family (DF) of two or more models exists. Each model in a DF maximizes the effect strength for sensitivity (ESS) for a unique partition of the sample, and the model with the lowest observed D statistic is the globally-optimal CTA (GO-CTA) model for the application. The brute-force method of identifying a DF involves obtaining an initial EO-CTA model without specifying a minimum endpoint sample size, then applying the minimum denominator selection algorithm (MDSA) to the initial model. A more efficient way to obtain the GO-CTA model is to include only the attribute subset identified by structural decomposition analysis (SDA). The DF for the SDA attribute subset differs from the DF identified for the entire attribute set because a DF is data-specific. These methods are illustrated with an application that uses rated aspects of nursing and physician care to discriminate 1,045 very satisfied vs. 671 satisfied Emergency Department (ED) patients.
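
The final selection step, choosing the GO-CTA model as the DF member with the lowest D, can be illustrated with a minimal sketch. It assumes a two-class problem, computes ESS from the mean of the two class sensitivities (0 = chance, 100 = perfect), and assumes the novometric definition D = 100 / (ESS / S) - S, where S is the number of model endpoints. The class CandidateModel, the helper functions, and the three models and their accuracy figures are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateModel:
    name: str
    sens0: float    # sensitivity (%) for class 0 (hypothetical figure)
    sens1: float    # sensitivity (%) for class 1 (hypothetical figure)
    endpoints: int  # number of endpoints (strata), S


def ess_two_class(sens0, sens1):
    """Two-class ESS: mean sensitivity rescaled so 50% -> 0 and 100% -> 100."""
    return ((sens0 + sens1) / 2 - 50.0) / 50.0 * 100.0


def d_statistic(ess, endpoints):
    """Assumed definition D = 100 / (ESS / S) - S; lower D is better."""
    if ess <= 0:
        return float("inf")  # no better than chance: treat as worst possible
    return 100.0 / (ess / endpoints) - endpoints


def globally_optimal(family):
    """Return the descendant-family member with the lowest D statistic."""
    return min(
        family,
        key=lambda m: d_statistic(ess_two_class(m.sens0, m.sens1), m.endpoints),
    )


# Hypothetical descendant family: accuracy rises as endpoints are added.
family = [
    CandidateModel("model A", 70.0, 60.0, 2),
    CandidateModel("model B", 75.0, 65.0, 4),
    CandidateModel("model C", 80.0, 70.0, 6),
]

for m in family:
    ess = ess_two_class(m.sens0, m.sens1)
    print(m.name, round(ess, 2), round(d_statistic(ess, m.endpoints), 2))
print("GO-CTA:", globally_optimal(family).name)
```

With these invented numbers the most parsimonious model A (ESS = 30, D of about 4.67) beats models B and C (ESS = 40 and 50, both with D = 6), showing how D trades classification accuracy against model complexity.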

Pruning CTA Models to Maximize PAC

Paul R. Yarnold

Optimal Data Analysis, LLC

In CTA, weighting by prior odds is used when a model is sought that maximizes the effect strength for sensitivity (ESS). ESS is explicitly optimized by a pruning algorithm that deconstructs a fully-grown model into all nested sub-branches and then reassembles every possible combination of sub-branches to identify the configuration with the greatest ESS. In contrast, unit weighting is used when a CTA model is sought that maximizes percentage accuracy in classification (PAC); in that case the pruning algorithm reassembles all possible sub-branch combinations and identifies the configuration with the greatest PAC.
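
The exhaustive search can be sketched in a few lines. This is a minimal illustration, not the published algorithm: it assumes a two-class problem and a hypothetical fully-grown tree in which each node's endpoint class is fixed by the split that created it, it keeps the root split, and it enumerates every combination of kept or collapsed sub-branches before scoring each configuration by ESS or PAC. How weighting by prior odds or unit weighting shapes tree growth is not modeled; only the final selection criterion differs. All node names, counts, and helper functions are invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    name: str
    n0: int    # class-0 observations reaching this node (hypothetical)
    n1: int    # class-1 observations reaching this node (hypothetical)
    pred: int  # class assigned if this node is an endpoint (hypothetical)
    children: list = field(default_factory=list)


def subtree_options(node):
    """Every way to keep or collapse the sub-branches below this node."""
    options = [[node]]  # collapse: the node becomes a single endpoint
    if node.children:
        for combo in product(*(subtree_options(c) for c in node.children)):
            options.append([leaf for part in combo for leaf in part])
    return options


def configurations(root):
    """All pruned models that keep the root split."""
    for combo in product(*(subtree_options(c) for c in root.children)):
        yield [leaf for part in combo for leaf in part]


def ess_and_pac(leaves, total0, total1):
    """Score a set of endpoints by two-class ESS and overall PAC."""
    correct0 = sum(leaf.n0 for leaf in leaves if leaf.pred == 0)
    correct1 = sum(leaf.n1 for leaf in leaves if leaf.pred == 1)
    sens0 = 100.0 * correct0 / total0
    sens1 = 100.0 * correct1 / total1
    ess = ((sens0 + sens1) / 2 - 50.0) / 50.0 * 100.0
    pac = 100.0 * (correct0 + correct1) / (total0 + total1)
    return ess, pac


def best_configuration(root, criterion):
    """Exhaustively pick the configuration with greatest ESS or PAC."""
    idx = 0 if criterion == "ESS" else 1
    return max(
        configurations(root),
        key=lambda leaves: ess_and_pac(leaves, root.n0, root.n1)[idx],
    )


# Hypothetical fully-grown tree: 80 class-0 and 20 class-1 observations.
a = Node("A", 65, 8, 0, [Node("A1", 57, 4, 0), Node("A2", 8, 4, 1)])
b = Node("B", 15, 12, 1, [Node("B1", 10, 5, 0), Node("B2", 5, 7, 1)])
root = Node("root", 80, 20, 0, [a, b])

print("max ESS:", [n.name for n in best_configuration(root, "ESS")])
print("max PAC:", [n.name for n in best_configuration(root, "PAC")])
```

With these invented counts the two criteria select different models: keeping branch A split and collapsing B gives the greatest ESS (51.25, with PAC = 73), whereas collapsing A and keeping B split gives the greatest PAC (82, with ESS = 28.75). The unbalanced class sizes drive this disagreement.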
