Some Machine Learning Algorithms Find Relationships Between Variables When None Exist — CTA Doesn’t

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Automated machine learning algorithms are widely promoted as the best approach for estimating propensity scores because they detect patterns in the data that manual efforts fail to identify. If classification algorithms are indeed ideal for identifying relationships between treatment group participation and the covariates that predict participation, then it stands to reason that they should also find no relationship when none exists (i.e., when covariates do not predict treatment group assignment). Accordingly, we compare the predictive accuracy of maximum-accuracy classification tree analysis (CTA) with that of the classification algorithms most commonly used to obtain the propensity score (logistic regression, random forests, boosted regression, and support vector machines). We use an artificial dataset in which ten continuous covariates are randomly generated and, by design, have no correlation with the binary dependent variable (i.e., treatment assignment). Among all of the algorithms tested, only CTA correctly failed to discriminate between treatment and control groups based on the covariates. These results lend further support to the use of CTA for generating propensity scores as an alternative to the common approaches currently in favor.
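The design described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use any of the algorithms they tested: a 1-nearest-neighbor classifier is swapped in as a simple stand-in for a flexible learner, and the sample sizes, seed, and function names are all assumptions made for illustration. The sketch generates ten random continuous covariates and a binary treatment label that is independent of them, then compares accuracy on the training data with accuracy on fresh data.

```python
import random


def make_null_data(n, rng, n_covariates=10):
    """Random continuous covariates and a coin-flip treatment label.

    By construction the label is independent of every covariate.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_covariates)] for _ in range(n)]
    y = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    return X, y


def predict_1nn(X_train, y_train, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for xi, yi in zip(X_train, y_train):
        d = sum((a - b) ** 2 for a, b in zip(xi, x))
        if d < best_dist:
            best_dist, best_label = d, yi
    return best_label


def accuracy(X_train, y_train, X_eval, y_eval):
    """Share of evaluation points whose 1-NN prediction matches the label."""
    hits = sum(predict_1nn(X_train, y_train, x) == y
               for x, y in zip(X_eval, y_eval))
    return hits / len(y_eval)


rng = random.Random(42)  # arbitrary seed, assumed for reproducibility
X_train, y_train = make_null_data(400, rng)
X_test, y_test = make_null_data(400, rng)

in_sample = accuracy(X_train, y_train, X_train, y_train)
holdout = accuracy(X_train, y_train, X_test, y_test)

print(f"in-sample accuracy: {in_sample:.3f}")
print(f"holdout accuracy:   {holdout:.3f}")
```

In this sketch the stand-in classifier reproduces its training labels perfectly because each point is its own nearest neighbor, while its accuracy on fresh data should hover near chance. That gap is one way a flexible learner can appear to find a relationship in covariates that carry no information about treatment assignment.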

Weighted Optimal Markov Model of a Single Outcome: Ipsative Standardization of Ordinal Ratings is Unnecessary

Paul R. Yarnold

Optimal Data Analysis, LLC

This note empirically compares the use of raw vs. ipsatively standardized variables in optimal weighted Markov analysis involving a series for a single outcome (here, ratings of sleep difficulties for an individual). Findings indicate that raw and ipsatively standardized ordinal ratings yield equivalent results in such designs.
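For readers unfamiliar with the term, ipsative standardization converts each rating to a z-score using the individual's own mean and standard deviation across the series. The sketch below is a hedged illustration only: the ratings are hypothetical, the helper names are invented, and the choice of the population standard deviation is an assumption. It also checks that the transformation preserves the ordering of the ratings. The abstract does not state why the two codings gave equivalent results, so the ordering check is offered as a plausible intuition rather than as the note's argument.

```python
from statistics import mean, pstdev


def ipsative_z(ratings):
    """Standardize one individual's series against that individual's own
    mean and standard deviation (population SD is assumed here)."""
    m = mean(ratings)
    s = pstdev(ratings)
    if s == 0:
        raise ValueError("series has no variability; z-scores are undefined")
    return [(r - m) / s for r in ratings]


def rank_order(values):
    """Indices of the series sorted from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical ordinal sleep-difficulty ratings for one person over time.
ratings = [3, 4, 2, 5, 4, 3, 1, 4]
z = ipsative_z(ratings)

print([round(v, 2) for v in z])
print(rank_order(ratings) == rank_order(z))
```

Because the transformation is linear with a positive slope, every "higher than" or "lower than" comparison between two time points is the same in both codings.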

More On: “Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia”

Paul R. Yarnold

Optimal Data Analysis, LLC

A recent article optimized the ESS of a suboptimal classification tree model that discriminated hospitalized vs. ambulatory patients with community-acquired pneumonia (CAP). This note suggests possible alternatives for two original attributes as a means of increasing model accuracy: patient disease-specific knowledge vs. “college education”, and patient-specific functional status and social support vs. “living arrangement”.
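ESS (effect strength for sensitivity) is the accuracy index used in this literature. As commonly defined, it rescales the mean per-class sensitivity so that 0 corresponds to chance and 100 to perfect classification. The sketch below assumes that standard definition. The function name and the confusion-matrix counts are hypothetical and are not taken from the article.

```python
def ess(confusion):
    """Effect strength for sensitivity, on a 0 (chance) to 100 (perfect) scale.

    confusion[i][j] is the number of observations whose actual class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    # Sensitivity for class i: share of actual class-i cases predicted as i.
    sensitivities = [row[i] / sum(row) for i, row in enumerate(confusion)]
    mean_sensitivity = 100.0 * sum(sensitivities) / n_classes
    chance = 100.0 / n_classes  # mean sensitivity expected by chance
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)


# Hypothetical two-class table: rows are actual hospitalized / ambulatory.
example = [[80, 20],
           [30, 70]]
print(round(ess(example), 1))
```

For the hypothetical table, the sensitivities are 80% and 70%, so the mean sensitivity is 75% and the ESS works out to about 50, that is, halfway between chance and perfect classification.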

Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia

Paul R. Yarnold

Optimal Data Analysis, LLC

Pruning to maximize model accuracy (requiring simple hand computation) is applied to a classification tree model developed via S-PLUS to create propensity scores to improve causal inference in comparing hospitalized vs. ambulatory patients with community-acquired pneumonia. Research reported herein constitutes a thought-provoking example of a striking misalliance between forward analytic thinking and vestigial statistical tools, a condition that dominates the empirical literature today. Modifications of ubiquitous methodological practices are suggested.
