ODA Laboratory Relocating, Introductory Offer Expires September 1, 2016

Effective today (July 12, 2016), the ODA lab is suspending sales of books and software until August 2, 2016, when processing of new orders will resume.

The current introductory half-price offer for the new book, Maximizing Predictive Accuracy ($49), will remain in effect until August 31, 2016. Starting on September 1, 2016, this book will be sold at its listed price of $98.

The current introductory offer of free MegaODA software for anyone purchasing the new book also remains in effect until August 31, 2016. Starting on September 1, 2016, the free software will no longer be offered.

How Many EO-CTA Models Exist in My Sample and Which is the Best Model?

Paul R. Yarnold

Optimal Data Analysis, LLC

As concerns the existence of statistically reliable enumerated-optimal classification tree analysis (EO-CTA) model(s) for a given application, possible alternative analytic outcomes are: no EO-CTA model exists; one model exists; or a descendant family (DF) that consists of two or more models exists. Models in a DF maximize ESS for unique partitions of the sample, and the model with the lowest observed D statistic is the globally-optimal CTA (GO-CTA) model for the application. The brute-force method of identifying a DF involves obtaining an initial EO-CTA model without specifying minimum endpoint sample size, then applying the minimum denominator selection algorithm (MDSA) to the initial model. A more efficient methodology for obtaining the GO-CTA model involves including only the attribute subset identified using structural decomposition analysis (SDA). The DF for the SDA attribute subset differs from the DF identified for the entire attribute set because the DF is data-specific. These methods are illustrated for an application using rated aspects of nursing and physician care to discriminate 1,045 very satisfied vs. 671 satisfied Emergency Department (ED) patients.


Pruning CTA Models to Maximize PAC

Paul R. Yarnold

Optimal Data Analysis, LLC

In CTA, weighting by prior odds is used if a model is sought to maximize ESS, which is explicitly optimized by a pruning algorithm that deconstructs a fully-grown model into all nested sub-branches and then reassembles all possible combinations of sub-branches to identify the configuration with greatest ESS. In contrast, unit-weighting is used if a CTA model is sought to maximize PAC, which is explicitly optimized by using the same pruning algorithm to reassemble all possible sub-branch combinations and identify the configuration with greatest PAC.
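The difference between the two objectives can be seen on a small example. The sketch below is not the pruning algorithm itself; it only computes the two criteria that the algorithm would compare across sub-branch configurations. It assumes the usual definitions: PAC is the percentage of all observations classified correctly, and ESS is mean per-class sensitivity normed so that 0 is chance and 100 is errorless prediction. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def sensitivities(confusion):
    """Per-class sensitivity (%) from a square confusion matrix.

    confusion[i][j] = number of class-i observations predicted as class j.
    """
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]


def ess(confusion):
    """Effect strength for sensitivity: mean sensitivity normed against chance.

    Assumed definition: chance is 100 / (number of classes).
    """
    n_classes = len(confusion)
    mean_sens = sum(sensitivities(confusion)) / n_classes
    chance = 100.0 / n_classes
    return 100.0 * (mean_sens - chance) / (100.0 - chance)


def pac(confusion):
    """Overall percentage of accurate classification."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total


# Hypothetical counts for an imbalanced two-class sample (900 vs. 100).
always_majority = [[900, 0], [100, 0]]   # predicts the larger class for everyone
balanced = [[630, 270], [30, 70]]        # 70% sensitivity in each class

for name, cm in [("always_majority", always_majority), ("balanced", balanced)]:
    print(f"{name}: PAC = {pac(cm):.1f}, ESS = {ess(cm):.1f}")
```

With these made-up counts the two criteria rank the configurations in opposite order: the first has the higher PAC, the second the higher ESS, which is why the choice of objective matters when pruning.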


FREE MegaODA Software Included for Personal Use with Book Purchase

The book will be more fun to read (that is, fun to work) if the reader has the software necessary to test maximum-accuracy methods using their own data. So, for a limited time, a FREE copy of MegaODA software will be included for the personal use of the person who buys the book. This software is described in the Resources tab, in many examples in the book, and in articles published in Optimal Data Analysis.

Identifying the Descendant Family of HO-CTA Models by using the Minimum Denominator Selection Algorithm: Maximizing ESS versus PAC

Paul R. Yarnold

Optimal Data Analysis, LLC

Usually it is possible to identify numerous different hierarchically-optimal classification tree analysis (HO-CTA) models in applications having an adequate sample size and involving multiple attributes. The models differ in complexity, defined as the number of endpoints representing distinct patient strata: the fewer the strata, the more parsimonious the model. The different models also vary in normed predictive accuracy, defined as effect strength for sensitivity (ESS): 0 represents the predictive accuracy expected by chance for the application; 100 represents errorless prediction. The distance of each model from a theoretically ideal solution for the application (defined as a model having perfect accuracy and minimum complexity) is computed as a D statistic. The underlying descendant family of models (including the globally-optimal model with the lowest D possible by an HO-CTA model for the application) is identified by first obtaining a model without specifying minimum endpoint sample size, and then applying the minimum denominator selection algorithm (MDSA). These methods are illustrated in an application seeking to identify aspects of nursing care delivered to patients that predict satisfaction among 1,045 strongly satisfied and 671 moderately satisfied Emergency Department (ED) patients. HO-CTA models that explicitly maximize ESS versus the overall percentage of accurate classification (PAC) are contrasted.
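Selecting the globally-optimal model from a descendant family can be sketched as follows. The abstract does not state the formula for D, so the version used here, D = 100 / (ESS / strata) - strata, is an assumption taken from the wider ODA literature and should be checked against the article. The descendant family below is hypothetical: the model names, ESS values, and strata counts are invented for illustration and are not results from the study.

```python
def d_statistic(ess, strata):
    """Distance of a model from the ideal solution.

    Assumed formula (not given in the abstract): D = 100 / (ESS / strata) - strata.
    ESS is on the 0-100 scale; strata is the number of model endpoints.
    """
    if ess <= 0:
        raise ValueError("D is undefined for models performing at or below chance")
    if strata < 1:
        raise ValueError("a model needs at least one stratum")
    return 100.0 / (ess / strata) - strata


# Hypothetical descendant family (values made up for illustration).
family = [
    {"name": "model A", "ess": 45.0, "strata": 8},
    {"name": "model B", "ess": 40.0, "strata": 5},
    {"name": "model C", "ess": 30.0, "strata": 2},
]

for model in family:
    model["d"] = d_statistic(model["ess"], model["strata"])
    print(f'{model["name"]}: ESS = {model["ess"]}, strata = {model["strata"]}, '
          f'D = {model["d"]:.2f}')

# The globally-optimal model is the family member with the lowest D.
best = min(family, key=lambda m: m["d"])
print("lowest D:", best["name"])
```

Under the assumed formula, a model with ESS = 100 has D = 0, and in this invented family the most parsimonious model wins even though its ESS is the lowest, which illustrates the trade-off between accuracy and complexity that D is meant to capture.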


Using Machine Learning to Model Dose-Response Relationships via ODA: Eliminating Response Variable Baseline Variation by Ipsative Standardization

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC

A maximum-accuracy machine-learning method for predicting dose of exposure based on distribution of the response variable was recently introduced. Herein we demonstrate the advantages of eliminating baseline variation in the response variable via transformation by ipsative standardization. Using data measuring forearm blood flow responses to intra-arterial administration of Isoproterenol, findings obtained using optimized discriminant analysis and a general estimating equation are compared separately for black and white males (and pooled data) using raw versus ipsatively standardized blood flow data. Findings using raw versus ipsatively standardized forearm blood-flow response data were incongruous. The standardized responses of blacks and whites were indistinguishable through 150 ng/min doses; responses of blacks were elevated at the 300 ng/min dose but regressed to 150 ng/min dose levels at the 400 ng/min dose, whereas at the 400 ng/min dose whites had the greatest response levels observed in the study. Using raw data there was no evidence of inter-method statistical conclusion agreement; baseline variability resulted in failure to statistically confirm numerous inter-dose responses; and the dose-response model yielded moderate predictive accuracy. Using standardized data there was significant evidence of inter-method statistical conclusion agreement; eliminating baseline variability yielded more findings of statistically reliable inter-dose responses; and the dose-response model yielded relatively strong predictive accuracy. This study adds to a growing literature demonstrating that ipsative standardization of the response variable studied in single-case or multiple-observation “repeated measures” designs yields generalizable models that generate the most accurate predictions (normed against chance) that are analytically possible for the sample data.
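Ipsative standardization converts each subject's responses to z-scores using that subject's own mean and standard deviation, so that differences in baseline level between subjects drop out. The sketch below shows the transformation on invented numbers. The function name and the data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def ipsative_standardize(scores):
    """Z-score one subject's repeated measures against that subject's own
    mean and standard deviation (sample SD assumed)."""
    if len(scores) < 2:
        raise ValueError("need at least two observations per subject")
    m = mean(scores)
    s = stdev(scores)
    if s == 0:
        raise ValueError("subject has no within-subject variation")
    return [(x - m) / s for x in scores]


# Hypothetical responses at three increasing doses for two subjects whose
# baselines differ by 10 units but whose dose-response shape is the same.
subject_low_baseline = [2.0, 4.0, 6.0]
subject_high_baseline = [12.0, 14.0, 16.0]

z_low = ipsative_standardize(subject_low_baseline)
z_high = ipsative_standardize(subject_high_baseline)

print("raw:         ", subject_low_baseline, subject_high_baseline)
print("standardized:", z_low, z_high)
```

On the raw scale the two invented subjects do not overlap at all; after standardization their profiles coincide, which is the sense in which baseline variation is eliminated before dose is modeled.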


Causality of Adverse Drug Reactions: The Upper-Bound of Arbitrated Expert Agreement for Ratings Obtained by WHO and Naranjo Algorithms

Paul R. Yarnold

Optimal Data Analysis, LLC

As a high-ranking cause of human mortality, adverse drug reactions (ADRs) are the focus of an enormous literature, and optimal statistical methods have proven undaunted by the analysis-challenging geometry of multi-site longitudinal medical data sets. Two broadly-used causality assessment algorithms for identifying ADRs are the Naranjo and World Health Organization (WHO) ADR algorithms. Ratings made using these algorithms have not been validated, so the extent to which arbitrated ratings made by independent experts using these algorithms agree is important in assessing the expected upper-bound of inter-rater, inter-method reliability. Using data from India on this issue, UniODA identified a strong to very strong relationship between ratings obtained using WHO and Naranjo algorithms for a sample of N = 200 randomly selected patients. Inter-algorithm disagreement occurred for 15.2% of cases indicated as “Probable” by the Naranjo algorithm, but as “Possible” by the WHO algorithm.
