Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia

Paul R. Yarnold

Optimal Data Analysis, LLC

Pruning to maximize model accuracy (requiring only simple hand computation) is applied to a classification tree model developed via S-PLUS to create propensity scores that improve causal inference when comparing hospitalized vs. ambulatory patients with community-acquired pneumonia. The research reported herein is a thought-provoking example of a striking misalliance between forward analytic thinking and vestigial statistical tools, a condition that dominates the empirical literature today. Modifications of ubiquitous methodological practices are suggested.

View journal article

The Structure of Perfect Optimal Models with a Two-Category Class Variable and Four or Fewer Endpoints

Paul R. Yarnold

Optimal Data Analysis, LLC

An optimal model has a specific geometric configuration defined by the number of attributes (“independent variables”—schematically illustrated using circles) and endpoints (defined by response on attribute—indicated by rectangles). Branches direct attributes to endpoints via an if/then/else-based decision rule identified by the (ODA/CTA/novometric) algorithm and operationalized vis-à-vis numerical thresholds or categorical rosters which explicitly maximize (weighted) classification accuracy. In hopes of aiding in the visualization, pursuit and discovery of perfectly accurate statistical classification models, this paper presents schematic diagrams which correspond to combinations of number of attributes and endpoints that are possible for a range of optimal models commonly reported.

View journal article

Value-Added by ODA vs. Chi-Square

Paul R. Yarnold

Optimal Data Analysis, LLC

Beyond identifying the most accurate classification model which exists for the sample, and estimating cross-generalizability vis-à-vis jackknife, hold-out and/or other validity methods, ODA provides the exact one- or two-tailed P-value, the sensitivity and predictive value for each category of the class variable, and the effect strength corrected for chance.
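
The quantities named above can be illustrated with a minimal sketch for a two-category class variable. The function name, the example counts, and the table layout are hypothetical. The effect-strength formula used here (mean sensitivity rescaled so that chance is 0 and perfect classification is 100) is an assumption about how the chance-corrected index is computed. The exact P-value is not computed.

```python
def classification_summary(table):
    """Summarize a 2x2 table laid out as table[actual][predicted] counts."""
    (a, b), (c, d) = table  # rows: actual class 0, 1; columns: predicted class 0, 1

    # Sensitivity: share of each actual class that the model classifies correctly.
    sensitivity = [a / (a + b), d / (c + d)]

    # Predictive value: share of each predicted class that is actually correct.
    predictive_value = [a / (a + c), d / (b + d)]

    # Assumed chance-corrected effect strength for two classes:
    # 0 = accuracy expected by chance (mean sensitivity 0.5), 100 = perfect.
    mean_sensitivity = sum(sensitivity) / 2
    ess = 100 * (mean_sensitivity - 0.5) / 0.5
    return sensitivity, predictive_value, ess


# Hypothetical counts, for illustration only.
sens, pv, ess = classification_summary([[40, 10], [20, 30]])
print(sens, pv, round(ess, 1))
```

With these hypothetical counts the two sensitivities are 0.8 and 0.6, so the mean sensitivity is 0.7 and the chance-corrected index is about 40.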

View journal article

Maximum-Precision Markov Transition Table: Successive Daily Change in Closing Price of a Utility Stock

Paul R. Yarnold

Optimal Data Analysis, LLC

Research seeking to increase the accuracy of traditional Markov analysis-based models, which assess the outcome (class) variable as a two-category variable, studies the use of over-time weighting schemes. This paper demonstrates how to maximize precision of the class variable by using ODA to weight each individual “observation” (event) in the transition table by its corresponding exact absolute change-in-value.
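
As a rough illustration of the weighting idea, the sketch below builds both an unweighted and a weighted two-state transition table from a series of closing prices. The function name and the prices are hypothetical. Weighting each transition by the absolute size of the current day's change, and skipping days with no change, are assumptions made for this example. The ODA analysis itself is not reproduced.

```python
def transition_tables(closes):
    """Return (counts, weighted) tables keyed by (previous move, current move)."""
    changes = [later - earlier for earlier, later in zip(closes, closes[1:])]
    keys = [("up", "up"), ("up", "down"), ("down", "up"), ("down", "down")]
    counts = {key: 0 for key in keys}
    weighted = {key: 0 for key in keys}

    for prev, curr in zip(changes, changes[1:]):
        if prev == 0 or curr == 0:
            continue  # assumption: days with no change are left out
        key = ("up" if prev > 0 else "down", "up" if curr > 0 else "down")
        counts[key] += 1            # every transition counts once
        weighted[key] += abs(curr)  # assumption: weight = size of current change
    return counts, weighted


# Hypothetical closing prices, for illustration only.
counts, weighted = transition_tables([100, 101, 103, 102, 104, 101, 105])
print(counts)
print(weighted)
```

In the weighted table a large move contributes more than a small one, so the cells reflect the magnitude of change and not only its direction.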

View journal article

Comparing Exact Discrete 95% CIs for Model vs. Chance ESS to Evaluate Statistical Significance

Paul R. Yarnold

Optimal Data Analysis, LLC

Satisfaction ratings (1=very dissatisfied; 2=somewhat dissatisfied; 3=neutral; 4=somewhat satisfied; 5=very satisfied) provided by 4,583 hospital patients in three successive cohorts (two consecutive 3-month-long baseline cohorts and one 3-month-long post-intervention cohort) were compared to evaluate a program which was designed to increase patient-rated satisfaction with in-hospital received care (Table 1).

View journal article

Optimal Analyses for Cohort Tables

Paul R. Yarnold

Optimal Data Analysis, LLC

Cross-classification tables may be created for one or more “cohorts” (groups of observations defined by a common event such as the year of one’s birth, graduation, employment, marriage, disease diagnosis or incarceration) and assessed at two or more points in time on one or more variables reflecting the substantive focus of the study. This article demonstrates exploratory maximum-accuracy evaluation of cohort, aging, and time effects for a standard cohort table.

View journal article

Using ODA to Confirm a First Order Markov Steady State Process

Paul R. Yarnold

Optimal Data Analysis, LLC

When iterated over a sufficient number of time periods, a first-order Markov change process defined by a constant transition matrix yields a steady state. Consecutive transition matrices are compared by Goodman’s chi-square test to assess whether a steady state has been achieved. This note demonstrates the analogous use of ODA to assess whether such transition matrices differ.
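
The convergence to a steady state can be seen in a small sketch that repeatedly applies a constant transition matrix to a starting distribution. The function name, the 2x2 matrix, and the tolerance are hypothetical values chosen for illustration. Neither Goodman's test nor the ODA comparison of transition matrices is implemented here.

```python
def steady_state(matrix, start, tol=1e-12, max_steps=10_000):
    """Iterate dist <- dist * matrix until the distribution stops changing."""
    dist = list(start)
    size = len(dist)
    for _ in range(max_steps):
        # Row vector times row-stochastic matrix: next state probabilities.
        nxt = [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]
        if max(abs(x - y) for x, y in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("no steady state reached within max_steps")


# Hypothetical transition matrix: each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P, [1.0, 0.0])
print([round(p, 4) for p in pi])
```

For this hypothetical matrix the distribution settles near 5/6 and 1/6 regardless of the starting distribution, which is what a constant transition matrix implies once the process has been iterated long enough.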

View journal article

Comparative Accuracy of a Diagnostic Index Modeled Using (Optimized) Regression vs. Novometrics

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Diagnostic screening tests are used to predict an individual’s graduated disease status which is measured on an ordered scale assessing disease progression (severity of illness). Maximizing the predictive accuracy of the diagnostic or screening test is paramount to correctly identifying an individual’s actual score along the ordered continuum. The present study compares two approaches for mapping a statistical model to a diagnostic index in order to make accurate outcome predictions for individuals. The application involves a dataset composed of multiple biomedical voice measurements for 42 individuals with early-stage Parkinson’s disease, who completed a six-month trial of a device for remote symptom progression telemonitoring. For 16 voice measures, each treated as a main effect, ordinary least-squares regression is used to predict baseline motor impairment component score. ODA is used to maximize accuracy of the regression model when it is mapped to the diagnostic index, and results are compared with accuracy achieved by the novometric solution.

View journal article

Identifying Maximum-Accuracy Cut-Points for Diagnostic Indexes via ODA

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Maximizing the discriminatory accuracy of a diagnostic or screening test is paramount to correctly identifying individuals with vs. without the disease or disease marker. In this paper we demonstrate the use of ODA to identify the optimal cut-point which best discriminates between those with vs. without the disease (or marker) under study, for any diagnostic test. We illustrate this methodology using a dataset composed of a range of repeated biomedical voice measurements from 31 people, 23 with Parkinson’s disease (PD). A logistic regression model was used to estimate the probability that each observation was from a person with vs. without PD as a function of 22 voice measurement variables, entered in the model as main effects only. Five different methods for computing a diagnostic cut-point on estimated probability are compared.
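
One way to picture a maximum-accuracy cut-point search is the brute-force sketch below, which tries every midpoint between adjacent distinct estimated probabilities and keeps the one with the highest chance-corrected accuracy. The function name, the scores, and the labels are hypothetical. Using mean sensitivity rescaled to a 0 to 100 index as the criterion is an assumption. This is not the ODA software's algorithm, and it is not one of the five methods the paper compares.

```python
def best_cut_point(scores, labels):
    """Return (cut, ess) for the rule: predict class 1 when score > cut."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    values = sorted(set(scores))
    best = None
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2  # candidate cut between adjacent distinct scores
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
        mean_sensitivity = (true_pos / n_pos + true_neg / n_neg) / 2
        ess = 100 * (mean_sensitivity - 0.5) / 0.5  # assumed criterion
        if best is None or ess > best[1]:  # ties keep the lowest cut (arbitrary)
            best = (cut, ess)
    return best


# Hypothetical estimated probabilities and true classes (1 = disease present).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cut, ess = best_cut_point(scores, labels)
print(round(cut, 3), round(ess, 1))
```

Because the criterion averages the two sensitivities, a cut-point chosen this way does not favor the larger class, which matters when the groups are unbalanced, as with 23 of 31 people having PD.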

View journal article