# Novometric Theorem Generalized to Unrestricted Class Variables

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric theory originally consisted of four axioms and applied to designs with a binary class variable and an unrestricted attribute. This note adds a fifth axiom generalizing novometrics to designs involving unrestricted class variables.

View journal article

# Matrix Display of Pairwise Novometric Associations for Ordered Variables

Paul R. Yarnold

Optimal Data Analysis, LLC

Correlation matrices are commonly used to summarize the linear associations for all possible pairings in a set of ordered variables. This paper exports this methodology, creating the conceptual analogue of a correlation matrix in the optimal data analysis (ODA) statistical paradigm.

View journal article

# Novometrics vs. Regression Analysis: Modeling Patient Satisfaction with Care Received in the Emergency Room

Paul R. Yarnold

Optimal Data Analysis, LLC

Ordered dependent (class) variables are ordinarily modeled by Pearson correlation (r) in univariable applications with one ordered independent variable (attribute), and by multiple regression analysis (MRA) in multivariable applications involving more than one attribute. Prior research demonstrated the use of ODA to maximize predictive accuracy of r and MRA models. The present paper demonstrates the use of novometrics, the maximum-accuracy alternative to r and MRA.

View journal article

# Novometrics vs. Regression Analysis: Literacy, and Age and Income, of Ambulatory Geriatric Patients

Paul R. Yarnold

Optimal Data Analysis, LLC

A convenience sample of 293 ambulatory women patients, all older than 65 years of age, was surveyed in a general medicine clinic. Correlation (r), multiple regression analysis (MRA), and novometric analysis were used to model the relationship of scores (even integers) on the TOFHLA literacy measure (the dependent or class variable) with age (recorded to two decimal places) and income (measured in integer increments of \$10,000 per year, coded 1 to 8 inclusive). Regression- and novometric-based findings are contrasted.

View journal article

# Novometrics vs. Multiple Regression Analysis: Age and Clinical Measures of PCP Survivors

Paul R. Yarnold & Charles L. Bennett

Optimal Data Analysis, LLC

The application of ODA to maximize the predictive accuracy achieved by a linear model that was developed using multiple regression analysis (MRA) was previously considered. The present paper illustrates the maximum-accuracy alternative to MRA.

View journal article

# Novometrics vs. Correlation: Age and Clinical Measures of PCP Survivors

Paul R. Yarnold & Charles L. Bennett

Optimal Data Analysis, LLC

The use of ODA to maximize the predictive accuracy of linear Pearson correlation (r) models was previously considered. The present paper discusses and illustrates the maximum-accuracy alternative to r.

View journal article

# Novometric Analysis with Ordered Class Variables: The Optimal Alternative to Linear Regression Analysis

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC

To model an ordered dependent (class) variable, Pearson correlation (r) is used in univariable applications featuring one ordered independent variable (attribute), and multiple regression analysis (MRA) is used in multivariable applications featuring two or more attributes. Prior research demonstrated how to maximize the predictive accuracy of univariable and multivariable regression models via an ODA-based procedure. The present paper instead demonstrates optimal alternatives to r and MRA.

View journal article

# How Many EO-CTA Models Exist in My Sample and Which is the Best Model?

Paul R. Yarnold

Optimal Data Analysis, LLC

For a given application, three outcomes are possible regarding statistically reliable enumerated-optimal classification tree analysis (EO-CTA) models: no EO-CTA model exists; one model exists; or a descendant family (DF) consisting of two or more models exists. Models in a DF maximize ESS for unique partitions of the sample, and the model with the lowest observed D statistic is the globally-optimal CTA (GO-CTA) model for the application. The brute-force method of identifying a DF involves obtaining an initial EO-CTA model without specifying minimum endpoint sample size, then applying the minimum denominator selection algorithm (MDSA) to the initial model. A more efficient methodology for obtaining the GO-CTA model involves including only the attribute subset identified using structural decomposition analysis (SDA). The DF for the SDA attribute subset differs from the DF identified for the entire attribute set because the DF is data-specific. These methods are illustrated for an application using rated aspects of nursing and physician care to discriminate 1,045 very satisfied vs. 671 satisfied Emergency Department (ED) patients.

View journal article

# Pruning CTA Models to Maximize PAC

Paul R. Yarnold

Optimal Data Analysis, LLC

In CTA, weighting by prior odds is used when a model is sought that maximizes ESS, which is explicitly optimized by a pruning algorithm that deconstructs a fully-grown model into all nested sub-branches and then reassembles all possible combinations of sub-branches to identify the configuration with the greatest ESS. In contrast, unit-weighting is used when a CTA model is sought that maximizes PAC, which is explicitly optimized by using the same pruning algorithm to reassemble all possible sub-branch combinations and identify the configuration with the greatest PAC.

View journal article

# Identifying the Descendant Family of HO-CTA Models by using the Minimum Denominator Selection Algorithm: Maximizing ESS versus PAC

Paul R. Yarnold

Optimal Data Analysis, LLC

It is usually possible to identify numerous different hierarchically-optimal classification tree analysis (HO-CTA) models in applications having an adequate sample size and involving multiple attributes. The models differ in complexity, defined as the number of endpoints representing distinct patient strata: the fewer the strata, the more parsimonious the model. The models also vary in normed predictive accuracy, defined as effect strength for sensitivity (ESS): 0 represents the predictive accuracy expected by chance for the application; 100 represents errorless prediction. The distance of each model from a theoretically ideal solution for the application (defined as a model having perfect accuracy and minimum complexity) is computed as a D statistic. The underlying descendant family of models (including the globally-optimal model with the lowest D possible by an HO-CTA model for the application) is identified by first obtaining a model without specifying minimum endpoint sample size, and then applying the minimum denominator selection algorithm (MDSA). These methods are illustrated in an application seeking to identify aspects of nursing care delivered to patients that predict satisfaction among 1,045 strongly satisfied and 671 moderately satisfied Emergency Department (ED) patients. HO-CTA models that explicitly maximize ESS versus the overall percentage of accurate classification (PAC) are contrasted.
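The abstract above does not state how D is computed. As a minimal sketch, the Python below assumes the form commonly given in the ODA literature, D = 100 / (ESS / strata) − strata, which equals 0 for a two-endpoint model with ESS = 100. The function names, the formula itself, and the example family are illustrative assumptions rather than details taken from this article.

```python
def d_statistic(ess, strata):
    """Distance of a model from the ideal solution (assumed formula).

    ess    -- effect strength for sensitivity, 0 < ess <= 100
    strata -- number of model endpoints (must be at least 2)
    """
    if not 0 < ess <= 100:
        raise ValueError("ess must be in (0, 100]")
    if strata < 2:
        raise ValueError("a model needs at least two endpoints")
    # Assumed form: D = 100 / (ESS / strata) - strata
    return 100.0 / (ess / strata) - strata


def lowest_d(models):
    """Return the (label, ess, strata) tuple having the smallest D."""
    return min(models, key=lambda m: d_statistic(m[1], m[2]))


# Hypothetical descendant family: (label, ESS, number of endpoints).
family = [("two-strata", 40.0, 2), ("four-strata", 60.0, 4)]
best = lowest_d(family)
```

Under this assumed formula the more complex four-endpoint model is selected here, because its gain in ESS outweighs the penalty for the extra strata.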

View journal article

# Using Machine Learning to Model Dose-Response Relationships via ODA: Eliminating Response Variable Baseline Variation by Ipsative Standardization

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC

A maximum-accuracy machine-learning method for predicting dose of exposure based on distribution of the response variable was recently introduced. Herein we demonstrate the advantages of eliminating baseline variation in the response variable via transformation by ipsative standardization. Using data measuring forearm blood flow responses to intra-arterial administration of Isoproterenol, findings obtained using optimized discriminant analysis and a generalized estimating equation are compared separately for black and white males (and pooled data) using raw versus ipsatively standardized blood flow data. Findings using raw versus ipsatively standardized forearm blood-flow response data were incongruous. The standardized responses of blacks and whites were indistinguishable through 150 ng/min doses; responses of blacks were elevated at the 300 ng/min dose, but at the 400 ng/min dose responses regressed to 150 ng/min dose levels; while at the 400 ng/min dose whites had the greatest response levels observed in the study. Using raw data there was no evidence of inter-method statistical conclusion agreement; baseline variability resulted in failure to statistically confirm numerous inter-dose responses; and the dose-response model yielded moderate predictive accuracy. Using standardized data there was significant evidence of inter-method statistical conclusion agreement; eliminating baseline variability yielded more findings of statistically reliable inter-dose responses; and the dose-response model yielded relatively strong predictive accuracy. This study adds to a growing literature demonstrating that ipsative standardization of the response variable studied in single-case or multiple-observation “repeated measures” designs yields generalizable models that generate the most accurate predictions (normed against chance) that are analytically possible for the sample data.
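Ipsative standardization converts each subject's repeated measurements to z-scores using that subject's own mean and standard deviation, which removes between-subject baseline differences. The sketch below illustrates the idea in Python; the function name, the use of the sample (n − 1) standard deviation, and the example values are assumptions, not details taken from the study.

```python
from statistics import mean, stdev


def ipsative_standardize(series):
    """Z-score one subject's responses against that subject's own
    mean and (sample) standard deviation."""
    m = mean(series)
    s = stdev(series)  # assumption: sample SD; needs >= 2 observations
    if s == 0:
        raise ValueError("responses do not vary; z-scores are undefined")
    return [(x - m) / s for x in series]


# Hypothetical responses for two subjects across three doses.
# The subjects share a response pattern but differ in baseline level.
subject_a = [2.0, 4.0, 6.0]
subject_b = [12.0, 14.0, 16.0]

z_a = ipsative_standardize(subject_a)
z_b = ipsative_standardize(subject_b)
```

After standardization the two hypothetical subjects have the same profile, so a dose-response model fitted to the z-scores is not affected by their different baselines.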

View journal article

# Causality of Adverse Drug Reactions: The Upper-Bound of Arbitrated Expert Agreement for Ratings Obtained by WHO and Naranjo Algorithms

Paul R. Yarnold

Optimal Data Analysis, LLC

As a high-ranking cause of human mortality, adverse drug reactions (ADRs) are the focus of an enormous literature, and optimal statistical methods have proven undaunted by the analysis-challenging geometry of multi-site longitudinal medical data sets. Two broadly-used causality assessment algorithms for identifying ADRs are the Naranjo and World Health Organization (WHO) ADR algorithms. Ratings made using these algorithms haven’t been validated, so the extent to which arbitrated ratings made by independent experts using these algorithms agree is important in assessing the expected upper-bound of inter-rater, inter-method reliability. Using data from India on this issue, UniODA identified a strong to very strong relationship between ratings obtained using WHO and Naranjo algorithms for a sample of N = 200 randomly selected patients. Inter-algorithm disagreement occurred for 15.2% of cases indicated as “Probable” by the Naranjo algorithm, but as “Possible” by the WHO algorithm.

View journal article

# Ascertaining Intervention Efficacy

Paul R. Yarnold

Optimal Data Analysis, LLC

UniODA is used to study the effect of interruptions on the course of behavior normally seen in longitudinal (temporal) series, for cases or for groups, in applications such as modeling of mortality rates after exposure to environmental toxins, evaluating symptom reduction in chronic disease after human or artificial therapy, or assessing the validity of efficacy claims (regarding an outcome) associated with change in public policy. This paper uses UniODA to assess the immediate short-term longevity of efficacy (if any) of back-to-back interventions (advertisements published on a hobby shop webpage) with respect to two serial outcomes each assessed as counts: the daily number of webpage visitors, and of page views.

View journal article

# Maximizing Overall Percentage Accuracy in Classification: Discriminating Study Groups in the National Pressure Ulcer Long-Term Care Study (NPULS)

Paul R. Yarnold

Optimal Data Analysis, LLC

UniODA may be used to identify two different types of (weighted) maximum-accuracy models. First, ODA can identify models that explicitly maximize overall percentage accuracy in classification or PAC, that is, the percentage of the total sample that is correctly classified by the model. Second, ODA can identify models that explicitly maximize the predictive accuracy of the model normed against chance using the effect strength for sensitivity (ESS) statistic, which is both chance-corrected (0 = the predictive accuracy expected by chance for the application) and maximum-corrected (100 = perfect, errorless classification). Because comparatively little is known about optimal models that maximize PAC, this research note initiates a literature on the matter. The present exposition involves assessing if clinical and demographic factors can be discriminated on the basis of study group using a UniODA model that explicitly maximizes PAC.
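The difference between the two indices can be seen by computing both from the same confusion table. The sketch below is unweighted and assumes ESS is the mean per-class sensitivity rescaled so that chance (100 / C for C classes) maps to 0 and perfect classification maps to 100. The function names and the example table are hypothetical.

```python
def pac(confusion):
    """Overall percentage of accurate classification.
    confusion[i][j] = count of class-i cases predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def ess(confusion):
    """Effect strength for sensitivity: mean per-class sensitivity,
    rescaled so chance = 0 and errorless classification = 100."""
    c = len(confusion)
    sensitivities = [100.0 * confusion[i][i] / sum(confusion[i])
                     for i in range(c)]
    mean_sensitivity = sum(sensitivities) / c
    chance = 100.0 / c
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)


# Hypothetical imbalanced sample: 100 cases in class 0, 50 in class 1.
table = [[90, 10],
         [40, 10]]
overall = pac(table)   # two thirds of the sample is classified correctly
normed = ess(table)    # sensitivities are 90% and 20%, so the mean is 55%
```

In this example PAC is about 66.7 while ESS is only 10, because the model classifies the larger class well and the smaller class poorly. A model chosen to maximize one index need not maximize the other.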

View journal article

# ODA vs. Chi-Square: Describing Baseline Data from the National Pressure Ulcer Long-Term Care Study (NPULS)

Paul R. Yarnold

Optimal Data Analysis, LLC

Chi-square analysis is often used to analyze data in contingency tables created by crossing two categorical variables, with at least one having three or more categories. Researchers report the associated omnibus (overall) p value to indicate the statistical reliability (not the strength) of the association between the variables. A statistically significant omnibus p value indicates two or more categories differ, but the exact structure of the inter-category difference(s) isn’t explicit. Pairwise comparisons are needed to reveal the precise effect, but in practice they are often replaced by a non-statistical “eyeball analysis-based” summary of the data. In contrast, ODA models provide exact p values and an index of effect strength that is normed against chance and can be used to directly compare the classification accuracy achieved by alternative models. Furthermore, ODA models explicitly identify the structure of the omnibus effect, and an efficient optimal pairwise comparison methodology is used to ensure the statistical integrity of the model. These methods are illustrated for a sample of N = 2,420 adults at risk of developing a pressure ulcer.

View journal article

# Pairwise Comparisons using UniODA vs. Not Log-Linear Model: Ethnic Group and Schooling in the 1980 Census

Paul R. Yarnold

Optimal Data Analysis, LLC

Data are from a contingency table used to determine the relationship between years of schooling arbitrarily parsed into six ordered categories, and ethnic group measured on a categorical variable with seven levels. Although ordinal data are inappropriate for analysis via chi-square-based methods, log-linear analysis was used to investigate association between years of schooling and ethnic group. Because the independence model didn’t provide an acceptable representation of the data, it is clear that some form of association underlies the data. A three-dimensional log-linear-based solution was proposed: “In terms of the scores in the first dimension only, whites are closest to Chinese; blacks are closest to Vietnamese; and Hispanics are extreme outliers. Either the distance matrix…or a two-dimensional plot…can be used to locate the groups or measure distances in terms of educational distributions” (p. 103). All possible pairwise comparisons were conducted between ethnic groups using UniODA, and the results revealed that a single dimension (years of schooling) perfectly described the statistical conclusions reached for 20 of 21 analyses. The single inconsistent finding had an associated minuscule effect size.

View journal article

# UniODA vs. Not Log-Linear Model: The Relationship of Mental Health Status and Socioeconomic Status

Paul R. Yarnold

Optimal Data Analysis, LLC

Data are from a classic 4 x 6 contingency table used to determine the relationship (if any) between mental health status measured using four ordered categories, and socioeconomic status (SES) measured using six ordered categories. Although ordinal data are inappropriate for analysis via chi-square-based methods, log-linear analysis was used to investigate association between mental health and SES. A variety of legacy measures of association indicated “moderate” association, but differed in terms of their use of standardization, statistical reliability testing, and identification of the underlying direction (positive, negative, or non-linear) and strength of the relationship. UniODA overcomes these legacy shortcomings, as illustrated in three “standard” analysis modes: exploratory (a non-linear model was identified having a very weak effect, ESS = 6.4), or confirmatory hypothesizing either a positive (ESS = 5.6) or a negative (ESS = -3.0) relationship.

View journal article

# Determining Jackknife ESS for a CTA Model with Chaotic Instability

Paul R. Yarnold

Optimal Data Analysis, LLC

CTA models are developed using one of three different strategies as concerns “leave-one-out” (LOO) analysis: (a) ignore LOO analysis; (b) only include attributes having identical ESS in training and LOO analysis in the model (the “LOO stable” criterion); or (c) only include attributes having the highest ESS in LOO analysis in the model (the “LOO p < 0.05” criterion). Software for performing CTA reports ESS for training but not for LOO analysis, so a recent article demonstrated the use of UniODA to assess ESS in LOO for CTA models with well-organized instability propagation—for example, restricted to a pair of endpoints for a node, or invalidating the CTA model for statistically unreliable replication. The present article illustrates assessing LOO ESS under chaotic conditions in which instability propagates down and across the left- and the right-hand sides of the CTA model.

View journal article

# Using UniODA to Determine the ESS of a CTA Model in LOO Analysis

Paul R. Yarnold

Optimal Data Analysis, LLC

CTA models may be constructed using three different strategies with respect to consideration of “leave-one-out” (LOO) jackknife validity analysis: (1) ignore LOO validity analysis; (2) only include attributes yielding the same ESS in training and LOO analysis in the model (the “LOO stable” criterion); or (3) include attributes with highest ESS in LOO analysis in the model (“LOO p < 0.05” criterion). CTA software produces the confusion table for a CTA model for training analysis, but not for LOO analysis. This article shows how to use UniODA to determine the ESS of CTA models in LOO analysis. Exposition clearly demonstrates that failing to account for model cross-generalizability performance in classification analysis can produce models with good training performance and chance (or worse) reproducibility.
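The gap between training and LOO performance can be illustrated with a much simpler model than a CTA tree. The sketch below is not the UniODA or CTA software: it fits a single cutpoint on one ordered attribute for a binary class, choosing the cutpoint and direction with the highest training ESS, and then repeats the fit with each observation held out in turn. All names are hypothetical, ties are broken by taking the first best cutpoint, and every training fold is assumed to contain both classes and at least two distinct attribute values.

```python
def predict(x, cut, direction):
    """direction 1: predict class 1 when x > cut; direction 0: the reverse."""
    return int((x > cut) == bool(direction))


def ess(actual, predicted):
    """Binary ESS: mean of sensitivity and specificity, normed so that
    chance = 0 and perfect classification = 100."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    sens = tp / sum(1 for a in actual if a == 1)
    spec = tn / sum(1 for a in actual if a == 0)
    return 100.0 * ((sens + spec) / 2 - 0.5) / 0.5


def fit_cutpoint(xs, ys):
    """Return (cut, direction, training ESS) with the highest training ESS.
    Candidate cuts are midpoints between adjacent distinct values."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        for direction in (1, 0):
            score = ess(ys, [predict(x, cut, direction) for x in xs])
            if best is None or score > best[2]:
                best = (cut, direction, score)
    if best is None:
        raise ValueError("attribute has no variation")
    return best


def loo_ess(xs, ys):
    """Refit with each observation held out, classify it, then compute ESS
    over all held-out predictions."""
    held_out = []
    for i in range(len(xs)):
        cut, direction, _ = fit_cutpoint(xs[:i] + xs[i + 1:],
                                         ys[:i] + ys[i + 1:])
        held_out.append(predict(xs[i], cut, direction))
    return ess(ys, held_out)


# Hypothetical, cleanly separated data: training and LOO ESS agree.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
training = fit_cutpoint(xs, ys)
jackknife = loo_ess(xs, ys)
```

For this separable example the cutpoint is stable across folds, so training and LOO ESS are both 100. With noisier data the held-out ESS can fall well below the training value, which is the instability the abstract describes.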

View journal article