Using Machine Learning to Model Dose-Response Relationships via ODA: Eliminating Response Variable Baseline Variation by Ipsative Standardization

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC

A maximum-accuracy machine-learning method for predicting dose of exposure based on the distribution of the response variable was recently introduced. Herein we demonstrate the advantages of eliminating baseline variation in the response variable via transformation by ipsative standardization. Using data measuring forearm blood flow responses to intra-arterial administration of isoproterenol, findings obtained using optimized discriminant analysis and a generalized estimating equation are compared separately for black and white males (and for pooled data) using raw versus ipsatively standardized blood flow data. Findings using raw versus ipsatively standardized forearm blood-flow response data were incongruous. The standardized responses of blacks and whites were indistinguishable through the 150 ng/min dose. Responses of blacks were elevated at the 300 ng/min dose but regressed to 150 ng/min-dose levels at the 400 ng/min dose, whereas at the 400 ng/min dose whites had the greatest response levels observed in the study. Using raw data there was no evidence of inter-method statistical conclusion agreement; baseline variability resulted in failure to statistically confirm numerous inter-dose responses; and the dose-response model yielded moderate predictive accuracy. Using standardized data there was significant evidence of inter-method statistical conclusion agreement; eliminating baseline variability yielded more findings of statistically reliable inter-dose responses; and the dose-response model yielded relatively strong predictive accuracy. This study adds to a growing literature demonstrating that ipsative standardization of the response variable studied in single-case or multiple-observation “repeated measures” designs yields generalizable models that generate the most accurate predictions (normed against chance) that are analytically possible for the sample data.
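As a rough illustration of the transformation named in this abstract, ipsative standardization can be sketched as converting each subject's repeated measurements to z-scores using that subject's own mean and standard deviation. The sketch below is not taken from the article: the function name, the subject labels, the blood-flow values, and the choice of the population (rather than sample) standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def ipsative_standardize(values):
    """Z-score one subject's series against that subject's own mean and SD.

    Assumption: the population SD is used; the article may use a different
    convention.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A flat series carries no within-subject variation to standardize.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


# Hypothetical responses at increasing doses for two subjects whose
# baselines differ by a constant offset (values invented for illustration).
raw = {
    "subject_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "subject_b": [11.0, 12.0, 13.0, 14.0, 15.0],
}

standardized = {sid: ipsative_standardize(v) for sid, v in raw.items()}

for sid, z in standardized.items():
    print(sid, [round(x, 3) for x in z])
```

In this toy example the two subjects differ only in baseline level, so their standardized series coincide, which is the sense in which the transformation removes baseline variation.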

View journal article

Causality of Adverse Drug Reactions: The Upper-Bound of Arbitrated Expert Agreement for Ratings Obtained by WHO and Naranjo Algorithms

Paul R. Yarnold

Optimal Data Analysis, LLC

As a high-ranking cause of human mortality, adverse drug reactions (ADRs) are the focus of an enormous literature, and optimal statistical methods have proven undaunted by the analysis-challenging geometry of multi-site longitudinal medical data sets. Two broadly used causality assessment algorithms for identifying ADRs are the Naranjo and World Health Organization (WHO) ADR algorithms. Ratings made using these algorithms have not been validated, so the extent to which arbitrated ratings made by independent experts using these algorithms agree is important in assessing the expected upper bound of inter-rater, inter-method reliability. Using data from India on this issue, UniODA identified a strong to very strong relationship between ratings obtained using WHO and Naranjo algorithms for a sample of N = 200 randomly selected patients. Inter-algorithm disagreement occurred for 15.2% of cases indicated as “Probable” by the Naranjo algorithm, but as “Possible” by the WHO algorithm.

View journal article

Ascertaining Intervention Efficacy

Paul R. Yarnold

Optimal Data Analysis, LLC

UniODA is used to study the effect of interruptions on the course of behavior normally seen in longitudinal (temporal) series, for cases or for groups, in applications such as modeling of mortality rates after exposure to environmental toxins, evaluating symptom reduction in chronic disease after human or artificial therapy, or assessing the validity of efficacy claims (regarding an outcome) associated with change in public policy. This paper uses UniODA to assess the immediate short-term longevity of efficacy (if any) of back-to-back interventions (advertisements published on a hobby shop webpage) with respect to two serial outcomes each assessed as counts: the daily number of webpage visitors, and of page views.

View journal article

Maximizing Overall Percentage Accuracy in Classification: Discriminating Study Groups in the National Pressure Ulcer Long-Term Care Study (NPULS)

Paul R. Yarnold

Optimal Data Analysis, LLC

UniODA may be used to identify two different types of (weighted) maximum-accuracy models. First, ODA can identify models that explicitly maximize overall percentage accuracy in classification (PAC), that is, the percentage of the total sample that is correctly classified by the model. Second, ODA can identify models that explicitly maximize the predictive accuracy of the model normed against chance using the effect strength for sensitivity (ESS) statistic, which is both chance-corrected (0 = the predictive accuracy expected by chance for the application) and maximum-corrected (100 = perfect, errorless classification). Because comparatively little is known about optimal models that maximize PAC, this research note initiates a literature on the matter. The present exposition involves assessing if clinical and demographic factors can be discriminated on the basis of study group using a UniODA model that explicitly maximizes PAC.
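The difference between the two accuracy criteria can be made concrete with a small, unweighted sketch. The abstract defines PAC directly; the ESS formula used here (mean per-class sensitivity rescaled so that chance, taken as 100/C for C classes, maps to 0 and perfect classification maps to 100) is an assumption about how the statistic is computed, not a quotation from the article. The function names and the data are hypothetical.

```python
def pac(actual, predicted):
    """Overall percentage accuracy: share of all cases classified correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def ess(actual, predicted):
    """Chance- and maximum-corrected accuracy (assumed formula).

    Mean per-class sensitivity is rescaled so that 0 corresponds to the
    accuracy expected by chance (100 / number of classes) and 100 to
    errorless classification.
    """
    classes = sorted(set(actual))
    sensitivities = []
    for c in classes:
        members = [i for i, a in enumerate(actual) if a == c]
        hits = sum(predicted[i] == c for i in members)
        sensitivities.append(100.0 * hits / len(members))
    mean_sensitivity = sum(sensitivities) / len(sensitivities)
    chance = 100.0 / len(classes)
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)


# Invented imbalanced sample: 90 cases in class 0 and 10 in class 1,
# with a degenerate model that assigns every case to class 0.
actual = [0] * 90 + [1] * 10
predicted = [0] * 100

print("PAC:", pac(actual, predicted))
print("ESS:", ess(actual, predicted))
```

With imbalanced classes, a model that labels every case as the majority class scores a high PAC while its ESS is no better than chance, which is one reason the two criteria can select different models.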

View journal article

ODA vs. Chi-Square: Describing Baseline Data from the National Pressure Ulcer Long-Term Care Study (NPULS)

Paul R. Yarnold

Optimal Data Analysis, LLC

Chi-square analysis is often used to analyze data in contingency tables created by crossing two categorical variables, with at least one having three or more categories. Researchers report the associated omnibus (overall) p value to indicate the statistical reliability (not the strength) of the association between the variables. A statistically significant omnibus p value indicates that two or more categories differ, but the exact structure of the inter-category difference(s) is not explicit. Pairwise comparisons are needed to reveal the precise effect, but in practice they are often replaced by a non-statistical “eyeball analysis-based” summary of the data. In contrast, ODA models provide exact p values and an index of effect strength that is normed against chance and can be used to directly compare the classification accuracy achieved by alternative models. Furthermore, ODA models explicitly identify the structure of the omnibus effect, and an efficient optimal pairwise comparison methodology is used to ensure the statistical integrity of the model. These methods are illustrated for a sample of N = 2,420 adults at risk of developing a pressure ulcer.

View journal article

Pairwise Comparisons using UniODA vs. Not Log-Linear Model: Ethnic Group and Schooling in the 1980 Census

Paul R. Yarnold

Optimal Data Analysis, LLC

Data are from a contingency table used to determine the relationship between years of schooling arbitrarily parsed into six ordered categories, and ethnic group measured on a categorical variable with seven levels. Although ordinal data are inappropriate for analysis via chi-square-based methods, log-linear analysis was used to investigate association between years of schooling and ethnic group. Because the independence model did not provide an acceptable representation of the data, it is clear that some form of association underlies the data. A three-dimensional log-linear-based solution was proposed: “In terms of the scores in the first dimension only, whites are closest to Chinese; blacks are closest to Vietnamese; and Hispanics are extreme outliers. Either the distance matrix…or a two-dimensional plot…can be used to locate the groups or measure distances in terms of educational distributions” (p. 103). All possible pairwise comparisons were conducted between ethnic groups using UniODA, and the results revealed that a single dimension (years of schooling) perfectly described the statistical conclusions reached for 20 of 21 analyses. The single inconsistent finding had a minuscule associated effect size.
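The count of 21 analyses follows from comparing every pair of the seven ethnic groups once. A minimal sketch of that enumeration is given below; the group labels are placeholders, since the abstract does not name all seven levels.

```python
from itertools import combinations

# Placeholder labels for the seven levels of the ethnic-group variable.
groups = [f"group_{i}" for i in range(1, 8)]

# Every unordered pair of distinct groups is one pairwise comparison.
pairs = list(combinations(groups, 2))

print(len(pairs))
```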

View journal article

UniODA vs. Not Log-Linear Model: The Relationship of Mental Health Status and Socioeconomic Status

Paul R. Yarnold

Optimal Data Analysis, LLC

Data are from a classic 4 x 6 contingency table used to determine the relationship (if any) between mental health status measured using four ordered categories, and socioeconomic status (SES) measured using six ordered categories. Although ordinal data are inappropriate for analysis via chi-square-based methods, log-linear analysis was used to investigate association between mental health and SES. A variety of legacy measures of association indicated “moderate” association, but differed in terms of their use of standardization, statistical reliability testing, and identification of the underlying direction (positive, negative, or non-linear) and strength of the relationship. UniODA overcomes these legacy shortcomings, as illustrated in three “standard” analysis modes: exploratory (a non-linear model was identified having a very weak effect, ESS = 6.4), or confirmatory hypothesizing either a positive (ESS = 5.6) or a negative (ESS = -3.0) relationship.

View journal article