Randomized Blocks Designs: Omnibus vs. Pairwise Comparison, Fixed vs. Relative Optimal Discriminant Threshold, and Raw vs. Ipsative z-Score Measures

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC & Linden Consulting Group, LLC

This study extends recent research assessing the use of relative thresholds in matched-pairs designs to a randomized blocks design in which four treatments are randomly assigned to blood samples drawn from each of eight people (each person is treated as a block). Both raw and ipsatively standardized plasma clotting times are compared between treatments.
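
As a concrete illustration, a minimal sketch of ipsative (within-block) standardization in Python (NumPy assumed); the clotting times below are hypothetical, not the study's measurements:

```python
import numpy as np

# Hypothetical plasma clotting times: 8 blocks (people) x 4 treatments.
rng = np.random.default_rng(0)
clotting = rng.normal(loc=30, scale=3, size=(8, 4))

# Ipsative standardization: z-score each person's four values against
# that person's own mean and standard deviation (row-wise).
row_mean = clotting.mean(axis=1, keepdims=True)
row_sd = clotting.std(axis=1, ddof=1, keepdims=True)
ipsative_z = (clotting - row_mean) / row_sd

# Each row of ipsative_z now has mean 0 and SD 1, so treatment comparisons
# reflect within-person (within-block) differences only.
print(ipsative_z.mean(axis=1).round(6), ipsative_z.std(axis=1, ddof=1).round(6))
```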

View journal article

Using Fixed and Relative Optimal Discriminant Thresholds in Randomized Blocks (Matched-Pairs) Designs

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC & Linden Consulting Group, LLC

Optimal discriminant analysis (ODA) is often used to compare values of one (or more) attributes between two (or more) groups of observations with respect to a fixed discriminant threshold that maximizes accuracy normed against chance for the sample. However, a recent study using a matched-pairs design found that using a relative discriminant threshold to assess an (exploratory or confirmatory) a priori hypothesis separately for each pair of observations can identify inter-group differences that are too subtle to be detected using fixed thresholds. The present investigation replicates the finding regarding the efficacy of relative thresholds for matched-pairs designs, this time for a randomized blocks design consisting of two patient groups (one group assigned to take an antidepressant drug, the other assigned to take a placebo) between which a numerical measure of depression was compared. Several recommendations are made concerning the use of improved modern optimal statistical alternatives for this class of experimental design.

View journal article

ODA vs. t-Test: Lysozyme Levels in the Gastric Juice of Patients with Peptic Ulcer vs. Normal Controls

Paul R. Yarnold

Optimal Data Analysis, LLC

Lysozyme levels in the gastric juice of peptic ulcer patients were compared against normal controls by t-test, yielding p<0.05. Because the standard deviations differed by a factor of two between groups, and were proportional to the means, analysis of natural logarithms was instead deemed appropriate: the resulting t-test was not statistically significant. Analyzed by ODA, no statistically significant between-group difference emerged, and the results obtained for raw data and for natural logarithms were identical because ODA results (i.e., p and ESS) are invariant over all monotonic transformations of the data.
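
The invariance claim can be illustrated with a minimal sketch in Python using hypothetical lysozyme values (not the study's data): an exhaustive search for the single cutpoint maximizing ESS returns the same ESS for the raw values and for their natural logarithms.

```python
import numpy as np

def best_threshold_ess(x, y):
    """Exhaustively search for the single cutpoint maximizing ESS
    (0 = chance, 100 = errorless) for a binary class variable."""
    best_ess = -np.inf
    for cut in np.unique(x):
        above = x > cut
        for pred in (above, ~above):          # allow either direction of effect
            sens = np.mean(pred[y == 1])      # accuracy within group 1
            spec = np.mean(~pred[y == 0])     # accuracy within group 0
            ess = (0.5 * (sens + spec) - 0.5) / 0.5 * 100
            best_ess = max(best_ess, ess)
    return best_ess

# Hypothetical lysozyme levels: ulcer patients vs. normal controls.
rng = np.random.default_rng(1)
ulcer = rng.lognormal(mean=2.0, sigma=0.6, size=29)
control = rng.lognormal(mean=1.8, sigma=0.4, size=30)
x = np.concatenate([ulcer, control])
y = np.concatenate([np.ones(29, dtype=int), np.zeros(30, dtype=int)])

# The optimal ESS is identical for raw values and their natural logarithms,
# because the log is a strictly monotonic transformation.
print(best_threshold_ess(x, y), best_threshold_ess(np.log(x), y))
```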

View journal article

Regression vs. Novometric-Based Assessment of Inter-Examiner Reliability

Paul R. Yarnold

Optimal Data Analysis, LLC

Four examiners independently recorded the DMFS (decayed, missing, filled surfaces) scores of ten patients. Inter-examiner correspondence of DMFS scores was evaluated using Pearson correlation and novometric analysis. Whereas models based on essentially perfect correlations were unable to accurately predict DMFS scores in training analysis, novometric models were consistently perfect in both training and reproducibility analyses.

View journal article

Fixed vs. Relative Optimal Discriminant Thresholds: Pairwise Comparisons of Raters’ Ratings for a Sample

Paul R. Yarnold

Optimal Data Analysis, LLC

Foundational to the ODA algorithm when used with an ordered attribute is the identification of the optimal threshold—the specific cutpoint that yields the most accurate (weighted) classification solution for a sample of observations. ODA models involving a single optimal threshold will henceforth be called “fixed-threshold” models. This note proposes a new “relative-threshold” ODA model for an inter-examiner reliability study in which four examiners independently rate the condition of teeth for a sample of ten patients: “An important inferential question is whether the rater effects differ significantly from one another” (p. 19). In the original study, analysis of variance showed that rater C assigned the greatest mean rating across patients: “The inference is therefore drawn that differential measurement bias exists (i.e., the k examiners differ systematically from one another in their mean levels of measurement)” (pp. 20-21). ODA was used to compare the entire response distribution (not only the means) between raters. A fixed-threshold model identified no effects. A relative-threshold model tested the hypothesis that, for each observation in the sample considered separately, the rating by rater X will be less than (or equal to) the rating by rater Y. Analysis showed that the distribution of ratings made by rater C was nearly perfectly greater than the corresponding (non-discriminable) ratings made by raters A, B, and D. This finding hints at the possible development of optimal analogues of multidimensional scaling and facet theory methodologies.
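
A minimal sketch of the relative-threshold logic in Python, using hypothetical ratings for ten patients by two raters (not the study's data); ODA's exact permutation machinery is replaced here by simple sign-test logic for brevity:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical ordinal ratings for 10 patients by rater C and rater A.
rater_c = np.array([12, 15, 9, 20, 14, 11, 18, 16, 13, 17])
rater_a = np.array([10, 13, 9, 17, 12, 10, 15, 14, 11, 16])

# Relative-threshold comparison: for each patient separately, evaluate the
# directional hypothesis that rater C's rating exceeds rater A's rating.
diff = rater_c - rater_a
consistent = np.sum(diff > 0)   # pairs supporting the hypothesis
ties = np.sum(diff == 0)        # non-discriminable pairs are set aside

# Exact binomial test on the discriminable pairs (sign-test logic).
n = len(diff) - ties
print(consistent, "of", n, "discriminable pairs favor rater C")
print(binomtest(consistent, n, p=0.5, alternative="greater").pvalue)
```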

View journal article

Logistic Discriminant Analysis and Structural Equation Modeling Both Identify Effects in Random Data

Ariel Linden, Fred B. Bryant & Paul R. Yarnold

Linden Consulting Group, LLC, Loyola University Chicago & Optimal Data Analysis, LLC

Recent research compared the ability of various classification algorithms [logistic regression (LR), random forests (RF), support vector machines (SVM), boosted regression (BR), multi-layer perceptron neural net model (MLP), and classification tree analysis (CTA)] to correctly fail to identify a relationship between a binary class (dependent) variable and ten randomly generated attributes (covariates): only CTA failed to find a model. We use the same ten-variable N=1,000 dataset to assess the training classification accuracy of models developed by logistic discriminant analysis (LDA), generalized structural equation modeling (GSEM), and robust diagonally-weighted least-squares (DWLS) SEM for binary outcomes. Except for CTA, all machine-learning algorithms assessed thus far have identified training effects in random data.

View journal article

Multi-Layer Perceptron Neural Net Model Identifies Effect in Random Data

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Prior research contrasted the ability of different classification algorithms [logistic regression (LR), random forests (RF), boosted regression (BR), support vector machines (SVM), and classification tree analysis (CTA)] to correctly fail to identify a relationship between a binary class (dependent) variable and ten randomly generated attributes (covariates): only CTA found no relationship. In this paper, using the same ten-variable N=1,000 dataset, a Weka multi-layer perceptron (MLP) neural net model run with its default tuning parameters yielded an area under the curve (AUC) of 0.724 in training analysis, and AUC=0.507 in ten-fold cross-validation. With the exception of CTA, all machine-learning algorithms assessed thus far have identified training effects in completely random data.
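
A minimal sketch of the general phenomenon in Python, with scikit-learn's MLPClassifier standing in for the Weka model (defaults differ, so the exact AUC values will not match): on purely random data, training AUC sits well above 0.5 while ten-fold cross-validated AUC hovers near chance.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

# Ten random covariates and a random binary class variable, N = 1,000.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

mlp = MLPClassifier(max_iter=1000, random_state=0)

# Training AUC: the network partially memorizes noise.
mlp.fit(X, y)
train_auc = roc_auc_score(y, mlp.predict_proba(X)[:, 1])

# Ten-fold cross-validated AUC: close to 0.5, as expected for random data.
cv_auc = cross_val_score(mlp, X, y, cv=10, scoring="roc_auc").mean()

print(f"training AUC = {train_auc:.3f}, 10-fold CV AUC = {cv_auc:.3f}")
```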

View journal article

Optimizing Suboptimal Classification Trees: Matlab® CART Model Predicting Probability of Lower Limb Prosthesis User’s Functional Potential

Paul R. Yarnold & Ariel Linden

Optimal Data Analysis, LLC & Linden Consulting Group, LLC

After any algorithm that controls the growth of a classification tree model has run to completion, the resulting model must be pruned in order to explicitly maximize predictive accuracy normed against chance. This article illustrates manually conducted maximum-accuracy pruning of a classification and regression tree (CART) model that was developed to predict the functional capacity of lower limb prosthesis users.
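
For readers replicating the hand computation, a minimal sketch in Python of the accuracy measure that drives each pruning decision; the class labels and tree predictions below are hypothetical, not the article's CART model.

```python
import numpy as np

def ess(y_true, y_pred):
    """Effect strength for sensitivity: mean per-class accuracy normed
    against chance (0 = chance, 100 = perfect classification)."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    chance = 1.0 / len(classes)
    return 100.0 * (np.mean(per_class) - chance) / (1.0 - chance)

# Hypothetical example: predictions from a full tree vs. a pruned tree.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
full_tree = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
pruned_tree = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])

# A branch is pruned whenever the simpler model's ESS is at least as high.
print(ess(y_true, full_tree), ess(y_true, pruned_tree))
```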

View journal article

Regression vs. Novometric Analysis Predicting Income Based on Education

Paul R. Yarnold

Optimal Data Analysis, LLC

This study compares linear regression vs. novometric models of the association of education and income for a sample of 32 observations. Regression analysis identified a relatively strong effect (R-squared=56.4%), but only 25% of point predictions fell within a 20% band of actual income. Novometric analysis identified a strong effect (ESS=81.7%) that was stable in jackknife validity analysis: the model correctly classified 91.7% of observations earning income less than $12,405, and 90.0% of those earning greater income. Both for people earning less than the optimal income threshold and for those earning more, factors other than the number of years of education also influenced earned income.
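
A minimal sketch in Python of the two accuracy checks, using hypothetical education and income values (the study's 32 observations are not reproduced, and the median is used as a stand-in for the optimal cutpoint):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical education (years) and income data for 32 observations.
education = rng.integers(8, 21, size=32)
income = 2000 * education + rng.normal(0, 6000, size=32) + 5000

# Linear regression point predictions.
slope, intercept = np.polyfit(education, income, deg=1)
predicted = slope * education + intercept

# Share of point predictions falling within a +/-20% band of actual income.
within_band = np.mean(np.abs(predicted - income) <= 0.20 * np.abs(income))

# Single-threshold classification of low vs. high earners from education.
high = income > np.median(income)
pred_high = education > np.median(education)
sens = np.mean(pred_high[high])
spec = np.mean(~pred_high[~high])

print(f"within 20% band: {within_band:.2f}, sensitivity {sens:.2f}, specificity {spec:.2f}")
```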

View journal article

Effect of Sample Size on Discovery of Relationships in Random Data by Classification Algorithms

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

In a recent paper, we assessed the ability of several classification algorithms (logistic regression, random forests, boosted regression, support vector machines, and classification tree analysis [CTA]) to correctly fail to identify a relationship between the dependent variable and ten covariates generated completely at random. Only classification tree analysis correctly observed that no relationship existed. In this study, we examine whether various randomly drawn subsets of the original N=1,000 dataset change the ability of these models to correctly observe that no relationship exists. The randomly drawn samples consisted of 250 and 500 observations. We further test the hold-out validity of these models by applying each fitted model to the remaining sample and computing the area under the receiver operating characteristic curve (AUC). Our results indicate that limiting the sample size has no effect on whether classification algorithms correctly determine that a relationship does not exist between variables in randomly generated data. Only CTA consistently identified that the data were random.
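
A minimal sketch of the subsampling-and-hold-out step in Python, with logistic regression standing in for the full set of algorithms and freshly generated random data in place of the authors' dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))     # ten random covariates
y = rng.integers(0, 2, size=1000)   # random binary class variable

for n in (250, 500):
    # Draw a random training subset of size n; hold out the remainder.
    idx = rng.choice(1000, size=n, replace=False)
    mask = np.zeros(1000, dtype=bool)
    mask[idx] = True

    model = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

    # Apply the fitted model to the held-out remainder of the sample.
    holdout_auc = roc_auc_score(y[~mask], model.predict_proba(X[~mask])[:, 1])
    print(f"n = {n}: hold-out AUC = {holdout_auc:.3f}")  # near 0.5 for random data
```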

View journal article

ODA vs. χ², r, and τ: Trauma Exposure in Childhood and Duration of Participation in Eating-Disorder Treatment Program

Paul R. Yarnold

Optimal Data Analysis, LLC

This note illustrates the disorder and confusion attributable to an analytic ethos whereby a smorgasbord of different statistical tests is used to test identical or parallel statistical hypotheses. Herein four classic methods are used for an application with a binary class (dependent) variable and an ordered attribute (independent variable) measured using a five-point scale. The legacy methods reach different conclusions—which is correct? In absolute contrast, for a given sample and hypothesis, novometric analysis identifies every statistically viable model (models vary as a function of precision and complexity) that reproducibly maximizes predictive accuracy for the sample.
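
A minimal sketch in Python of the "smorgasbord" problem, using hypothetical data (a binary class variable crossed with a five-point ordered attribute, not the study's sample): the same data are handed to χ², Pearson r, and Kendall τ, and each test returns its own p-value for essentially the same hypothesis.

```python
import numpy as np
from scipy.stats import chi2_contingency, pearsonr, kendalltau

# Hypothetical data: binary class variable (e.g., long vs. short treatment
# participation) and a five-point ordered attribute (e.g., trauma exposure).
rng = np.random.default_rng(11)
attribute = rng.integers(1, 6, size=80)
group = (attribute + rng.normal(0, 2.5, size=80) > 3).astype(int)

# Chi-square test on the 2 x 5 contingency table (ignores the ordering).
table = np.array([[np.sum((group == g) & (attribute == a)) for a in range(1, 6)]
                  for g in (0, 1)])
chi2, p_chi2, _, _ = chi2_contingency(table)

# Pearson r and Kendall tau treat the attribute as interval / ordinal.
_, p_r = pearsonr(group, attribute)
_, p_tau = kendalltau(group, attribute)

print(f"chi-square p = {p_chi2:.3f}, Pearson r p = {p_r:.3f}, Kendall tau p = {p_tau:.3f}")
```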

View journal article

Some Machine Learning Algorithms Find Relationships Between Variables When None Exist — CTA Doesn’t

Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Automated machine learning algorithms are widely promoted as the best approach for estimating propensity scores, because these methods detect patterns in the data that manual efforts fail to identify. If classification algorithms are indeed ideal for identifying relationships between treatment group participation and covariates that predict participation, then it stands to reason that these algorithms should also find no relationships when none exist (i.e., when covariates do not predict treatment group assignment). Accordingly, we compare the predictive accuracy of maximum-accuracy classification tree analysis (CTA) vs. the classification algorithms most commonly used to obtain the propensity score (logistic regression, random forests, boosted regression, and support vector machines). Here, however, we use an artificial dataset in which ten continuous covariates are randomly generated and by design have no correlation with the binary dependent variable (i.e., treatment assignment). Among all of the algorithms tested, only CTA correctly failed to discriminate between treatment and control groups based on the covariates. These results lend further support to the use of CTA for generating propensity scores as an alternative to other common approaches currently in favor.
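
A minimal sketch in Python of the propensity-score check on randomly generated covariates, using scikit-learn's logistic regression as an illustrative stand-in (CTA itself is not reproduced here): in-sample propensity scores discriminate the groups slightly, but cross-validated scores do not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

# Ten random covariates with no designed relationship to treatment assignment.
rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))
treatment = rng.integers(0, 2, size=1000)

lr = LogisticRegression(max_iter=1000)

# In-sample propensity scores: training AUC drifts above 0.5 on pure noise.
train_ps = lr.fit(X, treatment).predict_proba(X)[:, 1]
print("training AUC:", round(roc_auc_score(treatment, train_ps), 3))

# Cross-validated propensity scores: AUC falls back to roughly 0.5,
# i.e., the covariates do not actually predict treatment assignment.
cv_ps = cross_val_predict(lr, X, treatment, cv=10, method="predict_proba")[:, 1]
print("cross-validated AUC:", round(roc_auc_score(treatment, cv_ps), 3))
```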

View journal article

Weighted Optimal Markov Model of a Single Outcome: Ipsative Standardization of Ordinal Ratings is Unnecessary

Paul R. Yarnold

Optimal Data Analysis, LLC

This note empirically compares the use of raw vs. ipsatively standardized variables in optimal weighted Markov analysis involving a series for a single outcome—presently, ratings of sleep difficulties for an individual. Findings indicate that raw and ipsatively standardized ordinal ratings yield equivalent results in such designs.

View journal article

More On: “Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia”

Paul R. Yarnold

Optimal Data Analysis, LLC

A recent article optimized the ESS of a suboptimal classification tree model that discriminated hospitalized vs. ambulatory patients with community-acquired pneumonia (CAP). This note suggests possible alternatives for two original attributes as a means of increasing model accuracy: patient disease-specific knowledge vs. “college education”, and patient-specific functional status and social support vs. “living arrangement”.

View journal article

Optimizing Suboptimal Classification Trees: S-PLUS® Propensity Score Model for Adjusted Comparison of Hospitalized vs. Ambulatory Patients with Community-Acquired Pneumonia

Paul R. Yarnold

Optimal Data Analysis, LLC

Pruning to maximize model accuracy (requiring only simple hand computation) is applied to a classification tree model developed via S-PLUS to create propensity scores for improving causal inference when comparing hospitalized vs. ambulatory patients with community-acquired pneumonia. The research reported herein constitutes a thought-provoking example of a striking misalliance between forward analytic thinking and vestigial statistical tools—a condition that dominates the empirical literature today. Modifications of ubiquitous methodological practices are suggested.

View journal article