The Loyola Experience (1993-2009): Optimal Data Analysis in the Department of Psychology

Fred B. Bryant, Ph.D.

Loyola University Chicago

This article traces the origins and development of the use of optimal data analysis (ODA) within the Department of Psychology at Loyola University Chicago over the past 17 years. An initial set of ODA-based articles by Loyola faculty laid the groundwork for a sustained upsurge in the use of ODA among graduate students that has lasted for more than a decade and a half. These student projects subsequently fueled an increase in ODA-based publications by other Loyola Psychology faculty, who directly supervised the various student projects. Thus, ODA initially trickled down from faculty to students, but later spread back up in the opposite direction, from students to faculty. The most frequent use of ODA in Loyola’s Psychology Department has been to conduct classification tree analysis; less common uses include optimal discriminant analysis and the iterative structural decomposition of transition tables. As more Loyola Psychology graduate students find academic jobs and continue using ODA in their research, we expect that they will replicate the Loyola experience in these new academic settings.

View journal article

Optimal Data Analysis: A General Statistical Analysis Paradigm

Paul R. Yarnold, Ph.D., and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Optimal data analysis (ODA) is a new paradigm in the general statistical analysis of data, which explicitly maximizes the accuracy achieved by a model for every statistical analysis, in the context of exact distribution theory. This paper reviews optimal analogues of traditional statistical methods, as well as new special-purpose models for which no conventional alternatives exist.

Author’s Note: This paper reviews initial discoveries of the ODA paradigm. Here is a current review: https://odajournal.com/2017/04/18/what-is-optimal-data-analysis/

View journal article

Maximizing Accuracy of Classification Trees by Optimal Pruning

Paul R. Yarnold, Ph.D., and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

We describe a pruning methodology that maximizes the effect strength for sensitivity (ESS) of classification tree models. The initial “Bonferroni-pruned” model is first deconstructed into all possible nested sub-branches, and the sub-branch that explicitly maximizes mean sensitivity is then identified. This methodology is illustrated using models predicting in-hospital mortality of 1,193 (Study 1) and 1,660 (Study 2) patients with AIDS-related Pneumocystis carinii pneumonia.
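
The selection step, choosing among all nested prunings the sub-branch with the greatest ESS, can be sketched as follows. This is a minimal illustration in which each candidate pruning is summarized simply by its predicted class labels on the sample; the function names are illustrative and are not part of the ODA software.

```python
# Minimal sketch: pick the pruned sub-branch with maximal ESS, where each
# candidate is represented by its predicted class labels on the sample.
# ESS rescales mean per-class sensitivity so that 0 = chance and 100 = perfect.
import numpy as np

def ess(y_true, y_pred):
    classes = np.unique(y_true)
    mean_sens = 100 * np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])
    chance = 100 / len(classes)
    return 100 * (mean_sens - chance) / (100 - chance)

def best_subbranch(y_true, candidate_preds):
    """Return the index of the candidate pruning with maximal ESS, plus all scores."""
    scores = [ess(y_true, p) for p in candidate_preds]
    return int(np.argmax(scores)), scores

# Toy usage: two candidate prunings of a mortality model (0 = survived, 1 = died)
y = np.array([0, 0, 0, 1, 1, 1])
candidates = [np.array([0, 0, 1, 1, 1, 1]),   # deeper sub-branch
              np.array([0, 1, 0, 0, 1, 1])]   # shallower sub-branch
best, scores = best_subbranch(y, candidates)
print(best, [round(s, 1) for s in scores])
```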

View journal article

Two-Group MultiODA: A Mixed-Integer Linear Programming Solution with Bounded M

Robert C. Soltysik, M.S., and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

Prior mixed-integer linear programming procedures for obtaining two-group multivariable optimal discriminant analysis (MultiODA) models require estimation of the value of a parameter, M. A new formulation is presented that establishes a lower bound for M and executes more quickly than prior formulations. Also discussed are a sufficient condition for the nonexistence of classification gaps and ambiguous solutions, optimal weighted classification, the use of nonlinear terms, selection of an optimal subset of attributes, and the aggregation of duplicate observations. When the design involves six or fewer binary attributes, MultiODA models may easily be obtained for massive samples.
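
For orientation, a generic big-M formulation of the two-group misclassification-minimization problem (not the authors' exact MultiODA model, whose formulation and bound on M are given in the article) can be written as:

\[
\begin{aligned}
\min_{\boldsymbol{\lambda},\, c,\, \mathbf{z}} \quad & \sum_{i \in G_1} z_i + \sum_{j \in G_2} z_j \\
\text{s.t.} \quad & \boldsymbol{\lambda}^{\top}\mathbf{x}_i \ge c - M z_i, & i \in G_1, \\
& \boldsymbol{\lambda}^{\top}\mathbf{x}_j \le c - \varepsilon + M z_j, & j \in G_2, \\
& z_i, z_j \in \{0,1\},
\end{aligned}
\]

where each observation has attribute vector \(\mathbf{x}\), the weights \(\boldsymbol{\lambda}\) and cutoff \(c\) define the discriminant rule, \(\varepsilon > 0\) rules out degenerate solutions, and each binary \(z\) flags a misclassified observation. The smaller M can be made while remaining valid, the tighter the linear relaxation and the faster the branch-and-bound search, which is why bounding M matters.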

View journal article

Unconstrained Covariate Adjustment in CTA

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

In traditional statistical covariate analysis it is common practice to force entry of the covariate into the model first, to eliminate the effect of the covariate (i.e., to “equate the groups”) on the dependent measure. In contrast, in classification tree analysis (CTA) the covariate is treated as an ordinary attribute that must compete with other eligible attributes for selection into the model on the basis of operator-specified options. This paper illustrates optimal covariate analysis with an application predicting patient in-hospital mortality via CTA.

View journal article

Maximizing the Accuracy of Probit Models via UniODA

Barbara M. Yarnold, J.D., Ph.D. and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

Paralleling the procedure used to maximize the effect strength for sensitivity (ESS) of linear models derived using logistic regression analysis or Fisher’s discriminant analysis, univariate optimal discriminant analysis (UniODA) is applied to the predicted response function values provided by a model derived by probit analysis (PA), and returns an adjusted decision criterion for making classification decisions. ESS attains its theoretical maximum value with this adjusted decision criterion, and the ability of the PA model to return accurate classifications is thereby optimized. UniODA refinement of a PA model is illustrated using an example involving political science analysis of federal courts.
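
A minimal sketch of this refinement idea appears below, assuming a probit model fit with statsmodels and a brute-force search over candidate cutoffs on the fitted values. The exhaustive cutpoint search is a simple stand-in for UniODA, not the ODA software itself, the data are simulated, and ESS is computed here as sensitivity plus specificity minus 100 for the two-group case.

```python
# Sketch: refine a probit model's classification cutoff by searching all
# candidate thresholds on the predicted response values and keeping the one
# that maximizes ESS, instead of the conventional 0.5 cutoff.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

probit = sm.Probit(y, sm.add_constant(X)).fit(disp=0)
p_hat = probit.predict(sm.add_constant(X))

def ess(y_true, y_pred):
    """Two-group ESS: sensitivity + specificity - 100 (0 = chance, 100 = perfect)."""
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return 100 * (sens + spec) - 100

# Candidate cutoffs: midpoints between adjacent sorted fitted values
s = np.sort(p_hat)
cuts = (s[:-1] + s[1:]) / 2
best_cut = max(cuts, key=lambda c: ess(y, (p_hat > c).astype(int)))
print("adjusted cutoff:", round(best_cut, 3),
      "ESS:", round(ess(y, (p_hat > best_cut).astype(int)), 1))
print("conventional 0.5 cutoff ESS:", round(ess(y, (p_hat > 0.5).astype(int)), 1))
```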

View journal article

Precision and Convergence of Monte Carlo Estimation of Two-Category UniODA Two-Tailed p

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Monte Carlo (MC) research was used to study precision and convergence properties of MC methodology used to assess Type I error in exploratory (post hoc, or two-tailed) UniODA involving two balanced (equal N) classes. Study 1 ran 10^6 experiments for each N, and estimated cumulative p’s were compared with corresponding exact p for all known p values. Study 2 ran 10^5 experiments for each N, and observed the convergence of the estimated p’s. UniODA cumulative probabilities estimated using 10^5 experiments are only modestly less accurate than probabilities estimated using 10^6 experiments, and the maximum observed error (±0.002) is small. Study 3 ran 10^5 experiments for Ns ranging as high as 8,000 observations in order to examine asymptotic properties of optimal values for balanced designs.
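
The kind of Monte Carlo experiment whose precision is at issue can be sketched as follows: under the null hypothesis the class labels are repeatedly permuted, the best cutpoint ESS is recomputed each time, and the estimated p is the proportion of experiments meeting or exceeding the observed value. This is a generic permutation sketch with simulated data and an illustrative number of experiments, not the authors' estimation code.

```python
# Sketch: Monte Carlo estimation of the two-tailed p for a two-group,
# UniODA-style cutpoint analysis of an ordered attribute. Both classification
# directions are searched, which is what makes the test two-tailed.
import numpy as np

def max_ess(x, y):
    """Best two-group ESS over all cutpoints and both directions of classification."""
    s = np.sort(x)
    best = -100.0
    for c in (s[:-1] + s[1:]) / 2:
        for pred in ((x > c).astype(int), (x <= c).astype(int)):
            sens = np.mean(pred[y == 1] == 1)
            spec = np.mean(pred[y == 0] == 0)
            best = max(best, 100 * (sens + spec) - 100)
    return best

rng = np.random.default_rng(1)
x = rng.normal(size=40)              # ordered attribute for a balanced sample
y = np.repeat([0, 1], 20)            # two balanced (equal N) classes

observed = max_ess(x, y)
n_experiments = 2_000                # illustrative; far fewer than the article's 10^5 or 10^6
null = np.array([max_ess(x, rng.permutation(y)) for _ in range(n_experiments)])
print("observed ESS:", round(observed, 1),
      "estimated p:", np.mean(null >= observed))
```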

View journal article

Aggregated vs. Referenced Categorical Attributes in UniODA and CTA

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Multivariable linear methods such as logistic regression analysis, discriminant analysis, and multiple regression analysis directly incorporate binary categorical attributes into their solutions. For categorical attributes having more than two levels, however, each level must first be individually dummy-coded, and then one level must be selected as a reference category and omitted from the analysis. Selecting one level rather than another as the reference category can mask effects that would otherwise have materialized had a different level been chosen. Neither UniODA nor CTA requires reference categories in analyses using multicategorical attributes.
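
The coding difference is easy to see in a small sketch, shown below with invented data and column names, using pandas: regression-type models take k-1 dummy indicators with one level held out as the reference, whereas UniODA or CTA can take the multicategory attribute as a single column.

```python
# Illustration of reference-category coding vs. using the attribute directly.
# The data and column names are made up for the example.
import pandas as pd

df = pd.DataFrame({"site":    ["A", "B", "C", "A", "C", "B"],
                   "outcome": [1,   0,   1,   0,   1,   0]})

# Regression-type models: dummy-code and drop one level as the reference.
# Which level is dropped (here "A") can change which contrasts emerge,
# the masking problem noted in the abstract.
dummies = pd.get_dummies(df["site"], prefix="site", drop_first=True)
print(dummies)

# UniODA/CTA: the single multicategory column is analyzed directly,
# with no level sacrificed as a reference category.
print(df["site"])
```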

View journal article

Manual vs. Automated CTA: Optimal Preadmission Staging for Inpatient Mortality from Pneumocystis carinii Pneumonia

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Two severity-of-illness models used for staging risk of in-hospital mortality from AIDS-related Pneumocystis carinii pneumonia (PCP) were developed using hierarchically optimal classification tree analysis (CTA), with models derived manually via UniODA software. The first in the “Manual vs. Automated CTA” series, this study contrasts classification results between the original models and corresponding new models derived using automated analysis. Findings provide superior staging systems that may be employed to improve the results of applied research in this area.

View journal article

Manual vs. Automated CTA: Psychosocial Adaptation in Young Adolescents

Rachael Millstein Coakley, Ph.D., Grayson N. Holmbeck, Ph.D., Fred B. Bryant, Ph.D., and Paul R. Yarnold, Ph.D.

Children’s Hospital Boston / Harvard Medical School, Loyola University Chicago, and Optimal Data Analysis, LLC

Compared to the manually derived model, the enumerated CTA model was 20% more parsimonious, 3.6% more accurate, and 30% more efficient, and it was more consistent with a priori hypotheses.

View journal article

Gen-UniODA vs. Log-Linear Model: Modeling Organizational Discrimination

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

An application involving a binary class variable (gender), an ordinal attribute (academic rank), and two testing periods (separated by six years) was troublesome for the log-linear model, but was easily analyzed using Gen-UniODA.

View journal article

UniODA vs. Chi-Square: Ordinal Data Sometimes Feign Categorical

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Although the ordinal scale is perhaps the most widely used type of measurement scale in all of science, ordinal data are often misidentified as categorical and incorrectly analyzed by chi-square analysis. Three examples drawn from the literature are reanalyzed.
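
As a toy sketch of the distinction (with invented counts), a chi-square test treats the ordered levels of an attribute as interchangeable categories, while a UniODA-style analysis exploits the ordering by searching for the cutpoint that best separates the classes:

```python
# Hypothetical 2 x 4 table: rows = class (0/1), columns = ordered levels 1..4.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[12, 10, 6, 2],
                  [ 3,  5, 9, 13]])

# Chi-square ignores the ordering of the four levels.
chi2, p, dof, _ = chi2_contingency(table)
print(round(chi2, 2), round(p, 4), dof)

# UniODA-style: try each cutpoint "level <= k vs. > k" (one direction shown)
# and report the two-group ESS for each candidate cutpoint.
def ess_for_cut(table, k):
    n0, n1 = table.sum(axis=1)
    spec = table[0, :k].sum() / n0   # class 0 predicted when level <= k
    sens = table[1, k:].sum() / n1   # class 1 predicted when level  > k
    return 100 * (sens + spec) - 100

print([round(ess_for_cut(table, k), 1) for k in (1, 2, 3)])
```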

View journal article

The Use of Unconfounded Climatic Data Improves Atmospheric Prediction

Robert C. Soltysik, M.S., and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

This report improves the measurement properties of data and analytic methods widely used in meteorological modeling and forecasting. Paradoxical confounding is defined and demonstrated using global temperature land-ocean index data. It is shown that failure to address paradoxical confounding results in suboptimal models of atmospheric circulation patterns, and that correcting prior measurement and analytic deficiencies yields more accurate prediction of temperature anomalies, precipitation anomalies, and the export of Arctic sea ice.

View journal article

Here Today, Gone Tomorrow: Understanding Freshman Attrition Using Person-Environment Fit Theory

Jennifer Howard Smith, Ph.D., Fred B. Bryant, Ph.D., David Njus, Ph.D., and Emil J. Posavac, Ph.D.

Applied Research Solutions, Inc., Loyola University Chicago, Luther College, Loyola University Chicago (Emeritus)

Person-Environment (PE) fit theory was used to explore the relationship between student involvement and freshman retention. Incoming freshmen (N=382) were followed longitudinally in a two-wave panel study: the summer before beginning college and again during the spring of their freshman year. Involvement levels, a variety of summer and spring preferences (Ps), and spring perceptions (Es) regarding specific aspects of their college environment were assessed. Twelve PE fit indicators were derived and compared with respect to their relationship with student involvement and retention. Results indicated that involvement was linked to some PE fit indicators. Traditional parametric statistical analyses were compared with a new, nonparametric technique, Classification Tree Analysis (CTA), to identify the most accurate classification model for use in designing potential attrition interventions. Discriminant analysis was 14% more accurate than CTA in classifying returners (97% vs. 85%), but CTA was 962% more accurate in classifying dropouts (8% vs. 84%). CTA identified nine clusters (five of returners and four of dropouts), revealing that different subgroups of freshmen chose to return (and stay) for different reasons. Students’ end-of-the-year preferences appear to be more important than anticipated preferences, college perceptions, or PE fit levels.

View journal article

Tracing Prospective Profiles of Juvenile Delinquency and Non-Delinquency: An Optimal Classification Tree Analysis

Hideo Suzuki, Ph.D., Fred B. Bryant, Ph.D., and John D. Edwards, Ph.D.

This study explored multiple variables that influence the development of juvenile delinquency. Two datasets from the National Youth Survey, a longitudinal study of delinquency and drug use among youths, were used: 166 predictors were selected from the 1976 dataset, and later self-reported delinquency was taken from the 1978 dataset. Optimal data analysis was then used to construct a hierarchical classification tree model tracing the causal roots of juvenile delinquency and non-delinquency. Five attributes entered the final model and provided 70.37% overall classification accuracy: prior self-reported delinquency, exposure to peer delinquency, exposure to peer alcohol use, attitudes toward marijuana use, and grade level in school. Prior self-reported delinquency was the strongest predictor of later juvenile delinquency. These results highlight seven distinct profiles of juvenile delinquency and non-delinquency: lay delinquency, unexposed chronic delinquency, exposed chronic delinquency, unexposed non-delinquency, exposed non-delinquency, unexposed reformation, and exposed reformation.

View journal article

How to Save the Binary Class Variable and Predicted Probability of Group Membership from Logistic Regression Analysis to an ASCII Space-Delimited File in SPSS 17 For Windows

Fred B. Bryant, Ph.D.

Loyola University Chicago

This note explains the steps involved, and provides the SPSS syntax needed, to run a two-group logistic regression analysis in SPSS 17 for Windows and to output the binary class variable and the predicted probability of group membership (i.e., “Y-hat”) to an ASCII space-delimited data file.
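
The SPSS 17 syntax itself is given in the article. For readers working outside SPSS, an analogous workflow is sketched below in Python, with simulated data and statsmodels standing in for the SPSS logistic regression procedure; the file name and variables are hypothetical.

```python
# Analogous workflow (not the SPSS syntax): fit a two-group logistic regression,
# then write the binary class variable and the predicted probability of group
# membership ("Y-hat") to an ASCII space-delimited file.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] - X[:, 2] + rng.normal(size=100) > 0).astype(int)

logit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
y_hat = logit.predict(sm.add_constant(X))

# Two columns per row: class variable, predicted probability; space-delimited ASCII.
np.savetxt("logit_yhat.dat", np.column_stack([y, y_hat]), fmt="%d %.6f")
```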

View journal article

An Internet-Based Intervention for Fibromyalgia Self-Management: Initial Design and Alpha Test

William Collinge, Ph.D., Robert C. Soltysik, M.S., and Paul R. Yarnold, Ph.D.

Collinge and Associates and Optimal Data Analysis, LLC

The Self-Monitoring And Review Tool (SMART) is an interactive, internet-based self-monitoring and feedback system that helps people discover and monitor links among their own health-related behaviors, management strategies, and symptom levels over time. SMART involves longitudinal collection and optimal analysis of an individual’s self-monitoring data, and delivery of personalized feedback derived from those data. Forty women with fibromyalgia (FM) enrolled in a three-month alpha test of the SMART system. Utilization, satisfaction, and compliance were high across the test period, and higher utilization was predictive of lower anxiety as well as improved physical functioning and self-efficacy.

View journal article

Junk Science, Test Validity, and the Uniform Guidelines for Personnel Selection Procedures: The Case of Melendez v. Illinois Bell

Fred B. Bryant, Ph.D. and Elaine K.B. Siegel

Loyola University Chicago and Hager & Siegel, P.C.

This paper stems from a recent federal court case in which a standardized test of cognitive ability developed by AT&T, the Basic Scholastic Aptitude Test (BSAT), was ruled invalid and discriminatory for use in hiring Latinos. Within the context of the BSAT, we discuss spurious statistical arguments advanced by the defense, which exploited certain language in the current Uniform Guidelines for evaluating the fairness and validity of personnel selection tests. These issues include: (a) how to avoid capitalizing on chance; (b) what constitutes “a measure” of job performance; (c) how to judge the meaningfulness of group differences in performance measures; and (d) how to combine data from different sex, race, or ethnic subgroups when computing validity coefficients for the pooled, total sample. Pursuant to the Uniform Guidelines’ standard for unfairness, when one ethnic group scores higher on an employment test, the test is deemed “unfair” if this difference is not reflected in a measure of job performance. Although studies validating selection instruments often survive the unfairness test, such data are vulnerable to bias and manipulation if appropriate statistical procedures are not used. We consider both the benefits (greater clarity and precision) and the potential costs (loss of legal precedent) of revising the Uniform Guidelines to address these issues. We further discuss legal procedures to limit “junk science” in the courtroom, and the need to reevaluate validity generalization in light of Simpson’s “false correlation” paradox.

View journal article