How to Create an ASCII Input Data File for UniODA and CTA Software

Fred B. Bryant & Patrick R. Harrison

Loyola University Chicago

UniODA and CTA software require an ASCII (unformatted text) file as input data. Arguably the most difficult task an operator faces in conducting analyses is converting the original data file from (a) whatever software package was used to enter the data, into (b) an ASCII file for analysis. This article first highlights critical issues concerning missing data, variable labels, and variable types that users must address in order to convert their data into an ASCII file for analysis using ODA software. Specific steps needed to convert a data set from its original file-type into a space-delimited ASCII file are then discussed. The process of converting data into ASCII files for use as input data is illustrated for three leading statistical software packages: SPSS, SAS, and STATISTICA.
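The conversion target described above, a space-delimited ASCII file with one observation per line, can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `to_space_delimited`, the missing-value code `-9`, and the choice to write no header line are hypothetical and are not taken from the article or from the UniODA or CTA documentation.

```python
def to_space_delimited(rows, missing_code="-9"):
    """Render rows as space-delimited ASCII text, one observation per line.

    missing_code is a hypothetical placeholder for missing values; the code
    actually expected by the analysis software should be used instead.
    """
    lines = []
    for row in rows:
        fields = []
        for value in row:
            if value is None or value == "":
                # A blank field would shift every later column, so fill it.
                fields.append(missing_code)
            else:
                text = str(value)
                if " " in text:
                    raise ValueError(f"embedded space in value: {text!r}")
                fields.append(text)
        line = " ".join(fields)
        line.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text
        lines.append(line)
    return "\n".join(lines) + "\n"


# Three observations: an ID, a numeric attribute, and a binary class variable.
rows = [[1, 23.5, 0], [2, None, 1], [3, 19.0, 1]]
ascii_text = to_space_delimited(rows)
print(ascii_text, end="")
```

Rejecting embedded spaces and non-ASCII characters up front is a design choice in this sketch: either problem would otherwise silently corrupt the column layout of the output file.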

Maximizing the Accuracy of Multiple Regression Models using UniODA: Regression Away From the Mean

Paul R. Yarnold, Ph.D., Fred B. Bryant, Ph.D., and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC and Loyola University Chicago

Standard regression models best predict values that lie near the mean. Three examples illustrate how optimization of the regression model using an established UniODA methodology greatly improves accurate prediction of extreme values.

Statistical Power of Optimal Discrimination with a Normal Attribute and Two Classes: One-Tailed Hypotheses

Robert C. Soltysik, M.S., and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

This note reports statistical power (1-β) obtained by ODA when used with a normally-distributed attribute, as a function of alpha and effect size.

Modeling Individual Reactivity in Serial Designs: Changes in Weather and Physical Symptoms in Fibromyalgia

Paul R. Yarnold, Ph.D., Robert C. Soltysik, M.S., and William Collinge, Ph.D.

Optimal Data Analysis, LLC and Collinge and Associates

This note criticizes current statistical convention, and discusses and illustrates appropriate statistical methodology for investigating the relationship between weather and individual symptoms.

Reverse CTA Versus Multiple Regression Analysis

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

This paper illustrates how to reverse CTA for applications having an ordered class variable and categorical attributes. Whereas a regression model is used to make point predictions for the dependent measure based on values of the independent variables, reverse CTA is used to find domains on the dependent measure which are explained by the independent variables.

Manual vs. Automated CTA: Predicting Freshman Attrition

Paul R. Yarnold, Ph.D., Fred B. Bryant, Ph.D., and Jennifer Howard Smith, Ph.D.

Optimal Data Analysis, LLC, Loyola University Chicago, Applied Research Solutions, Inc.

The enumerated model was 20% more accurate, but 43% less parsimonious and 31% less efficient than the manually-derived model. Granularity afforded by the enumerated model enabled prediction of seven of eight incoming freshmen who left college. Substantive, policy, and methodological implications are considered.

Comparing Knot Strength Using UniODA

Paul R. Yarnold, Ph.D. and Gordon C. Brofft, BS

Optimal Data Analysis, LLC and Marine and Water Consultant

This study assessed the comparative strength of three versatile knots widely used in big-game fishing. Experiment One compared Uni and San Diego knots tied in 30-, 40- and 50-pound-test monofilament line (the modal strengths), finding no statistically significant differences in knot strength. Experiment Two attached 40-pound-test monofilament line to 50- and 65-pound-test solid spectra line, and to 60-pound-test hollow spectra line, using a Double Uni knot, and found that the 40-to-65 connection was strongest. The high levels of variation observed in knot strength raise concern about the durability and consistency of monofilament line.

The Loyola Experience (1993-2009): Optimal Data Analysis in the Department of Psychology

Fred B. Bryant, Ph.D.

Loyola University Chicago

This article traces the origins and development of the use of optimal data analysis (ODA) within the Department of Psychology at Loyola University Chicago over the past 17 years. An initial set of ODA-based articles by Loyola faculty laid the groundwork for a sustained upsurge in the use of ODA among graduate students which has lasted for more than a decade and a half. These student projects subsequently fueled an increase in ODA-based publications by other Loyola Psychology faculty, who directly supervised the various student projects. Thus, ODA initially trickled down from faculty to students, but later grew up in the opposite direction. The most frequent use of ODA in Loyola’s Psychology Department has been to conduct classification tree analysis, with less common uses of ODA including optimal discriminant analysis and the iterative structural decomposition of transition tables. As more Loyola Psychology graduate students find academic jobs and continue using ODA in their research, we expect that they will replicate the Loyola experience in these new academic settings.

Optimal Data Analysis: A General Statistical Analysis Paradigm

Paul R. Yarnold, Ph.D., and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Optimal discriminant analysis (ODA) is a new paradigm in the general statistical analysis of data, which explicitly maximizes the accuracy achieved by a model for every statistical analysis, in the context of exact distribution theory. This paper reviews optimal analogues of traditional statistical methods, as well as new special-purpose models for which no conventional alternatives exist.

Author’s Note: This paper reviews initial discoveries of the ODA paradigm. Here is a current review: https://odajournal.com/2017/04/18/what-is-optimal-data-analysis/

Maximizing Accuracy of Classification Trees by Optimal Pruning

Paul R. Yarnold, Ph.D., and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

We describe a pruning methodology which maximizes effect strength for sensitivity of classification tree models. After deconstructing the initial “Bonferroni-pruned” model into all possible nested sub-branches, the sub-branch which explicitly maximizes mean sensitivity is identified. This methodology is illustrated using models predicting in-hospital mortality of 1,193 (Study 1) and 1,660 (Study 2) patients with AIDS-related Pneumocystis carinii pneumonia.

Two-Group MultiODA: A Mixed-Integer Linear Programming Solution with Bounded M

Robert C. Soltysik, M.S., and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

Prior mixed-integer linear programming procedures for obtaining two-group multivariable optimal discriminant analysis (MultiODA) models require estimation of the value of a parameter, M. A new formulation is presented that establishes a lower bound for M and executes more quickly than prior formulations. Also discussed are a sufficient condition for the nonexistence of classification gaps and ambiguous solutions, optimal weighted classification, use of nonlinear terms, selection of an optimal subset of attributes, and aggregation of duplicate observations. When the design involves six or fewer binary attributes, MultiODA models may easily be obtained for massive samples.

Unconstrained Covariate Adjustment in CTA

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

In traditional statistical covariate analysis it is common practice to force entry of the covariate into the model first, to eliminate the effect of the covariate (i.e., “equate the groups”) on the dependent measure. In contrast, in CTA the covariate is treated as an ordinary attribute which must compete with other eligible attributes for selection into the model based on operator-specified options. This paper illustrates optimal covariate analysis using an application involving predicting patient in-hospital mortality via CTA.

Maximizing the Accuracy of Probit Models via UniODA

Barbara M. Yarnold, J.D., Ph.D. and Paul R. Yarnold, Ph.D.

Optimal Data Analysis, LLC

Paralleling the procedure used to maximize ESS of linear models derived using logistic regression analysis or Fisher’s discriminant analysis, univariate optimal discriminant analysis (UniODA) is applied to the predicted response function values provided by a model derived by probit analysis (PA), and returns an adjusted decision criterion for making classification decisions. ESS obtains its theoretical maximum value with this adjusted decision criterion, and the ability of the PA model to return accurate classifications is optimized. UniODA-refinement of a PA model is illustrated using an example involving political science analysis of federal courts.
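The idea of an adjusted decision criterion can be illustrated with a small sketch. It assumes the usual two-class definition of ESS as mean sensitivity rescaled so that chance (50%) maps to 0 and perfect classification maps to 100, and it uses a plain exhaustive search over cutpoints rather than the UniODA software's own algorithm. The function names and the toy scores are hypothetical, and the sketch only considers the rule "predict class 1 when the score exceeds the cut".

```python
def ess(scores, labels, cut):
    """Two-class ESS (0 = chance, 100 = perfect) for the rule: score > cut -> class 1."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    hits1 = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
    hits0 = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
    mean_sensitivity = 50.0 * (hits1 / n1 + hits0 / n0)  # in percent
    return (mean_sensitivity - 50.0) / 50.0 * 100.0


def best_cutpoint(scores, labels):
    """Exhaustively try the midpoints between adjacent distinct scores."""
    values = sorted(set(scores))
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]
    return max(candidates, key=lambda cut: ess(scores, labels, cut))


# Hypothetical predicted response values from a fitted model, with true classes.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.70]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

default_ess = ess(scores, labels, 0.5)      # conventional 0.5 criterion
adjusted_cut = best_cutpoint(scores, labels)
adjusted_ess = ess(scores, labels, adjusted_cut)
print(default_ess, adjusted_cut, adjusted_ess)
```

In this toy data the conventional 0.5 criterion misclassifies three of the four class-1 observations, whereas a cut between 0.20 and 0.30 separates the classes completely.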

Precision and Convergence of Monte Carlo Estimation of Two-Category UniODA Two-Tailed p

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Monte Carlo (MC) research was used to study precision and convergence properties of MC methodology used to assess Type I error in exploratory (post hoc, or two-tailed) UniODA involving two balanced (equal N) classes. Study 1 ran 10^6 experiments for each N, and estimated cumulative p’s were compared with corresponding exact p for all known p values. Study 2 ran 10^5 experiments for each N, and observed the convergence of the estimated p’s. UniODA cumulative probabilities estimated using 10^5 experiments are only modestly less accurate than probabilities estimated using 10^6 experiments, and the maximum observed error (±0.002) is small. Study 3 ran 10^5 experiments for Ns ranging as high as 8,000 observations in order to examine asymptotic properties of optimal values for balanced designs.
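A generic Monte Carlo estimate of a two-tailed p value for a two-class cutpoint model can be sketched as follows. This is a permutation-style illustration only, not the procedure implemented in the UniODA software or used in the studies above: the function names, the seed, and the toy data are hypothetical, and the number of experiments is far smaller than those reported.

```python
import random


def max_ess(scores, labels):
    """Largest two-class ESS over all cutpoints, allowing either direction."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    values = sorted(set(scores))
    best = 0.0
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        hits1 = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
        hits0 = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
        ess = (hits1 / n1 + hits0 / n0 - 1.0) * 100.0
        # Reversing the rule's direction flips the sign of ESS, so abs()
        # covers both directions (the two-tailed, exploratory case).
        best = max(best, abs(ess))
    return best


def monte_carlo_p(scores, labels, experiments, seed=0):
    """Share of random relabelings whose best ESS is at least the observed one."""
    rng = random.Random(seed)
    observed = max_ess(scores, labels)
    shuffled = list(labels)
    count = 0
    for _ in range(experiments):
        rng.shuffle(shuffled)  # keeps class sizes fixed (balanced design)
        if max_ess(scores, shuffled) >= observed - 1e-9:
            count += 1
    return count / experiments


# Balanced toy design: 4 observations per class, perfectly separated.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p_estimate = monte_carlo_p(scores, labels, experiments=5000)
print(p_estimate)
```

For this toy design the exact value can be enumerated by hand: only 2 of the 70 possible balanced labelings separate the classes perfectly, so the estimate should settle near 2/70 as the number of experiments grows.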

Aggregated vs. Referenced Categorical Attributes in UniODA and CTA

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Multivariable linear methods such as logistic regression analysis, discriminant analysis, or multiple regression analysis directly incorporate binary categorical attributes into their solution. However, for categorical attributes having more than two levels, each level must first be individually dummy-coded, then one level must be selected for use as a reference category and omitted from analysis. Selecting one level rather than another as the reference category can mask effects that would have materialized had a different level been chosen. Neither UniODA nor CTA requires reference categories in analyses using multicategorical attributes.

Manual vs. Automated CTA: Optimal Preadmission Staging for Inpatient Mortality from Pneumocystis carinii Pneumonia

Paul R. Yarnold, Ph.D. and Robert C. Soltysik, M.S.

Optimal Data Analysis, LLC

Two severity-of-illness models used for staging risk of in-hospital mortality from AIDS-related Pneumocystis carinii pneumonia (PCP) were developed using hierarchically optimal classification tree analysis (CTA), with models derived manually via UniODA software. The first of the “Manual vs. Automated CTA” series, this study contrasts classification results between original models and corresponding new models derived using automated analysis. Findings provide superior staging systems which may be employed to improve results of applied research in this area.

Manual vs. Automated CTA: Psychosocial Adaptation in Young Adolescents

Rachael Millstein Coakley, Ph.D., Grayson N. Holmbeck, Ph.D., Fred B. Bryant, Ph.D., and Paul R. Yarnold, Ph.D.

Children’s Hospital, Boston / Harvard Medical School, Loyola University Chicago, Optimal Data Analysis, LLC

Compared to the manually-derived model, the enumerated CTA model was 20% more parsimonious, 3.6% more accurate and 30% more efficient, and was more consistent with a priori hypotheses.
