ODA Range Test vs. One-Way Analysis of Variance: Patient Race and Lab Results

Paul R. Yarnold

Optimal Data Analysis, LLC

Mean scores on a continuous dependent measure are compared across three or more groups using one-way analysis of variance (ANOVA). If a statistically significant overall or “omnibus” effect emerges, then a multiple comparisons procedure is used to ascertain the exact nature of any interclass differences. In contrast, the dependent measure may be compared between classes with UniODA to assess whether thresholds on the dependent measure can discriminate the classes. If the resulting ESS accuracy statistic for the overall effect is statistically reliable, then an optimal (maximum-accuracy) range test is employed to ascertain the exact nature of interclass differences. ANOVA and UniODA are used to investigate the differences between n=377 white, n=378 African American, and n=257 Hispanic patients with HIV-associated Pneumocystis carinii pneumonia (PCP) on two laboratory tests (albumin and alveolar-arterial oxygen difference) associated with PCP outcomes.
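As a rough illustration of the ESS accuracy statistic mentioned above, the sketch below computes it from a cross-classification of actual by predicted class. It assumes the usual definition, in which mean per-class sensitivity is rescaled so that 0 represents chance and 100 represents perfect classification. The function name and the counts are hypothetical and are not taken from the study.

```python
def ess_from_confusion(confusion):
    """Return ESS (0 = chance, 100 = perfect) for a square confusion table.

    confusion[i][j] is the number of class-i observations predicted as class j.
    Assumed definition: 100 * (mean sensitivity - 1/C) / (1 - 1/C) for C classes.
    """
    n_classes = len(confusion)
    sensitivities = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            raise ValueError("every class needs at least one observation")
        sensitivities.append(row[i] / total)
    mean_sensitivity = sum(sensitivities) / n_classes
    chance = 1.0 / n_classes
    return 100.0 * (mean_sensitivity - chance) / (1.0 - chance)


# Hypothetical three-class table (rows = actual class, columns = predicted class).
example_table = [
    [60, 25, 15],
    [30, 50, 20],
    [20, 20, 60],
]
example_ess = ess_from_confusion(example_table)
print(round(example_ess, 1))
```

Because each class contributes its own sensitivity equally, the statistic is not inflated by a large class that is easy to predict.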


MegaODA Large Sample and BIG DATA Time Trials: Harvesting the Wheat

Robert C. Soltysik & Paul R. Yarnold

Optimal Data Analysis, LLC

In research involving multiple tests of statistical hypotheses, the efficiency of Monte Carlo (MC) simulation used to estimate the Type I error rate (p) is maximized using a two-step procedure. The first step is identifying the effects that are not statistically significant (ns). The second step is verifying that the remaining effects are statistically significant at the generalized or experimentwise criterion (p<0.05), as is necessary to reject the null hypothesis and accept the alternative hypothesis that a statistically significant effect occurred. This research uses experimental simulation to explore the ability of MegaODA to identify p values of 0.01 and 0.001 for sample sizes of n=100,000 and n=1,000,000. Solution speeds ranged from 5 to more than 83,000 CPU seconds running MegaODA software on a 3 GHz Intel Pentium D microcomputer. Using MegaODA it is straightforward to rapidly rule in p<0.05 for weak and moderate effects by Monte Carlo simulation with large samples and BIG DATA in designs having ordinal attributes, with or without weights applied to observations. Significantly greater time was required for problems involving continuous attributes, but even the most computer-intensive analyses were completed in less than a day.
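The two-step logic can be sketched with a generic label-permutation test that stops early once an effect can no longer reach the criterion. This is a minimal sketch under stated assumptions, not the procedure implemented in MegaODA: the function names, the stopping rule, and the mean-difference statistic are all hypothetical stand-ins chosen to keep the example short.

```python
import random


def abs_mean_difference(values, labels):
    """Illustrative statistic: |mean of class 1 - mean of class 0|."""
    group0 = [v for v, c in zip(values, labels) if c == 0]
    group1 = [v for v, c in zip(values, labels) if c == 1]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def monte_carlo_test(values, labels, statistic, n_iter=1000, alpha=0.05, seed=0):
    """Estimate p by shuffling class labels, stopping early for ns effects.

    Once the number of shuffles that match or beat the observed statistic
    reaches alpha * n_iter, the final estimate cannot fall below alpha,
    so the effect is ruled out without running the remaining iterations.
    """
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for done in range(1, n_iter + 1):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            exceed += 1
            if exceed >= alpha * n_iter:
                # Step 1 outcome: ruled out as ns; p is at least exceed / n_iter.
                return {"significant": False, "p_lower_bound": exceed / n_iter,
                        "iterations": done}
    # Step 2 outcome: all iterations ran, so the estimated p is below alpha.
    return {"significant": True, "p_estimate": exceed / n_iter,
            "iterations": n_iter}


# Hypothetical data: two well-separated classes of 20 observations each.
demo_values = [float(v) for v in range(20)] + [float(v) for v in range(100, 120)]
demo_labels = [0] * 20 + [1] * 20
print(monte_carlo_test(demo_values, demo_labels, abs_mean_difference, n_iter=200))
```

The early exit is what makes ruling out ns effects cheap: an effect with no signal is typically dismissed after a small fraction of the iterations that confirming p<0.05 requires.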


ODA Range Test vs. One-Way Analysis of Variance: Comparing Strength of Alternative Line Connections

Paul R. Yarnold & Gordon C. Brofft

Optimal Data Analysis, LLC

Among the most popular conventional statistical methods, Student’s t-test is used to compare the means of two groups on a single dependent measure assessed on a continuous scale. When three or more groups are compared, the t-test generalizes to one-way analysis of variance (ANOVA). If the F statistic associated with the overall or “omnibus” effect is statistically reliable, then a range test that is more efficient than performing all possible comparisons is used to ascertain the exact nature of interclass differences. In contrast, the dependent measure may be compared between classes via UniODA to assess whether thresholds on the dependent measure separate the classes. If the resulting ESS accuracy statistic for the omnibus effect is statistically reliable, then a recently developed optimal range test is used to assess the exact nature of interclass differences. ANOVA and UniODA are used to compare three methods commonly used in competitive big-game sport fishing for connecting segments of fishing line. Similarities and differences between parametric GLM and non-parametric ODA methods are demonstrated.
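To make the idea of a threshold that separates classes concrete, the sketch below runs an exhaustive cut-point search for the simpler two-class case. It is a naive illustration under assumptions, not the UniODA algorithm or the optimal range test: the function name, the rule labels, and the breaking-strength values are hypothetical, and ESS is assumed to be mean sensitivity rescaled so that 0 is chance and 100 is perfect.

```python
def best_cutpoint(values, labels):
    """Return (cut, rule, ess) for the two-class threshold with the highest ESS.

    labels must contain only 0 and 1. rule "low->0" predicts class 0 when
    value <= cut; rule "low->1" predicts class 1 when value <= cut.
    """
    n0 = labels.count(0)
    n1 = labels.count(1)
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")
    distinct = sorted(set(values))
    # Candidate cuts are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for cut in candidates:
        hits0 = sum(1 for v, c in zip(values, labels) if c == 0 and v <= cut)
        hits1 = sum(1 for v, c in zip(values, labels) if c == 1 and v > cut)
        mean_sens = 0.5 * (hits0 / n0 + hits1 / n1)
        # Reversing the rule turns every hit into a miss and vice versa.
        for rule, sens in (("low->0", mean_sens), ("low->1", 1.0 - mean_sens)):
            ess = 100.0 * (sens - 0.5) / 0.5
            if best is None or ess > best[2]:
                best = (cut, rule, ess)
    return best


# Hypothetical breaking strengths for two connection methods (classes 0 and 1).
strengths = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
methods = [0, 0, 0, 1, 1, 1]
print(best_cutpoint(strengths, methods))
```

Unlike a comparison of means, the result is a decision rule that can be applied to a new observation, together with an accuracy figure for that rule.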


MegaODA Large Sample and BIG DATA Time Trials: Separating the Chaff

Robert C. Soltysik & Paul R. Yarnold

Optimal Data Analysis, LLC

Just-released MegaODA™ software is capable of conducting UniODA analysis for an unlimited number of attributes using samples as large as one million observations. To minimize the computational burden associated with Monte Carlo simulation used to estimate the Type I error rate (p), the first step in statistical analysis is identifying effects that are not statistically significant or ns. This article presents an experimental simulation exploring the ability of MegaODA to identify ns effects in a host of designs involving a binary class variable, under ultimately challenging discrimination conditions (all data are random) for sample sizes of n=100,000 and n=1,000,000. Most analyses were solved in CPU seconds running MegaODA on a 3 GHz Intel Pentium D microcomputer. Using MegaODA it is straightforward to rapidly rule-out ns effects using Monte Carlo simulation with BIG DATA for large numbers of attributes in simple or complex, single- or multiple-sample designs involving categorical or ordered attributes either with or without weights being applied to individual observations.


Creating a Data Set with SAS™ and Maximizing ESS of a Multiple Regression Analysis Model for a Likert-Type Dependent Variable Using UniODA™ and MegaODA™ Software

Paul R. Yarnold

Optimal Data Analysis, LLC

This note presents SAS™ code for creating a requisite data set and UniODA™ and MegaODA™ code for maximizing the accuracy (ESS) of a multiple regression analysis-based model involving a Likert-type dependent measure with ten or fewer response options.


Univariate and Multivariate Analysis of Categorical Attributes with Many Response Categories

Paul R. Yarnold

Optimal Data Analysis, LLC

A scant few weeks ago, the disentanglement of effects identified in purely categorical designs (those in which all variables are categorical) was poorly understood. This includes the notoriously complex rectangular categorical designs (RCDs), in which variables have a different number of response categories. However, univariate and multivariate optimal (“maximum-accuracy”) statistical methods, specifically UniODA and automated CTA, make the analysis of such designs straightforward. These methods are illustrated using an example involving n=1,568 randomly selected patients having either confirmed or presumed Pneumocystis carinii pneumonia (PCP). The four categorical variables used in analysis are patient status (two categories: alive, dead), gender (male, female), city of residence (seven categories), and type of health insurance (ten categories). Examination of the cross-tabulations of these variables makes it obvious why conventional statistical methods such as chi-square analysis, logistic regression analysis, and log-linear analysis are both inappropriate for, and easily overwhelmed by, such designs. In contrast, UniODA and CTA identified maximum-accuracy solutions effortlessly in this application.


Analyzing Categorical Attributes Having Many Response Options

Paul R. Yarnold

Optimal Data Analysis, LLC

Rectangular categorical designs (RCDs) are among the most prevalent designs in science. Categorical designs involve categorical measures. Examples of categorical measures include gender with two response categories (male, female), color with three response categories (red, blue, green), or political affiliation with many response categories. RCDs cross-tabulate two or more categorical variables that have a different number of response options: that is, there are more columns than rows in the cross-tabulation table, or the opposite is true. Analysis via chi-square does not enable researchers to disentangle effects identified in such designs, especially when the number of response options is large, as is typical for attributes such as ethnicity, marital status, type of health insurance, or state of residence. This article demonstrates how to disentangle effects within rectangular (or square) cross-classification tables using UniODA, first for a sample of n=1,568 patients with confirmed or presumed Pneumocystis carinii pneumonia (PCP) measured on city and gender, and then for n=144 ESS and ESP values arising in a comparison of qualitative strength categories achieved by models identified using raw or z-score data.
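A small rectangular table shows what disentangling an effect can look like in practice. The sketch below assumes a binary class variable and a categorical attribute, and assigns each category to the class in which it is relatively more common, which maximizes mean sensitivity in the two-class case. It is an illustrative simplification, not the UniODA procedure used in the article, and the function name and data are hypothetical.

```python
from collections import Counter


def assign_categories(categories, classes):
    """Map each category to class 0 or 1 and return (rule, ess).

    A category goes to class 1 when its share of class-1 observations exceeds
    its share of class-0 observations; ties go to class 0. ESS is assumed to
    be mean sensitivity rescaled so that 0 is chance and 100 is perfect.
    """
    class_sizes = Counter(classes)
    if class_sizes[0] == 0 or class_sizes[1] == 0:
        raise ValueError("both classes must be present")
    table = Counter(zip(categories, classes))  # the cross-tabulation
    rule = {}
    for cat in sorted(set(categories)):
        share0 = table[(cat, 0)] / class_sizes[0]
        share1 = table[(cat, 1)] / class_sizes[1]
        rule[cat] = 1 if share1 > share0 else 0
    sens0 = sum(table[(cat, 0)] for cat, k in rule.items() if k == 0) / class_sizes[0]
    sens1 = sum(table[(cat, 1)] for cat, k in rule.items() if k == 1) / class_sizes[1]
    ess = 100.0 * ((sens0 + sens1) / 2 - 0.5) / 0.5
    return rule, ess


# Hypothetical 3 x 2 design: three categories crossed with a binary class.
demo_categories = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
demo_classes = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1]
demo_rule, demo_ess = assign_categories(demo_categories, demo_classes)
print(demo_rule, round(demo_ess, 1))
```

The output names which categories point to which class, which is the information an omnibus chi-square statistic does not provide on its own.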
