MegaODA Large Sample and BIG DATA Time Trials: Maximum Velocity Analysis

Paul R. Yarnold & Robert C. Soltysik

Optimal Data Analysis, LLC

This third time trial of newly released MegaODA™ software studies the fastest-to-analyze application, the 2×2 cross-classification table. Designs involving unweighted binary data are arguably the most widely employed at present across quantitative scientific disciplines as well as engineering fields including communications, graphics, data compression, real-time processing, and autonomous synthetic decision-making, among others. The present simulation research, run on a 3 GHz Intel Pentium D microcomputer, reveals that MegaODA returns the exact one- or two-tailed Type I error rate, as well as all of the other classification-relevant statistics provided in UniODA analysis, in fractions of a CPU second for samples of a million observations.
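
As a rough illustration of the statistics involved, the sketch below computes an ESS accuracy index and an exact one-tailed p for a 2×2 table. It assumes that the exact p for a binary attribute can be obtained from the hypergeometric (Fisher-type) tail, and that ESS rescales mean per-class accuracy so that 0 is chance and 100 is perfect classification. The function names and the example table are hypothetical, and this is not MegaODA code.

```python
from math import comb

def ess_2x2(a, b, c, d):
    """ESS for the table [[a, b], [c, d]].

    Rows are actual classes, columns are predicted classes
    (an assumed layout for this illustration).
    """
    sens0 = a / (a + b)          # proportion of class 0 classified correctly
    sens1 = d / (c + d)          # proportion of class 1 classified correctly
    mean_pac = 100.0 * (sens0 + sens1) / 2.0
    return (mean_pac - 50.0) / 50.0 * 100.0

def exact_one_tailed_p(a, b, c, d):
    """Hypergeometric probability of a count >= a in the top-left cell,
    holding all row and column totals fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Hypothetical table: 3 of 4 observations correct in each class.
print(ess_2x2(3, 1, 1, 3))             # 50.0
print(exact_one_tailed_p(3, 1, 1, 3))  # 17/70, about 0.243
```

Because the tail is a sum of exact integer terms, its cost grows with the table margins rather than with any resampling, which is in keeping with the very short solution times reported for this design.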

View journal article

Determining When Annual Crude Mortality Rate Most Recently Began Increasing in North Dakota Counties, I: Backward-Stepping Little Jiffy

Paul R. Yarnold

Optimal Data Analysis, LLC

Recent research tested the hypothesis that the annual crude mortality rate (ACMR) was higher after versus before 1998 in counties of North Dakota, due to increased exposure of the population to environmental toxins and hazards beginning approximately at that time. This hypothesis was confirmed with experimentwise p<0.05 for 16 counties. This article investigates the ACMR time series for each of these counties using a backward-stepping little jiffy UniODA analysis to ascertain precisely when ACMR began to increase. As hypothesized, the ACMR began increasing in Bowman and Kidder counties precisely in 1998. Consistent with the a priori hypothesis, the initial (and presently sustained) increase in ACMR occurred in McLean County in 1997, and in Foster County in 1996. Significant sustained increases in ACMR initially began in Stark County in 1993, and in Burleigh County in 1988. The UniODA models identified hypothesized, recent, powerful, sustained, statistically significant increases in ACMR.

View journal article

Surfing the Index of Consumer Sentiment: Identifying Statistically Significant Monthly and Yearly Changes

Paul R. Yarnold

Optimal Data Analysis, LLC

Published monthly by the Survey Research Center of the University of Michigan, the Index of Consumer Sentiment (ICS) is widely followed, and one of its factors (the Index of Consumer Expectations) is used in the Leading Indicator Composite Index published by the US Department of Commerce, Bureau of Economic Analysis. Using household telephone interviews, the ICS provides an empirical measure of near-term consumer attitudes on business climate, personal finance, and spending. Variation in ICS influences price and volume in currency, bond, and equity markets in the US and in markets globally. The practice of releasing monthly ICS values five minutes to two seconds earlier for elite customers via high-speed communication channels was recently suspended because it provided unfair trading advantages. This article investigates the trajectory of the ICS over the most recent three years, evaluating the statistical significance of month-over-month and year-over-year changes. These analyses define a longitudinal series of class variables which may be modeled temporally using time-lagged single- (UniODA) and multiple- (CTA) attribute ODA methods.

View journal article

ODA Range Test vs. One-Way Analysis of Variance: Patient Race and Lab Results

Paul R. Yarnold

Optimal Data Analysis, LLC

Mean scores on a continuous dependent measure are compared across three or more groups using one-way analysis of variance (ANOVA). If a statistically significant overall or “omnibus” effect emerges, then a multiple comparisons procedure is used to ascertain the exact nature of any interclass differences. In contrast, the dependent measure may be compared between classes with UniODA to assess if thresholds on the dependent measure can discriminate the classes. If the resulting ESS accuracy statistic for the overall effect is statistically reliable then an optimal (maximum-accuracy) range test is employed to ascertain the exact nature of interclass differences. ANOVA and UniODA are used to investigate the differences between n=377 white, n=378 African American, and n=257 Hispanic patients with HIV-associated Pneumocystis carinii pneumonia (PCP) on two laboratory tests (albumin and alveolar-arterial oxygen difference) associated with PCP outcomes.
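
To make the idea of a discriminating threshold concrete, the following minimal sketch searches every midpoint between adjacent observed values for the cut that maximizes ESS. It is simplified to two classes coded 0 and 1 (the study above compares three groups), it uses the two-class identity ESS = 100 × (sensitivity for class 0 + sensitivity for class 1 − 1), and it omits weights, priors, and any assessment of statistical reliability. The function name and the data are hypothetical, and the code is not the UniODA implementation.

```python
def best_cutpoint(values, labels):
    """Exhaustive search for the ESS-maximizing cut on one attribute.

    values: list of numeric scores; labels: list of 0/1 class codes.
    Returns the best ESS, the cut, and the class predicted at or below it.
    """
    n0 = labels.count(0)
    n1 = labels.count(1)
    xs = sorted(set(values))
    best = {"ess": 0.0, "cut": None, "low_class": None}
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2.0
        low0 = sum(1 for x, y in zip(values, labels) if x <= cut and y == 0)
        low1 = sum(1 for x, y in zip(values, labels) if x <= cut and y == 1)
        # Rule "predict class 0 at or below the cut, class 1 above it".
        ess = (low0 / n0 + (n1 - low1) / n1 - 1.0) * 100.0
        low_class = 0
        if ess < 0:
            # Reversing the rule's direction flips the sign of ESS.
            ess, low_class = -ess, 1
        if ess > best["ess"]:
            best = {"ess": ess, "cut": cut, "low_class": low_class}
    return best

# Hypothetical, perfectly separable scores.
print(best_cutpoint([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
# {'ess': 100.0, 'cut': 6.5, 'low_class': 0}
```

This brute-force scan is quadratic in the sample size and is meant only to show what is being maximized; a practical implementation would sort once and update the counts incrementally.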

View journal article

MegaODA Large Sample and BIG DATA Time Trials: Harvesting the Wheat

Robert C. Soltysik & Paul R. Yarnold

Optimal Data Analysis, LLC

In research involving multiple tests of statistical hypotheses, the efficiency of Monte Carlo (MC) simulation used to estimate the Type I error rate (p) is maximized using a two-step procedure. The first step is identifying the effects that are not statistically significant (ns). The second step is verifying that the remaining effects are statistically significant at the generalized or experimentwise criterion (p<0.05), which is necessary in order to reject the null hypothesis and accept the alternative hypothesis that a statistically significant effect occurred. This research uses experimental simulation to explore the ability of MegaODA to identify p values of 0.01 and 0.001 with sample sizes of n=100,000 and n=1,000,000. Solution speeds ranged from 5 to more than 83,000 CPU seconds running MegaODA software on a 3 GHz Intel Pentium D microcomputer. Using MegaODA it is straightforward to rapidly rule in p<0.05 for weak and moderate effects by Monte Carlo simulation with large samples and BIG DATA in designs having ordinal attributes with or without weights applied to observations. Considerably more time was required for problems involving continuous attributes, but even the most computer-intensive analyses were completed in less than a day.
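
One generic way to see why ruling out ns effects first saves computation is a Monte Carlo loop that stops as soon as the estimated p can no longer fall below the criterion. The sketch below is an assumption-laden illustration of that idea, not the stopping rule used by MegaODA: `simulate_stat` is a hypothetical callable that returns the test statistic for one randomly generated (null) data set, and `max_iter` and `alpha` are illustrative defaults.

```python
import random

def mc_two_step(observed, simulate_stat, max_iter=1000, alpha=0.05, seed=0):
    """Monte Carlo p estimation with an early exit for ns effects.

    Step 1 (rule out): stop once the count of null statistics at least as
    large as `observed` guarantees an estimated p >= alpha.
    Step 2 (rule in): otherwise run all iterations and report p < alpha.
    """
    rng = random.Random(seed)
    exceed = 0
    for i in range(1, max_iter + 1):
        if simulate_stat(rng) >= observed:
            exceed += 1
            # Even if no later iteration exceeds, p_hat = exceed / max_iter.
            if exceed / max_iter >= alpha:
                return {"decision": "ns", "iterations": i}
    return {"decision": "significant",
            "p": exceed / max_iter,
            "iterations": max_iter}

# Hypothetical null statistic: a uniform random number in [0, 1).
print(mc_two_step(0.999, lambda rng: rng.random()))
print(mc_two_step(0.10, lambda rng: rng.random()))
```

In this sketch an effect that is clearly ns exits after a small fraction of the planned iterations, while an effect that survives step one pays the full cost of verification, which mirrors the asymmetry between the two steps described above.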

View journal article

ODA Range Test vs. One-Way Analysis of Variance: Comparing Strength of Alternative Line Connections

Paul R. Yarnold & Gordon C. Brofft

Optimal Data Analysis, LLC

Among the most popular conventional statistical methods, Student’s t-test is used to compare the means of two groups on a single dependent measure assessed on a continuous scale. When three or more groups are compared, the t-test is generalized to one-way analysis of variance (ANOVA). If the F statistic associated with the overall or “omnibus” effect is statistically reliable, then a range test that is more efficient than performing all possible comparisons is used to ascertain the exact nature of interclass differences. In contrast, the dependent measure may be compared between classes via UniODA, to assess if thresholds on the dependent measure separate the classes. If the resulting ESS accuracy statistic for the omnibus effect is statistically reliable, then a recently developed optimal range test is used to assess the exact nature of interclass differences. ANOVA and UniODA are used to compare three methods commonly used in competitive big-game sport fishing for connecting segments of fishing line. Similarities and differences of parametric GLM and non-parametric ODA methods are demonstrated.

View journal article

MegaODA Large Sample and BIG DATA Time Trials: Separating the Chaff

Robert C. Soltysik & Paul R. Yarnold

Optimal Data Analysis, LLC

Just-released MegaODA™ software is capable of conducting UniODA analysis for an unlimited number of attributes using samples as large as one million observations. To minimize the computational burden associated with Monte Carlo simulation used to estimate the Type I error rate (p), the first step in statistical analysis is identifying effects that are not statistically significant (ns). This article presents an experimental simulation exploring the ability of MegaODA to identify ns effects in a host of designs involving a binary class variable, under maximally challenging discrimination conditions (all data are random), for sample sizes of n=100,000 and n=1,000,000. Most analyses were solved in CPU seconds running MegaODA on a 3 GHz Intel Pentium D microcomputer. Using MegaODA it is straightforward to rapidly rule out ns effects using Monte Carlo simulation with BIG DATA for large numbers of attributes in simple or complex, single- or multiple-sample designs involving categorical or ordered attributes, either with or without weights being applied to individual observations.

View journal article