MegaODA Large Sample and BIG DATA Time Trials: Harvesting the Wheat

Robert C. Soltysik & Paul R. Yarnold

Optimal Data Analysis, LLC

In research involving multiple tests of statistical hypotheses, the efficiency of the Monte Carlo (MC) simulation used to estimate the Type I error rate (p) is maximized by a two-step procedure. The first step identifies the effects that are not statistically significant (ns). The second step verifies that the remaining effects are statistically significant at the generalized, or experimentwise, criterion (p<0.05), which is necessary to reject the null hypothesis and accept the alternative hypothesis that a statistically significant effect occurred. This research uses experimental simulation to explore the ability of MegaODA to identify effects at p values of 0.01 and 0.001, with sample sizes of n=100,000 and n=1,000,000. Solution speeds ranged from 5 to more than 83,000 CPU seconds running MegaODA software on a 3 GHz Intel Pentium D microcomputer. Using MegaODA it is straightforward to rapidly rule in p<0.05 for weak and moderate effects by Monte Carlo simulation with large samples and BIG DATA in designs having ordinal attributes, with or without weights applied to observations. Significantly greater time was required for problems involving continuous attributes, but even the most computer-intensive analyses were completed in less than a day.
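The two-step screening logic described in the abstract can be illustrated with a generic Monte Carlo permutation test. This is only a minimal sketch of the workflow, not the MegaODA algorithm: MegaODA is commercial software, and the function names, iteration counts, and screening threshold below are illustrative assumptions.

```python
import random

def mc_p_value(group_a, group_b, n_iter, seed=0):
    """Estimate the p value of the observed difference in group means
    by randomly permuting group labels n_iter times (generic MC test)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if perm >= observed:
            hits += 1
    return hits / n_iter

def two_step_test(group_a, group_b, alpha=0.05):
    """Step 1: a cheap screen with few MC iterations rules out clearly
    ns effects. Step 2: survivors get a longer run to confirm p < alpha.
    The 4*alpha screening cutoff is an illustrative assumption."""
    if mc_p_value(group_a, group_b, n_iter=200) > 4 * alpha:
        return "ns"  # step 1: clearly not significant, stop early
    p = mc_p_value(group_a, group_b, n_iter=2000)  # step 2: confirm
    return "significant" if p < alpha else "ns"

# Usage: a strong effect survives both steps.
strong_a = [float(i % 10) for i in range(200)]
strong_b = [float(i % 10) + 3.0 for i in range(200)]
print(two_step_test(strong_a, strong_b))  # → significant
```

The efficiency gain comes from step 1: most ns effects are discarded after only a few hundred iterations, so the expensive long MC run is reserved for candidate effects that might reach the experimentwise criterion.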
