Ariel Linden & Paul R. Yarnold

Linden Consulting Group, LLC & Optimal Data Analysis, LLC

Prior research compared the ability of different classification algorithms [logistic regression (LR), random forests (RF), boosted regression (BR), support vector machines (SVM), and classification tree analysis (CTA)] to correctly find no relationship between a binary class (dependent) variable and ten randomly generated attributes (covariates): only CTA found no relationship. In this paper, using the same ten-attribute N=1,000 dataset, a Weka multi-layer perceptron (MLP) neural network model run with its default tuning parameters yielded an area under the curve (AUC) of 0.724 in training analysis and 0.507 in ten-fold cross-validation. With the exception of CTA, all machine-learning algorithms assessed thus far have identified training effects in completely random data.
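The gap between training and cross-validated AUC on pure noise can be illustrated with a minimal sketch. This is not the Weka MLP analysis reported here: it substitutes a k-nearest-neighbor classifier (chosen only because it runs on the Python standard library), and the data generator (standard-normal attributes, fair-coin class labels), the function names, and the values of k and the seed are all assumptions made for illustration. The numbers it prints will not match the AUC values above.

```python
import heapq
import math
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def knn_score(query, fit_X, fit_y, k):
    """Score = number of class-1 labels among the k nearest fitted points."""
    nearest = heapq.nsmallest(
        k, range(len(fit_X)), key=lambda i: math.dist(query, fit_X[i])
    )
    return sum(fit_y[i] for i in nearest)


def simulate(n=1000, d=10, k=5, folds=10, seed=0):
    """Return (training AUC, pooled k-fold CV AUC) on purely random data."""
    rng = random.Random(seed)
    # Assumed generator: attributes and class labels are mutually independent.
    X = [tuple(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]

    # Training analysis: each case is scored by a model that already contains it.
    train_scores = [knn_score(x, X, y, k) for x in X]

    # Cross-validation: each case is scored by a model fitted without its fold.
    cv_scores = [0] * n
    for f in range(folds):
        test_idx = range(f, n, folds)
        held_out = set(test_idx)
        fit_X = [X[i] for i in range(n) if i not in held_out]
        fit_y = [y[i] for i in range(n) if i not in held_out]
        for i in test_idx:
            cv_scores[i] = knn_score(X[i], fit_X, fit_y, k)

    return auc(train_scores, y), auc(cv_scores, y)


if __name__ == "__main__":
    train_auc, cv_auc = simulate()
    print(f"training AUC = {train_auc:.3f}, 10-fold CV AUC = {cv_auc:.3f}")
```

Because each training case counts itself among its own nearest neighbors, the training AUC in this sketch is expected to sit well above 0.5, while the cross-validated AUC should stay near 0.5, the value expected by chance. The same qualitative pattern is what the abstract describes for the MLP model.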