Novometric Statistical Analysis and the Pearson-Yule Debate

Paul R. Yarnold

Optimal Data Analysis, LLC

Karl Pearson and George Yule debated the validity of the assumptions made when estimating the association between cross-classified vaccine (or antitoxin) administration and mortality, if both variables are assessed on binary measurement scales. Pearson assumed these measures reflect an inherently continuous distribution and used tetrachoric correlation to estimate their association, and Yule assumed these measures reflect an inherently discrete distribution and used the phi coefficient to estimate their association. Novometric statistical analysis (NSA) may be used in either circumstance: measurement scales may be treated as categorical or ordered, motivated by a priori substantive theoretical or empirical consideration; exploratory or confirmatory hypotheses may be tested; and observations may be analytically weighted.
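As a point of reference for the discrete (Yule) side of the debate, the phi coefficient for a 2×2 table can be computed directly from the four cell counts. The sketch below is only an illustration and is not the NSA procedure: the function name and the cell counts are hypothetical. Tetrachoric correlation is omitted because it requires numerical integration of a bivariate normal distribution.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] of cell counts."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts (not from the article):
# rows = vaccinated / not vaccinated, columns = survived / died
vaccinated_survived, vaccinated_died = 90, 10
unvaccinated_survived, unvaccinated_died = 60, 40

phi = phi_coefficient(vaccinated_survived, vaccinated_died,
                      unvaccinated_survived, unvaccinated_died)
print(f"phi = {phi:.3f}")
```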

ODA vs. π and κ: Paradoxes of Kappa

Paul R. Yarnold

Optimal Data Analysis, LLC

Widely used indexes of inter-rater or inter-method agreement, π and κ, sometimes produce unexpected results called the paradoxes of kappa. For example, prior research obtained four legacy agreement statistics (κ, Scott’s π, G-index, Fleiss’s generalized π) for a 2×2 table in which two independent raters failed to jointly classify any observations into the “negative” rating-class category: two indexes reported > 88.8% overall agreement and the other two reported < -2.3% overall agreement. ODA sheds new light on this paradox by testing confirmatory and exploratory hypotheses for these data, separately modeling the ratings made by each rater, and separately maximizing model predictive accuracy normed for chance (ESS; 0=inter-rater agreement expected by chance, 100=perfect agreement) as well as model overall accuracy that is not normed for chance (PAC; 0=no inter-rater agreement, 100=perfect agreement).
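The difference between the two accuracy indexes can be sketched in a few lines. The code below assumes the usual definitions: PAC is the percentage of observations on the diagonal of the cross-classification table, and ESS rescales the mean per-class sensitivity so that chance equals 0 and perfect classification equals 100. The function names and the cell counts are hypothetical (the counts only mimic a table with an empty joint "negative" cell), and no model optimization is performed.

```python
def pac(confusion):
    """Percentage accuracy in classification: share of observations on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def ess(confusion):
    """Effect strength for sensitivity: mean sensitivity normed against chance."""
    k = len(confusion)
    sensitivities = []
    for i, row in enumerate(confusion):
        if sum(row) == 0:
            raise ValueError("every class needs at least one observation")
        sensitivities.append(100.0 * row[i] / sum(row))
    mean_sensitivity = sum(sensitivities) / k
    chance = 100.0 / k  # mean sensitivity expected by chance with k classes
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)


# Hypothetical table: rows = rater A (positive, negative),
# columns = rater B (positive, negative); the joint negative cell is empty.
table = [[118, 5],
         [2, 0]]

print(f"PAC = {pac(table):.1f}")
print(f"ESS = {ess(table):.1f}")
```

With a table of this shape, overall agreement is high while chance-normed accuracy falls below zero, which is the pattern the abstract describes.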

Novometric Analysis vs. EO-CTA: Disentangling Sets of Sign-Test-Based Multiple-Comparison Findings

Paul R. Yarnold

Optimal Data Analysis, LLC

Prior empirical comparison of the timeline follow-back (TLFB, dummy-coded as 1) vs. Drinker Profile (DP, coded as 2) methods of quantifying alcohol consumption in treatment research reported pairwise sign tests comparing these methods separately on four categorical ordinal outcomes: abstinent=1; light=2; moderate=3; heavy=4. It was concluded: “The direction of differences for the abstinent and medium categories approached significance (with unprotected alpha criterion at .05) with the DP more often yielding higher estimates of abstinent days and lower estimates of medium days. The DP significantly more often yielded lower estimates of light days”. This example is used to illustrate and compare the use of CTA vs. novometric analysis to disentangle sets of pairwise comparison outcomes.

CTA vs. Non-Disentangled Omnibus Chi-Square: Comparing Samples (Not) Selected for Study Participation

Paul R. Yarnold

Optimal Data Analysis, LLC

An empirical comparison of timeline follow-back vs. averaging methods for quantifying alcohol consumption in treatment research reported non-disentangled, non-interpreted chi-square-based comparisons of factors differentiating subjects selected vs. not selected for participation in the study. CTA easily identifies underlying inter-study differences.

Predicting Daily Television Viewing of Senior Citizens Using Education, Age and Marital Status

Paul R. Yarnold

Optimal Data Analysis, LLC

Daily television viewing (hours, 6-minute increments), marital status (0=not married; 1=married), age and education (years, integers) data were obtained for a randomly-selected sample of 25 senior citizens. Training analysis predicting viewing (dependent variable) as a linear function of the other (independent) measures by multiple regression analysis identified a statistically significant omnibus effect: F(3,21)=11.7, p<0.0001, R-squared=0.626. Partial F (variable entered last) statistics indicated that marital status [F(1,21)=13.9, p<0.0008] and education [F(1,21)=9.2, p<0.0062] had statistically reliable negative relationships with viewing hours. This model accounted for 5/8 of the variance in television viewing time; however, it was unable to make statistically reliable point predictions of viewing times: ESS=7.1, D=194.9, ns. The globally-optimal (GO) novometric model predicting viewing times was: if education<=13 years, then predict viewing 0.7 hours. Training performance (stable in jackknife analysis) was very strong: ESS=90.1, D=0.20, p<0.0001. The model correctly classified 3 of 3 observations having 0.7 or fewer viewing hours, and 20 of 22 (90.9%) with 0.8 or more daily viewing hours.

Novometric Models of Smoking Habits of Male and Female Friends of American College Undergraduates: Gender, Smoking, and Ethnicity

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric statistical analyses were used to model smoking habits of one’s male friends, and of one’s female friends, for samples of 3,289 Anglo-American, 944 Mexican-American, and 733 Indian-American college undergraduates. For both analyses the categorical attributes were ethnicity (a multicategorical attribute, dummy-coded using 1-3, respectively), subject gender (0=female, 1=male), and smoking behavior (0=non-smoker, 1=smoker). The novometric findings are compared with results originally reported for this application obtained using disintegrated chi-square analysis.

Would One’s Best Boy- or Girl-Friend be More Upset if One Began Smoking: An Exploratory GenODA Model for Anglo-, Mexican-, and Indian-American College Undergraduates

Paul R. Yarnold

Optimal Data Analysis, LLC

Samples of 1,171 male and 1,503 female Anglo-American, 291 male and 503 female Mexican-American, and 138 male and 361 female Indian-American, non-smoking college undergraduates were asked if their best boy-friend or their best girl-friend would be most upset if the subject began smoking. Original analysis using separate chi-square analyses (one design cell for the Indian-American students violated the minimum expectation assumption) concluded: “While the influence of boy-friends or girl-friends on their smoking or non-smoking partners seemed to be rather small, the opposite-sex friend was invariably perceived to be more upset by the possibility of the respondent’s taking up the habit: all these differences were significant beyond the .01 level”. An exploratory GenODA analysis was conducted treating ethnicity as the Gen variable: an ODA model is identified that, when simultaneously and independently applied to each of the Gen groups (dummy-coded as 1-3), explicitly maximizes the lowest ESS obtained across all of the Gen groups. Here the subject’s gender is the class variable, and the gender of one’s most-affected friend is a categorical attribute (gender variables were dummy-coded: female=0, male=1). The omnibus GenODA model was: if Friend=female, predict subject gender=male; otherwise predict subject gender=female: p<0.0001, strong ESS=77.7 (84.9% of actual female and 92.8% of actual male subjects were correctly classified). The GenODA model performed comparably for the Anglo-, Mexican-, and Indian-American samples: all p’s< 0.0001; strong ESS=77.3, 81.3, and 75.1, respectively.
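The maximin criterion described above (choose the single rule whose lowest ESS across the Gen groups is as high as possible) can be sketched for the simplest case of a binary attribute and a binary class variable. This is a rough illustration under stated assumptions, not the GenODA software: the function names, the two hypothetical Gen groups, and their cell counts are invented for the example, and no permutation p-value is computed.

```python
def ess_two_class(pairs, rule):
    """ESS of a rule (attribute value -> predicted class) for (attribute, class) pairs."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for attribute, actual in pairs:
        totals[actual] += 1
        hits[actual] += int(rule[attribute] == actual)
    # With two classes, ESS = sensitivity for class 0 + sensitivity for class 1 - 100
    return 100.0 * (hits[0] / totals[0] + hits[1] / totals[1] - 1.0)


def expand(counts):
    """Turn {(attribute, class): n} cell counts into a list of observations."""
    return [cell for cell, n in counts.items() for _ in range(n)]


def gen_oda(groups):
    """Return (rule, lowest ESS) for the rule maximizing the lowest ESS across groups."""
    candidate_rules = [{0: 1, 1: 0}, {0: 0, 1: 1}]
    scored = [(min(ess_two_class(obs, rule) for obs in groups.values()), rule)
              for rule in candidate_rules]
    lowest_ess, rule = max(scored, key=lambda item: item[0])
    return rule, lowest_ess


# Hypothetical data: attribute = gender of most-upset friend, class = subject gender
# (female=0, male=1), for two hypothetical Gen groups.
groups = {
    "group_1": expand({(0, 1): 9, (0, 0): 1, (1, 0): 8, (1, 1): 2}),
    "group_2": expand({(0, 1): 7, (0, 0): 3, (1, 0): 9, (1, 1): 1}),
}

best_rule, worst_case_ess = gen_oda(groups)
print(best_rule, round(worst_case_ess, 1))
```

For these invented counts the selected rule is "friend=female predicts subject=male, otherwise predict female", and the reported value is the ESS of the group in which that rule performs worst.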

Using Gender of an Imaginary Rated Smoker, and Subject’s Gender, Ethnicity, and Smoking Behavior to Identify Perceived Differences in Peer-Group Smoking Standards of American High School Students

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric analysis is used to discriminate perceived peer-group standards for girls seeing boys smoke, vs. for boys seeing girls smoke (class variable), of 3,220 Anglo-American, 936 Mexican-American, and 723 Indian-American (multicategorical attribute) high-school students. Subjects rated their opinion about boys seeing girls smoke, and about girls seeing boys smoke, using a three-point categorical ordinal scale (ordered attribute): approve, do not care, disapprove (coded using 3-1, respectively). Additional categorical attributes were nominal measures of gender and whether or not the subject smoked. The globally optimal model in this application selected only approval rating as an attribute (relatively weak ESS=21.4; D=7.3; p<0.001): 76.3% of ratings of girl smokers indicated disapproval, compared with 54.9% of boy smokers.

Parental Smoking Behavior, Ethnicity, Gender, and the Cigarette Smoking Behavior of High School Students

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric analysis is used to predict cigarette smoking (class variable) of 3,577 Anglo-American, 1,001 Mexican-American, and 797 Indian-American (multicategorical attribute) high-school students. Additional categorical attributes used were gender, and dummy-variable indicators of whether the student’s mother, and whether the student’s father, did or did not smoke. The globally optimal model in this application used only gender as an attribute: moderate ESS=27.1; D=5.4; p<0.001. The next more-complex model in the descendant family (moderate ESS=29.7; D=11.8; p’s<0.001) identified five student strata: female Indian-American students had the highest smoking rate (45.6% of 390 students), and male students for whom neither parent smoked had the lowest smoking rate (14.9% of 564 students).

Assessing Hold-Out Validity of Models of Smoking Behavior Developed for Male Anglo-American College Undergraduates Applied to Classify Comparable Mexican-American and Indian-American Samples

Paul R. Yarnold

Optimal Data Analysis, LLC

Hold-out validity analysis is used to evaluate the cross-generalizability of relationships previously identified by novometric models predicting year in school (ordered class variable) as a function of cigarette smoking behavior (attribute, measured using an ordered and a multicategorical measurement scale) for a sample of 3,809 male Anglo-American college undergraduates. As a test of the confirmatory hypothesis that the Anglo-American models will cross-generalize, those models are used to classify comparable samples of 1,112 Mexican-American and 985 Indian-American college undergraduate males.

Novometrics vs. ODA vs. One-Way ANOVA: Evaluating Comparative Effectiveness of Sales Training Programs, and the Importance of Conducting LOO with Small Samples

Paul R. Yarnold

Optimal Data Analysis, LLC

Immediately after graduating from one of four alternative sales training programs, graduates were randomly assigned to sales areas putatively having comparable sales opportunities: number of sales made by each of N=23 graduates at the end of their first week was recorded. Analysis by one-way ANOVA yielded F(3,19)=3.13, p<0.0281. It was concluded: “…evidence is sufficient to indicate a difference in mean achievement for the four training programs” (p. 383). If the omnibus effect (comparing all of the groups simultaneously) has p<0.05, then all-possible pairwise comparisons (or a more efficient range test procedure) are used to disentangle the omnibus effect and identify the statistically significant inter-group differences. This was not reported, but the combination of a test of a non-directional hypothesis (the anticipated relative ordering of mean sales by group was not specified a priori), in conjunction with the small sample and associated weak statistical power, limits the detectable effects to those reflecting extremely strong inter-group differences. Non-directional ODA treating group as the class variable and sales as the ordered attribute was unable to identify a statistically reliable model for discriminating all four sales groups (ESS=42.46, D=5.42, p<0.32). A single novometric model emerged: if sales<87.5 then predict group<4; otherwise predict group=4. Model performance in total sample analysis was relatively strong and statistically reliable: ESS=69.74, D=0.87, p<0.042 (sensitivity for group 4=75.00%, for groups 1-3=94.74%). Jackknife analysis suggested the effect may not cross-generalize if the model is used to classify different samples of graduates: ESS=43.42, D=2.61, p<0.015 (sensitivity for group 4=75.00%, for groups 1-3=68.42%).
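Why leave-one-out (LOO, jackknife) analysis matters with small samples can be sketched with a single-cutpoint model on an ordered attribute. The code below is a simplified illustration, assuming a two-class problem and an exhaustive search over midpoints between adjacent attribute values; the function names and the eight data points are hypothetical and are not the sales data from the study, and no permutation p-value is computed.

```python
def ess(actual, predicted):
    """Two-class ESS: sum of the two class sensitivities (as percentages) minus 100."""
    sensitivities = []
    for c in (0, 1):
        members = [i for i, a in enumerate(actual) if a == c]
        sensitivities.append(sum(predicted[i] == c for i in members) / len(members))
    return 100.0 * (sum(sensitivities) - 1.0)


def predict(value, cut, high_class):
    """Rule: values above the cutpoint get high_class, the rest get the other class."""
    return high_class if value > cut else 1 - high_class


def fit_cutpoint(x, y):
    """Return (cut, high_class) with the highest training ESS."""
    values = sorted(set(x))
    best = None
    for cut in [(a + b) / 2 for a, b in zip(values, values[1:])]:
        for high_class in (0, 1):
            score = ess(y, [predict(v, cut, high_class) for v in x])
            if best is None or score > best[0]:
                best = (score, cut, high_class)
    return best[1], best[2]


def loo_ess(x, y):
    """Refit the model with each observation held out, then classify that observation."""
    predictions = []
    for i in range(len(x)):
        cut, high_class = fit_cutpoint(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(x[i], cut, high_class))
    return ess(y, predictions)


# Hypothetical data: weekly sales and an indicator for membership in one program
sales = [60, 65, 70, 75, 80, 85, 90, 95]
in_program = [0, 0, 0, 0, 1, 0, 1, 1]

cut, high_class = fit_cutpoint(sales, in_program)
training_ess = ess(in_program, [predict(v, cut, high_class) for v in sales])
print(f"cut={cut}, training ESS={training_ess:.1f}, LOO ESS={loo_ess(sales, in_program):.1f}")
```

In this invented example the training ESS is noticeably higher than the LOO ESS, the same kind of shrinkage the abstract reports between total-sample and jackknife performance.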

CTA vs. Chi-Square: Comparing Voter Sentiment in Political Wards

Paul R. Yarnold

Optimal Data Analysis, LLC

Random samples of 200 registered voters from each of four political wards were asked if they favored a particular candidate. Seven chi-square analyses (one omnibus comparison between all four wards, six follow-up pair-wise comparisons to specify the underlying effect) were used to compare the proportion of voters favoring the candidate between wards. Evaluating results at either the generalized or the experimentwise criterion for statistical significance, chi-square found the omnibus effect, and two pairwise comparisons: Ward 1 > Ward 2, and Ward 1 > Ward 4. In contrast, a single CTA analysis was conducted predicting voter sentiment (treated as the class variable, and coded as 1 if the voter favors the candidate, or 0 otherwise) with ward (dummy-coded as 1-4) treated as a multicategorical attribute. A single model emerged: if Ward=1, then predict the voter favors the candidate; otherwise predict the voter does not favor the candidate (p<0.036). The training (total sample) effect was relatively weak (ESS=10.2), and the predictive accuracy declined to levels worse than expected by chance (ESS= -14.8) in jackknife analysis. CTA thus revealed that the most accurate model possible for this application is weak, and that there is evidence that the model may not cross-generalize if it is used to classify independent random samples.
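The bookkeeping behind the seven-test legacy analysis can be sketched as follows. The code enumerates the six pairwise ward comparisons, computes a per-test criterion for seven tests, and runs a Pearson chi-square test on each 2×2 subtable. Several parts are assumptions: the Sidak adjustment is used here as one common way to set an experimentwise criterion (the abstract does not name the procedure), the counts of voters favoring the candidate are hypothetical, and the function names are invented for the example.

```python
import math
from itertools import combinations

N_PER_WARD = 200
# Hypothetical numbers of sampled voters favoring the candidate in each ward
favor = {1: 90, 2: 65, 3: 75, 4: 62}


def sidak_alpha(n_tests, alpha=0.05):
    """Per-test criterion that holds the experimentwise Type I error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi_sq):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


pairs = list(combinations(sorted(favor), 2))  # six pairwise ward comparisons
criterion = sidak_alpha(1 + len(pairs))       # one omnibus test plus six pairwise tests

for w1, w2 in pairs:
    stat = chi_square_2x2(favor[w1], N_PER_WARD - favor[w1],
                          favor[w2], N_PER_WARD - favor[w2])
    p = p_value_df1(stat)
    flag = "meets" if p < criterion else "does not meet"
    print(f"Ward {w1} vs Ward {w2}: chi-square={stat:.2f}, p={p:.4f} ({flag} the criterion)")
```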

ODA vs. Undocumented Chi-Square: Clarity vs. Confusion

Paul R. Yarnold

Optimal Data Analysis, LLC

A longitudinal smoking cessation study followed three patient groups: group 1=40 patients attending at least one group session; group 2=62 interviewed patients who did not attend group sessions; group 3=group 1+group 2. Data were collected four times: at the interview, and at two weeks, one month, and two months post-discharge. Each time patients rated their smoking behavior using a six-point scale: 1=quit smoking; 2=reduced smoking >50%; 3=reduced smoking <50%; 4=switched to pipe; 5=no change; 6=smoking increased. The scale is linear (assessing monotonically decreasing smoking) except for response option 4 (with N<3 at all four testings). Option 4 made the response scale nonlinear so the author used chi-square analysis to compare groups within and across testings (the latter test is a violation of the chi-square assumption that all observations appear once in the design matrix). Nevertheless, N<3 for Option 4 causes violation of the minimum expectation assumption. Furthermore, were an omnibus effect to be identified then the pairwise comparisons needed to identify the differences that produced the effect would also violate the minimum expectation assumption. The author offered two general statements regarding undocumented chi-square-based findings that violated two crucial underlying assumptions, and offered qualitative discussion. This is a surprisingly common practice in studies involving multicategorical variables that are analyzed using chi-square analysis: ODA clarifies the findings in such applications.
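The minimum expectation problem can be checked mechanically: the expected count for each cell is its row total times its column total divided by N, and sparse response options drive those values down. The sketch below assumes the conventional rule of thumb that every expected count should be at least 5 (the abstract does not state the threshold it applies); the function names and the response counts are hypothetical and only respect the group sizes of 40 and 62 and N<3 for option 4.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(table, minimum=5.0):
    """Return (row, column) indexes of cells whose expected count is below minimum."""
    expected = expected_counts(table)
    return [(i, j)
            for i, row in enumerate(expected)
            for j, value in enumerate(row)
            if value < minimum]


# Hypothetical counts: rows = group 1 (n=40) and group 2 (n=62),
# columns = response options 1-6 at one testing; option 4 has N=2.
ratings = [[10, 8, 6, 1, 12, 3],
           [8, 10, 9, 1, 28, 6]]

for i, j in cells_below(ratings):
    print(f"group {i + 1}, option {j + 1}: expected = {expected_counts(ratings)[i][j]:.2f}")
```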

CTA vs. Disintegrated Chi-Square: Integrated vs. Piecemeal Analysis

Paul R. Yarnold

Optimal Data Analysis, LLC

Experimental designs for which log-linear analysis is the recommended legacy (maximum-likelihood) methodology are usually inappropriately analyzed via a series of chi-square analyses conducted on assorted subtables. Disintegrated chi-square analysis is compared with CTA for an application relating physician support and desired smoking status to actual smoking behavior.

CTA vs. Not Chi-Square: Fear and Specific Recommendations Do Synergistically Affect Behavior

Paul R. Yarnold

Optimal Data Analysis, LLC

Classic research tested the a priori hypothesis that fear and specificity of recommendation synergistically influence a person’s decision to have a tetanus inoculation. Data were inappropriate for analysis by one-way chi-square, and results obtained using chi-square missed the hypothesized interaction: “…specific plans for action influence behavior while level of fear does not” (p. 27). In contrast, for this design CTA identified a relatively strong, statistically reliable, and cross-generalizable two-strata model that supported the hypothesized interaction. The underlying design was then restructured as a 2 (Fear) x 2 (Recommendation) between-subjects factorial. Data for this design were also inappropriate for analysis by chi-square, but CTA found a relatively strong, statistically reliable, reproducible three-strata model that supported the hypothesis.
