Using Gender of an Imaginary Rated Smoker, and Subject’s Gender, Ethnicity, and Smoking Behavior to Identify Perceived Differences in Peer-Group Smoking Standards of American High School Students

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric analysis is used to discriminate perceived peer-group standards for girls seeing boys smoke vs. for boys seeing girls smoke (class variable) of 3,220 Anglo-American, 936 Mexican-American, and 723 Indian-American (multicategorical attribute) high-school students. Subjects rated their opinion about boys seeing girls smoke, and about girls seeing boys smoke, using a three-point categorical ordinal scale (ordered attribute): approve, do not care, disapprove (coded 3, 2, and 1, respectively). Additional categorical attributes were nominal measures of gender and of whether or not the subject smoked. The globally optimal model in this application selected only approval rating as an attribute (relatively weak ESS=21.4; D=7.3; p<0.001): 76.3% of ratings of girl smokers indicated disapproval, compared with 54.9% of ratings of boy smokers.
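The reported ESS can be recovered from the two percentages, assuming (the abstract does not spell this out) that the model predicts a girl-smoker rating when the response is "disapprove" and a boy-smoker rating otherwise. Under that assumption, sensitivity is 76.3% for girl-smoker ratings and 100 - 54.9 = 45.1% for boy-smoker ratings. The sketch uses the usual two-category ESS formula, in which a mean sensitivity of 50% is chance and 100% is perfect classification.

```python
def ess_binary(sens_a: float, sens_b: float) -> float:
    """Effect strength for sensitivity (ESS), in percent, for a two-category
    class variable: 0 = chance-level accuracy, 100 = perfect classification."""
    mean_sensitivity = (sens_a + sens_b) / 2.0
    chance = 50.0  # expected mean sensitivity for two categories under chance
    return 100.0 * (mean_sensitivity - chance) / (100.0 - chance)

# ASSUMED rule: "disapprove" -> predict girl-smoker rating; otherwise boy-smoker.
sens_girl = 76.3          # girl-smoker ratings that were "disapprove" (correct)
sens_boy = 100.0 - 54.9   # boy-smoker ratings that were NOT "disapprove" (correct)
print(round(ess_binary(sens_girl, sens_boy), 1))  # 21.4
```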

Parental Smoking Behavior, Ethnicity, Gender, and the Cigarette Smoking Behavior of High School Students

Paul R. Yarnold

Optimal Data Analysis, LLC

Novometric analysis is used to predict cigarette smoking (class variable) of 3,577 Anglo-American, 1,001 Mexican-American, and 797 Indian-American (multicategorical attribute) high-school students. Additional categorical attributes were gender and two dummy-coded indicators of whether the student’s mother, and whether the student’s father, smoked. The globally optimal model in this application used only gender as an attribute: moderate ESS=27.1; D=5.4; p<0.001. The next more-complex model in the descendant family (moderate ESS=29.7; D=11.8; p’s<0.001) identified five student strata: female Indian-American students had the highest smoking rate (45.6% of 390 students), and male students for whom neither parent smoked had the lowest (14.9% of 564 students).
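D is reported but not defined in these abstracts. A minimal sketch, assuming the definition used in the novometric literature, D = 100 / (ESS / S) - S, where S is the number of strata (sample partitions) in the model. Under that assumption the D values reported for the two-strata gender model and the five-strata model are recovered.

```python
def d_statistic(ess: float, strata: int) -> float:
    """ASSUMED novometric D ("distance from perfect classification"):
    D = 100 / (ESS / strata) - strata, with ESS in percent. D = 0 when ESS = 100."""
    if ess <= 0:
        raise ValueError("D is undefined for ESS <= 0")
    return 100.0 / (ess / strata) - strata

print(round(d_statistic(27.1, 2), 1))  # gender-only model (2 strata): 5.4
print(round(d_statistic(29.7, 5), 1))  # five-strata model: 11.8
```

Because D grows with the number of strata for a fixed ESS, it penalizes complexity, which is why the more complex five-strata model has a larger D despite its slightly higher ESS.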

Assessing Hold-Out Validity of Models of Smoking Behavior Developed for Male Anglo-American College Undergraduates Applied to Classify Comparable Mexican-American and Indian-American Samples

Paul R. Yarnold

Optimal Data Analysis, LLC

Hold-out validity analysis is used to evaluate the cross-generalizability of relationships previously identified by novometric models predicting year in school (ordered class variable) as a function of cigarette smoking behavior (attribute, measured using an ordered and a multicategorical measurement scale) for a sample of 3,809 male Anglo-American college undergraduates. As a test of the confirmatory hypothesis that the Anglo-American models will cross-generalize, those models are used to classify comparable samples of 1,112 Mexican-American and 985 Indian-American college undergraduate males.
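Hold-out validity here means applying the Anglo-American models, unchanged, to the Mexican-American and Indian-American samples and scoring the resulting classifications. A minimal sketch of that scoring step, assuming class categories (year in school) and attribute codes are integers, with classes coded 0 to C-1, and using the multicategory ESS in which chance mean sensitivity is 100/C. The fitted model is passed in as a hypothetical `model` callable, since the abstract does not give its decision rules.

```python
from typing import Callable, List, Sequence

def confusion_matrix(actual: Sequence[int], predicted: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """matrix[i][j] counts observations of actual class i predicted as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def ess_multiclass(matrix: Sequence[Sequence[int]]) -> float:
    """ESS (percent) for a C-category class variable: 0 = chance, 100 = perfect."""
    c = len(matrix)
    sensitivities = []
    for i, row in enumerate(matrix):
        total = sum(row)
        if total == 0:
            raise ValueError(f"class {i} has no observations in this sample")
        sensitivities.append(100.0 * row[i] / total)
    chance = 100.0 / c
    mean_sens = sum(sensitivities) / c
    return 100.0 * (mean_sens - chance) / (100.0 - chance)

def holdout_ess(model: Callable[[int], int], attribute: Sequence[int],
                actual: Sequence[int], n_classes: int) -> float:
    """Apply a model fitted on another sample, WITHOUT refitting, to a hold-out
    sample and return its ESS in that sample."""
    predicted = [model(x) for x in attribute]
    return ess_multiclass(confusion_matrix(actual, predicted, n_classes))
```

The key design choice is that `model` is never refitted on the hold-out data, so any drop in ESS relative to the training sample reflects a failure of the original relationship to cross-generalize.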

Novometrics vs. ODA vs. One-Way ANOVA: Evaluating Comparative Effectiveness of Sales Training Programs, and the Importance of Conducting LOO with Small Samples

Paul R. Yarnold

Optimal Data Analysis, LLC

Immediately after graduating from one of four alternative sales training programs, graduates were randomly assigned to sales areas putatively having comparable sales opportunities, and the number of sales made by each of N=27 graduates at the end of their first week was recorded. Analysis by one-way ANOVA yielded F(3,19)=3.13, p<0.0281. It was concluded: “…evidence is sufficient to indicate a difference in mean achievement for the four training programs” (p. 383). If the omnibus effect (comparing all of the groups simultaneously) has p<0.05, then all possible pairwise comparisons (or a more efficient range test procedure) are used to disentangle the omnibus effect and identify the statistically significant inter-group differences. This was not reported, but the combination of a test of a non-directional hypothesis (the anticipated relative ordering of mean sales by group was not specified a priori) with the small sample and its associated weak statistical power limits the detectable effects to those reflecting extremely strong inter-group differences. Non-directional ODA, treating group as the class variable and sales as the ordered attribute, was unable to identify a statistically reliable model for discriminating all four sales groups (ESS=42.46, D=5.42, p<0.32). A single novometric model emerged: if sales<87.5 then predict group<4; otherwise predict group=4. Model performance in total sample analysis was relatively strong and statistically reliable: ESS=69.74, D=0.87, p<0.042 (sensitivity for group 4=75.00%, for groups 1-3=94.74%). Jackknife analysis suggested the effect may not cross-generalize if the model is used to classify different samples of graduates: ESS=43.42, D=2.61, p<0.015 (sensitivity for group 4=75.00%, for groups 1-3=68.42%).
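The gap between total-sample ESS (69.74) and jackknife ESS (43.42) is the reason for conducting LOO. The following is a simplified sketch, not the ODA or novometric software: class 1 stands for group 4 and class 0 for groups 1-3, the attribute is sales, and the rule is directional (sales above the cut predicts class 1), mirroring the model reported above. The cut is chosen to maximize training ESS; no permutation p-values are computed, and each class needs at least two observations so that every LOO training set contains both classes.

```python
from typing import List, Sequence, Tuple

def ess_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ESS (percent) for a 0/1 class variable, from actual and predicted labels."""
    sensitivities = []
    for cls in (0, 1):
        members = [i for i, y in enumerate(y_true) if y == cls]
        if not members:
            raise ValueError(f"class {cls} is absent")
        correct = sum(1 for i in members if y_pred[i] == cls)
        sensitivities.append(100.0 * correct / len(members))
    return (sum(sensitivities) / 2.0 - 50.0) / 50.0 * 100.0

def fit_cutpoint(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Directional one-cutpoint rule: predict class 1 if x > cut, else class 0.
    Candidate cuts are midpoints between adjacent distinct x values; the first
    cut reaching the maximum training ESS is kept."""
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    best_cut, best_ess = values[0], float("-inf")
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        ess = ess_from_labels(y, [1 if xi > cut else 0 for xi in x])
        if ess > best_ess:
            best_cut, best_ess = cut, ess
    return best_cut, best_ess

def loo_ess(x: Sequence[float], y: Sequence[int]) -> float:
    """Leave-one-out (jackknife) ESS: refit the cut without observation i,
    then classify observation i with that refitted rule."""
    predictions: List[int] = []
    for i in range(len(x)):
        x_train = list(x[:i]) + list(x[i + 1:])
        y_train = list(y[:i]) + list(y[i + 1:])
        cut, _ = fit_cutpoint(x_train, y_train)
        predictions.append(1 if x[i] > cut else 0)
    return ess_from_labels(y, predictions)
```

Because the cut is refitted without the held-out observation, LOO ESS can fall well below training ESS in a small sample, which is the pattern reported for this study.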

CTA vs. Chi-Square: Comparing Voter Sentiment in Political Wards

Paul R. Yarnold

Optimal Data Analysis, LLC

Random samples of 200 registered voters from each of four political wards were asked whether they favored a particular candidate. Seven chi-square analyses (one omnibus comparison among all four wards, plus six follow-up pairwise comparisons to specify the underlying effect) were used to compare the proportion of voters favoring the candidate between wards. Evaluated at either the generalized or the experimentwise criterion for statistical significance, chi-square identified the omnibus effect and two pairwise differences: Ward 1 > Ward 2, and Ward 1 > Ward 4. In contrast, a single CTA analysis was conducted predicting voter sentiment (treated as the class variable, and coded as 1 if the voter favored the candidate, or 0 otherwise), with ward (coded 1-4) treated as a multicategorical attribute. A single model emerged: if Ward=1, then predict the voter favors the candidate; otherwise predict the voter does not favor the candidate (p<0.036). The training (total sample) effect was relatively weak (ESS=10.2), and predictive accuracy declined to a level worse than expected by chance (ESS=-14.8) in jackknife analysis. CTA thus revealed that the most accurate model possible for this application is weak, and that the model may not cross-generalize if it is used to classify independent random samples.
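Running seven chi-square tests raises the question of which per-test threshold applies. A sketch assuming the generalized criterion is p<0.05 per test, and that the experimentwise criterion holds the familywise error rate at 0.05 through a Bonferroni or Šidák adjustment; the abstract does not say which adjustment was used.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test criterion keeping the familywise error rate at or below alpha."""
    return alpha / n_tests

def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test criterion giving a familywise error rate of exactly alpha
    when the tests are independent."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)

N_TESTS = 7  # one omnibus chi-square plus six pairwise follow-ups
print(f"Bonferroni: p < {bonferroni_alpha(0.05, N_TESTS):.5f}")  # p < 0.00714
print(f"Sidak:      p < {sidak_alpha(0.05, N_TESTS):.5f}")
```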

ODA vs. Undocumented Chi-Square: Clarity vs. Confusion

Paul R. Yarnold

Optimal Data Analysis, LLC

A longitudinal smoking cessation study followed three patient groups: group 1=40 patients attending at least one group session; group 2=62 interviewed patients who did not attend group sessions; group 3=group 1+group 2. Data were collected four times: at the interview, and at two weeks, one month, and two months post-discharge. At each testing, patients rated their smoking behavior using a six-point scale: 1=quit smoking; 2=reduced smoking >50%; 3=reduced smoking <50%; 4=switched to pipe; 5=no change; 6=smoking increased. The scale is linear, assessing monotonically decreasing smoking, except for response option 4 (with N<3 at all four testings). Because option 4 made the response scale nonlinear, the author used chi-square analysis to compare groups within and across testings (the latter test violates the chi-square assumption that each observation appears only once in the design matrix). In addition, N<3 for option 4 violates the minimum expectation assumption. Furthermore, were an omnibus effect identified, the pairwise comparisons needed to identify the differences producing the effect would also violate the minimum expectation assumption. The author offered two general statements regarding undocumented chi-square-based findings that violated two crucial underlying assumptions, accompanied by qualitative discussion. This is a surprisingly common practice in studies involving multicategorical variables analyzed using chi-square: ODA clarifies the findings in such applications.
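The minimum-expectation problem follows from how expected frequencies are computed: each expected count is (row total × column total) / N, and because a row total never exceeds N, every expected count in a column is at most that column's total. A response option chosen by fewer than three patients therefore cannot satisfy a rule of thumb such as "expected count of at least 5 per cell" (the threshold is an assumption; conventions vary). A minimal check, with rows as patient groups and columns as response options:

```python
from typing import List, Sequence, Tuple

def expected_counts(table: Sequence[Sequence[int]]) -> List[List[float]]:
    """Expected cell frequencies under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def sparse_cells(table: Sequence[Sequence[int]],
                 minimum: float = 5.0) -> List[Tuple[int, int, float]]:
    """(row, column, expected) for every cell whose expected count is below
    `minimum` (5.0 is an ASSUMED rule-of-thumb threshold)."""
    expected = expected_counts(table)
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]
```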
