Stats
When the F test is significant, but Tukey is not (September 9, 2005)
Someone asked me how to interpret a one-factor analysis of variance where
the overall F test was significant, but the Tukey follow-up test comparing all
pairs of the four group means found no pair that differed significantly. The
short answer is to report that the F test was significant and that Tukey was
not. Point out that the Tukey follow-up test is conservative because it
controls the overall (familywise) alpha level across all six pairwise
comparisons. Most people understand what a conservative test is and they will
accept that interpretation.
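To see what controlling the overall alpha level buys you, here is a small Python simulation. It is a sketch under simplifying assumptions: four groups whose population means are all equal, equal group sizes, and a known standard deviation. Under those assumptions the standardized group means are standard normal, and the tabled studentized range cutoff for infinite degrees of freedom (3.633) stands in for Tukey's usual cutoff based on the error mean square. The function name `familywise_error` is my own, not from any package. It counts how often at least one of the six pairwise comparisons comes out significant when no difference exists.

```python
import itertools
import math
import random
from statistics import NormalDist

K = 4                                 # number of groups, as in the question
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 0.05 cutoff for one unadjusted comparison
Q_CRIT = 3.633                        # tabled studentized range cutoff: alpha=0.05, k=4, df=infinity


def familywise_error(n_sims=20000, seed=1):
    """Estimate P(at least one pair is declared different) when all means are equal.

    Assumes sigma is known, so each standardized group mean is N(0, 1) and
    the difference between two means has standard error sqrt(2).
    """
    rng = random.Random(seed)
    unadjusted_hits = 0
    tukey_hits = 0
    for _ in range(n_sims):
        means = [rng.gauss(0.0, 1.0) for _ in range(K)]
        # Unadjusted: does any of the 6 pairwise z tests exceed 1.96?
        if any(abs(a - b) / math.sqrt(2) > Z_CRIT
               for a, b in itertools.combinations(means, 2)):
            unadjusted_hits += 1
        # Tukey: does the range of the means exceed the studentized range cutoff?
        if max(means) - min(means) > Q_CRIT:
            tukey_hits += 1
    return unadjusted_hits / n_sims, tukey_hits / n_sims


if __name__ == "__main__":
    unadjusted, tukey = familywise_error()
    print(f"unadjusted familywise error rate: {unadjusted:.3f}")
    print(f"Tukey familywise error rate:      {tukey:.3f}")
```

The unadjusted comparisons should produce a familywise error rate near 0.2, roughly four times the nominal 0.05, while the Tukey rate stays near 0.05. That protection for the whole family is exactly what makes Tukey conservative for any single pair.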
I suspect that one or more of the pairwise comparisons was borderline, so
you might talk about that also. Or you could look at the unadjusted
comparisons; at least one of those is very likely to be significant. These
findings, of course, need to be interpreted with caution, because they were
not statistically significant using a more conservative criterion.
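The three criteria (overall test, Tukey, unadjusted) can be laid side by side for one set of numbers. The sketch below assumes equal group sizes and a known standard error for each mean, so a chi-square statistic with 3 degrees of freedom plays the role of the F test and the infinite-df studentized range cutoff plays the role of Tukey's. The means and standard error are made up purely for illustration, and `overall_statistic` and `compare_pairs` are hypothetical helper names.

```python
import itertools
import math
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # unadjusted two-sided 0.05 cutoff
Q_CRIT = 3.633                        # tabled Tukey cutoff: alpha=0.05, k=4, df=infinity
CHI2_CRIT = 7.815                     # chi-square(3 df) 0.95 quantile: overall test with known sigma


def overall_statistic(means, se_mean):
    """Known-variance analogue of the overall F test: sum of squared standardized deviations."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means) / se_mean ** 2


def compare_pairs(means, se_mean):
    """Flag each pair of group means under unadjusted and Tukey criteria.

    Assumes equal group sizes and a known standard error se_mean for every
    group mean, so a difference of two means has standard error se_mean * sqrt(2).
    """
    rows = []
    for (i, a), (j, b) in itertools.combinations(enumerate(means, start=1), 2):
        diff = abs(a - b)
        z = diff / (se_mean * math.sqrt(2))
        rows.append({
            "pair": (i, j),
            "z": z,
            "unadjusted": z > Z_CRIT,
            # Tukey compares the difference with Q_CRIT * se_mean, which is the
            # same as comparing z with Q_CRIT / sqrt(2), about 2.57 instead of 1.96.
            "tukey": diff > Q_CRIT * se_mean,
        })
    return rows


# Hypothetical group means and standard error, made up for illustration only.
means = [10.0, 10.4, 10.85, 10.9]
se_mean = 0.25

print(f"overall statistic = {overall_statistic(means, se_mean):.2f} (cutoff {CHI2_CRIT})")
for row in compare_pairs(means, se_mean):
    print(row["pair"], f"z = {row['z']:.2f}",
          "unadjusted:", row["unadjusted"], "Tukey:", row["tukey"])
```

With these made-up numbers the overall statistic (8.59) exceeds its cutoff, no pair reaches the Tukey cutoff, and two pairs (1 vs 3 and 1 vs 4) exceed the unadjusted 1.96: a small version of the situation described above.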
Here's the longer and more technical answer. The F test examines whether
all four means are equal. The alternative that is frequently stated, that at
least one pair of means differs, is not quite accurate as a description of
what a significant F test detects. A significant F test really says that
some linear contrast among the four means is significantly different from
zero. A pairwise difference is one example of a linear contrast, but there
are many other linear contrasts, and Tukey does not look at any of them.
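For those who want the formal statement, here is the standard argument behind this claim (it is the basis of Scheffé's method). The notation is mine: k groups with sizes n_i, sample means ybar_i, weighted grand mean ybar, N = sum of the n_i, and a contrast with coefficients c_i that sum to zero.

```latex
\begin{align*}
\hat{\psi} &= \sum_{i=1}^{k} c_i \bar{y}_i, \qquad \sum_{i=1}^{k} c_i = 0,
  \qquad SS_{\psi} = \frac{\hat{\psi}^{2}}{\sum_{i=1}^{k} c_i^{2}/n_i} \\
\hat{\psi}^{2} &= \Bigl(\sum_{i=1}^{k} c_i(\bar{y}_i - \bar{y})\Bigr)^{2}
  \le \Bigl(\sum_{i=1}^{k} \frac{c_i^{2}}{n_i}\Bigr)
      \Bigl(\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^{2}\Bigr)
  \quad\Longrightarrow\quad SS_{\psi} \le SS_{\text{between}} \\
\max_{c}\ \frac{SS_{\psi}}{MS_{\text{error}}}
  &= \frac{SS_{\text{between}}}{MS_{\text{error}}} = (k-1)\,F
\end{align*}
```

The inequality is Cauchy-Schwarz, and equality holds when c_i is proportional to n_i(ybar_i - ybar). Scheffé's method declares a contrast significant when SS_psi / MS_error exceeds (k - 1) times the F critical value with k - 1 and N - k degrees of freedom, so the F test is significant exactly when at least one contrast, not necessarily a pairwise one, passes Scheffé's criterion.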
For example, it might be that the first mean does not differ significantly
from the third mean and the second mean does not differ significantly from
the fourth mean, but maybe an average of the first and second means differs
significantly from an average of the third and fourth means. Or maybe the
fourth mean is slightly smaller than the other means, but not enough to be
statistically significant for any pair. But when you average the other three
means, you get enough precision to get a statistically significant difference
from the fourth mean. Perhaps it would be worthwhile to search for an
interesting contrast of the means that differs from zero.
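A rough sketch of that search in Python, assuming equal group sizes and a known standard error for each mean. The means and standard error are hypothetical, picked to mimic the second example above (a fourth mean slightly smaller than the other three), and `contrast_z` and `best_contrast` are illustrative helpers of my own. The Scheffé cutoff used here is the known-variance version (square root of the chi-square 3 df 0.95 quantile), which is appropriate when the contrast is chosen after looking at the data.

```python
import math

Q_CRIT = 3.633                   # tabled Tukey cutoff: alpha=0.05, k=4, df=infinity
SCHEFFE_CRIT = math.sqrt(7.815)  # Scheffe cutoff for |z|, k=4, known sigma


def contrast_z(means, coefs, se_mean):
    """z statistic for the contrast sum(c_i * mean_i).

    Assumes equal group sizes and a known standard error se_mean for each mean,
    so the contrast has standard error se_mean * sqrt(sum(c_i ** 2)).
    """
    if abs(sum(coefs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    return estimate / (se_mean * math.sqrt(sum(c * c for c in coefs)))


def best_contrast(means):
    """Coefficients proportional to each mean's deviation from the grand mean.

    With equal group sizes this contrast has the largest z**2 of any contrast,
    and that maximum equals the overall (known-variance) test statistic.
    """
    grand = sum(means) / len(means)
    return [m - grand for m in means]


# Hypothetical means: the fourth is a bit smaller than the other three.
means = [10.0, 10.0, 10.0, 9.15]
se_mean = 0.25  # hypothetical standard error of each group mean

pair_14 = contrast_z(means, [1, 0, 0, -1], se_mean)
three_vs_one = contrast_z(means, [1/3, 1/3, 1/3, -1], se_mean)
best = contrast_z(means, best_contrast(means), se_mean)

print(f"mean 1 vs mean 4:       z = {pair_14:.2f}  (Tukey cutoff {Q_CRIT / math.sqrt(2):.2f})")
print(f"means 1-3 vs mean 4:    z = {three_vs_one:.2f}  (Scheffe cutoff {SCHEFFE_CRIT:.2f})")
print(f"best possible contrast: z = {best:.2f}")
```

Averaging three means shrinks the standard error from se_mean * sqrt(2) to se_mean * sqrt(4/3), so with these numbers no pair reaches the Tukey cutoff while the three-versus-one contrast clears even the Scheffé cutoff. Here that contrast also happens to be the best possible one.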
Another technical difficulty is that Tukey's test is based on the
studentized range, which behaves slightly differently from the F test. There
are situations (not too many) where the studentized range statistic is
significant, but the F test is not. And there are situations (again not too
many) where the F test is significant and the studentized range statistic is
not. You just have the bad luck of encountering one of those rare situations.
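Both kinds of disagreement can be seen in a simulation under equal population means. This sketch assumes known variance and equal group sizes, so the overall test becomes a chi-square with 3 degrees of freedom and Tukey's test becomes a comparison of the range of the standardized means with 3.633; `decisions` and `disagreement_rates` are illustrative names of my own. The two hand-picked cases show the pattern: the range test reacts to two extreme means, while the overall test reacts to spread shared by all the means.

```python
import random

K = 4
Q_CRIT = 3.633     # tabled studentized range cutoff: alpha=0.05, k=4, df=infinity
CHI2_CRIT = 7.815  # chi-square(3 df) 0.95 quantile: the overall test when sigma is known


def decisions(means):
    """Return (overall test significant, range test significant).

    Means are assumed to be in units of their known, common standard error.
    """
    grand = sum(means) / len(means)
    overall = sum((m - grand) ** 2 for m in means) > CHI2_CRIT
    by_range = max(means) - min(means) > Q_CRIT
    return overall, by_range


def disagreement_rates(n_sims=20000, seed=2):
    """Under equal population means, estimate how often each combination of decisions occurs."""
    rng = random.Random(seed)
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for _ in range(n_sims):
        counts[decisions([rng.gauss(0.0, 1.0) for _ in range(K)])] += 1
    return {key: value / n_sims for key, value in counts.items()}


# Two extreme means and two in the middle: the range test fires, the overall test does not.
print(decisions([1.9, 0.0, 0.0, -1.9]))   # (False, True)
# Means split into two moderate clusters: the overall test fires, the range test does not.
print(decisions([1.5, 1.5, -1.5, -1.5]))  # (True, False)
print(disagreement_rates())
```

Both disagreement cells come out nonzero but small, since each test on its own rejects only about 5% of the time when the means are equal.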
Now I don't think it helps to make such technical arguments in a paper, so it
is just simpler to admit that there is a discrepancy and blame the
conservative nature of the Tukey follow-up test.
Further reading
- http://core.ecu.edu/psyc/wuenschk/StatHelp/Multiple-Comparisons.txt
07/14/2008.
Category: Analysis of variance