Are we assuming a normal sample or a normal population?
Dear Professor Mean, I'm fitting an ANOVA model to a sample of 25
observations, and the data is quite skewed. I'm quite worried about this,
but my husband reassures me that this is not a problem. He says that the
assumption is that the population is normal, not the sample. Should I
listen to him?
As a husband myself, my every fiber wants to scream out YES, ALWAYS
LISTEN TO THE HUSBAND!! THE HUSBAND IS ALWAYS RIGHT!! But unfortunately,
I can't say this.
Your husband is technically correct. The ANOVA model does indeed
assume that the population of residuals is normally distributed. There is
a possibility that the sample of residuals could be skewed and still
come from a symmetric population. But how likely is this? You
can get a rough feel for this by taking repeated samples of 25 normally
distributed random variables and drawing a histogram of each. I did this below.

[Figure: histograms of repeated samples of 25 normally distributed random variables]
Notice a slightly skewed pattern once in a while (such as in the lower
right corner). But in general, the sample of 25 does not appear to have
marked departures from symmetry. There are other things that you see from
time to time like a bimodal pattern. If you've done enough data analysis,
you get used to seeing a few minor departures from normality and these
sorts of things don't really faze you. What is lacking in these graphs is
a marked departure from normality.
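The repeated-sampling experiment described above is easy to try for yourself. Here is a minimal sketch in Python using only the standard library. The function names, the seed, the choice of nine samples, and the crude text histogram with six bins are all assumptions made for illustration; this is not the code that produced the original graphs.

```python
import random


def draw_normal_samples(n_samples=9, size=25, seed=1):
    """Return n_samples lists, each holding `size` standard normal draws."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(n_samples)]


def text_histogram(sample, low=-3.0, high=3.0, bins=6):
    """Count values per equal-width bin; values outside [low, high] go in the end bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in sample:
        index = int((x - low) // width)
        index = min(max(index, 0), bins - 1)  # clamp outliers into the end bins
        counts[index] += 1
    return counts


for number, sample in enumerate(draw_normal_samples(), start=1):
    print(f"Sample {number}")
    for b, count in enumerate(text_histogram(sample)):
        left = -3.0 + b  # bin width is 1 with the default settings
        print(f"  {left:+.0f} to {left + 1:+.0f} | {'*' * count}")
```

Change the seed and run it a few times. You should see the same sort of thing as in the graphs: a lopsided sample here and there, but rarely anything dramatic.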
If your sample of 25 residuals shows a dramatic degree of skewness,
that's a fairly good indicator that the underlying population of
residuals is not normal.
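If you would rather put a rough number on "how likely is this?", you can simulate many normal samples of 25 and count how often the sample skewness is large. The sketch below is only one way to do this. The cut-off of 1.0 for "dramatic" skewness, the 2000 repetitions, and the function names are assumptions chosen for illustration, not a formal test of normality.

```python
import random


def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def fraction_markedly_skewed(threshold=1.0, size=25, reps=2000, seed=1):
    """Share of simulated normal samples whose absolute skewness exceeds `threshold`."""
    rng = random.Random(seed)  # arbitrary seed, for repeatability only
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(size)]
        if abs(sample_skewness(sample)) > threshold:
            hits += 1
    return hits / reps


print(f"Fraction with |skewness| > 1: {fraction_markedly_skewed():.3f}")
```

A small fraction here supports the point above: when the population really is normal, a sample of 25 seldom looks dramatically skewed, so a dramatically skewed sample is evidence against a normal population.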
Perhaps I can avoid problems with the HCDNWS (Husbands Can Do No Wrong
Society) by pointing out that the assumption of normality is not terribly
critical in most settings. This is because of the Central Limit Theorem,
which comforts us by
reminding us that even non-normal populations can produce reasonably
normal looking averages if the sample size is large enough. And it is the
distribution of the averages that influences the validity of ANOVA. How
large is large enough? There is no answer that works in all situations.
It depends a lot on how far the population distribution for individual
items departs from normality.
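You can watch the Central Limit Theorem at work with a small simulation. The sketch below assumes an exponential population as a stand-in for "skewed"; that choice, the sample size of 25, and the function names are illustrative assumptions rather than anything specific to your data. For an exponential population the skewness of individual values is 2, while the skewness of an average of n values is 2 divided by the square root of n, or 0.4 when n is 25.

```python
import random


def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def skewness_of_individuals(n=20000, seed=2):
    """Skewness of single draws from the assumed exponential population."""
    rng = random.Random(seed)
    return sample_skewness([rng.expovariate(1.0) for _ in range(n)])


def skewness_of_means(size=25, reps=5000, seed=1):
    """Skewness of the averages of `reps` samples, each of `size` exponential draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(size)]
        means.append(sum(sample) / size)
    return sample_skewness(means)


print(f"Skewness of individual values: {skewness_of_individuals():.2f}")
print(f"Skewness of averages of 25:    {skewness_of_means():.2f}")
```

The averages come out much less skewed than the individual values, but not perfectly symmetric. Try a more extreme population or a smaller sample size and the averages stay visibly skewed, which is why no single sample size is large enough in every situation.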
This webpage was written on 2007-08-30 and was last modified on
2008-07-08.
Category: Ask Professor Mean,
Category: Modeling issues