Ratio of observations to independent variables (November 17, 2004). [Incomplete]
A widely quoted rule is that you need 10 or 15 observations per independent variable in a
regression model. The original source of this rule of thumb is difficult to find. I briefly
commented on this in an earlier weblog entry, but here is a more
complete elaboration.
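As a rough illustration of the arithmetic behind this rule of thumb, the sketch below converts a sample size into the largest number of independent variables the rule would allow. The function name `max_predictors` and the sample size of 150 are made up for this example; the ratios of 10 and 15 come from the rule as quoted above.

```python
def max_predictors(n_observations, per_variable=10):
    """Largest number of independent variables the rule of thumb allows.

    n_observations: number of observations available (hypothetical input).
    per_variable:   observations required per independent variable (10 or 15).
    """
    if per_variable <= 0:
        raise ValueError("per_variable must be positive")
    # Integer division: a fraction of a variable does not count.
    return n_observations // per_variable


# Hypothetical study with 150 observations, under both versions of the rule.
for ratio in (10, 15):
    print(f"{ratio} per variable: at most {max_predictors(150, ratio)} variables")
```

The stricter ratio of 15 cuts the allowable number of variables by a third, which shows how sensitive the rule is to which version you quote.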
When you are trying to build a regression model using a stepwise variable selection
process (or something similar to stepwise selection), there is substantial reason for
caution. Stepwise selection tends to produce regression models that do not replicate
well. I summarized some arguments against stepwise variable selection as part of the
STAT-L FAQ.
Frank Harrell did some empirical investigation of stepwise variable selection in the
logistic regression model and the Cox Proportional Hazards regression model. For these
models, it is not the number of observations you have, but the number of events that is
important. Suppose you study thousands of patients and find that in the control group four
die, but only two die in the treatment group. That represents a halving of the mortality
rate, yet no one would trust those results. Your sample size is effectively those six deaths
rather than the thousands of patients being studied.
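One way to see why the six deaths matter more than the thousands of patients is to put a confidence interval around the halved mortality rate. The sketch below uses the standard log-scale (Katz) approximation for a risk ratio, chosen here for illustration and not necessarily the method Harrell used. The arm sizes of 1,000 and 100,000 are hypothetical (the text above says only "thousands"), and with counts this small the normal approximation is itself rough.

```python
import math


def risk_ratio_ci(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio (treatment / control) with an approximate 95% interval.

    Uses the log-scale (Katz) approximation:
    Var(log RR) = 1/a - 1/n1 + 1/c - 1/n2.
    """
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(
        1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Deaths (2 treated, 4 control) are from the text; arm sizes are assumptions.
for n_per_arm in (1_000, 100_000):
    rr, lower, upper = risk_ratio_ci(2, n_per_arm, 4, n_per_arm)
    print(f"n per arm = {n_per_arm}: RR = {rr:.2f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval runs from roughly 0.09 to 2.7 in both cases, so the data are compatible with anything from a large benefit to a large harm. Increasing the arm size a hundredfold barely moves the interval, because the standard error is dominated by the 1/2 and 1/4 terms that come from the death counts.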
This webpage was written on 2004-11-17 and was last modified on 2008-07-08. This page
needs minor revisions.
Category: Ask Professor Mean, Category: Sample size justification