Confidence interval for a correlation coefficient (July 11, 2005).
In many exploratory research studies, the goal is to examine
associations among multiple demographic variables and some outcome variables. How can you
justify the sample size for such an exploratory study? There are several approaches, but one
simple way that I often use is to show that any correlation coefficients estimated by this
research study will have reasonable precision. It may not be the most rigorous way to select
a sample size, but it is convenient and easy to understand.
I wrote a
simple spreadsheet that uses Fisher's z transformation to estimate 95% confidence
intervals for a correlation.
Stan Brown has a nice description of the Fisher's z transformation at
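For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the same kind of calculation. It assumes the usual large-sample approximation, in which the transformed correlation has a standard error of 1/sqrt(n - 3); the function name `fisher_ci` and its parameters are illustrative and are not taken from the spreadsheet.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, conf=0.95):
    """Approximate confidence interval for a Pearson correlation r from n pairs.

    Hypothetical helper: assumes the standard Fisher z approximation.
    """
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                            # Fisher's z transformation
    se = 1 / math.sqrt(n - 3)                    # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for a 95% interval
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


if __name__ == "__main__":
    lo, hi = fisher_ci(0.6, 300)
    print(f"{lo:.2f} to {hi:.2f}")
```

Because the interval is built on the transformed scale and then converted back, it is not symmetric around the estimated correlation, and it can never extend beyond -1 or 1.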
Here's an example of how it might work. In a study of various biomarkers
for bone damage, several measures that are considered preliminary or experimental are
compared to other more established measures. You might want to examine the correlations among
the preliminary and established measures. A sample size of 300 allows you to estimate these
correlations with a high level of precision. For example, if the correlation were estimated
to be 0.6, the 95% confidence interval from a sample of 300 patients would range from 0.52 to
0.67, an interval only 0.15 wide.
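The arithmetic behind the limits for a correlation of 0.6 and a sample of 300 can be written out step by step. This sketch assumes the standard Fisher z approximation with a standard error of 1/sqrt(n - 3) and a critical value of 1.96; the symbols are chosen here for illustration.

```latex
\begin{align*}
z &= \tfrac{1}{2}\ln\frac{1+r}{1-r} = \tanh^{-1}(0.6) \approx 0.693 \\
\mathrm{SE}_z &= \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{297}} \approx 0.058 \\
z \pm 1.96\,\mathrm{SE}_z &\approx 0.693 \pm 0.114 = (0.579,\ 0.807) \\
\tanh(0.579) &\approx 0.52, \qquad \tanh(0.807) \approx 0.67
\end{align*}
```

The last line converts the limits from the z scale back to the correlation scale.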
In contrast, a sample of size 30 would offer only a moderate level of
precision. If the correlation were estimated to be 0.6, the 95% confidence interval from a
sample of 30 patients would range from 0.31 to 0.79. That level of precision might be
acceptable if you only wanted to make broad and general statements about the associations
among variables and didn't need to narrow the correlation to a precise interval.
A sample size of 10 is clearly inadequate for a study using
correlations. If the correlation were estimated to be 0.6, the 95% confidence interval would
range from -0.05 to 0.89, an interval that is too wide to be useful. Such a study could not
distinguish among a strong correlation (0.8), a weak correlation (0.4), and even a zero
correlation. If you expected to see only very strong correlations, a sample of 10 might be
okay. For example, if the
correlation were 0.9, then the 95% confidence interval from a sample of 10 patients would
range from 0.62 to 0.98. In some studies this might be an acceptable level of precision. But
how often would you expect to only see correlations of 0.9 and higher?
The nice thing about a spreadsheet is that you can play a lot of "what
if" games and explore how changes in the correlation and/or the sample size influence the
width of the confidence interval.
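The same "what if" exploration can be sketched in Python by tabulating the interval over a grid of correlations and sample sizes. This is an illustration under the standard Fisher z approximation, not a copy of the spreadsheet; the helper names `fisher_ci_95` and `what_if`, and the grid values, are assumptions made for the example.

```python
import math


def fisher_ci_95(r, n):
    """Approximate 95% interval for a correlation r from n pairs (Fisher z)."""
    z = math.atanh(r)                 # transform r to the z scale
    half = 1.96 / math.sqrt(n - 3)    # half-width of the interval on the z scale
    return math.tanh(z - half), math.tanh(z + half)


def what_if(correlations, sample_sizes):
    """Return (r, n, lower, upper, width) for every combination in the grid."""
    rows = []
    for r in correlations:
        for n in sample_sizes:
            lo, hi = fisher_ci_95(r, n)
            rows.append((r, n, lo, hi, hi - lo))
    return rows


if __name__ == "__main__":
    print(f"{'r':>5} {'n':>5} {'lower':>7} {'upper':>7} {'width':>7}")
    for r, n, lo, hi, width in what_if([0.3, 0.6, 0.9], [10, 30, 100, 300]):
        print(f"{r:5.1f} {n:5d} {lo:7.2f} {hi:7.2f} {width:7.2f}")
```

Reading down the width column for a fixed correlation shows how quickly the interval narrows as the sample size grows.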
07/08/2008.
Category: Confidence intervals,
Category: Sample size justification