Stats
Re-weighting the data (January 25, 2005).
Category: Covariate adjustment
A recent article
- Two Statistical Paradoxes in the Interpretation of Group Differences:
Illustrated with Medical School Admission and Licensing Data. Wainer H,
Brown LM. The American Statistician 2004: 58(2); 117-23.
shows how a simple re-weighting of the data can lead to a fairer comparison
between two groups. An expanded version of this paper is available on the
Statistical Literacy website.
Wainer and Brown show data on a state by state basis for the National
Assessment of Educational Progress (NAEP). Two states, Nebraska and New
Jersey show interesting results. The average score for Nebraska is 277 and is
only 271 for New Jersey. But interestingly enough, New Jersey outperforms
Nebraska among whites (283 vs 281), blacks (242 vs 236) and other non-white
(260 vs 259).
This is an example of Simpson's paradox. New Jersey has much different
demographics than Nebraska. In New Jersey 66% of the population is white, 15%
black, and 19% other non-white. In Nebraska, 87% of the population is white,
5% is black, and 8% is other non-white. It is this differing demographic mix
that causes the paradox.
The average score for each state is a weighted average. For Nebraska, the
calculation is
281*0.87 + 236*0.05 + 259*0.08 = 277
and for New Jersey, the calculation is
283*0.66 + 242*0.15 + 260*0.19 = 272
Nebraska benefits because a higher weight (0.87) is placed on the race that
scored highest in both states. What would happen to Nebraska's and New
Jersey's scores if the demographic mix was changed to the overall percentages
in the U.S. (69% white, 16% black, and 15% other non-white)?
Here are the re-weighted calculations for Nebraska
281*0.69 + 236*0.16 + 259*0.15 = 271
and New Jersey
283*0.69 + 242*0.16 + 260*0.15 = 273
This re-weighting to a common demographic risk is often used to make
adjustments between two groups that have sharply differing mixes of age,
gender, and/or racial characteristics.
Here are a few additional references about Simpson's paradox.
- Appleton, David R., French, Joyce M. and Vanderpump, Mark P. J. (1996)
Ignoring a covariate: An example of Simpson's paradox. The American
Statistician, 50, 340-341.
- Wagner, Clifford H. (1982) Simpson's paradox in real life. The American
Statistician, 36, 46-48.
- Morrell, Christopher H. (1999) Simpson's paradox: An example from a
longitudinal study in South Africa. Journal of Statistics Education, 7, 7-7.
This web page was written and was last modified on
09/24/2007.