Why the plus one in the percentile formula p(n+1)? (June 22, 2007).
Categories: Ask Professor Mean, Descriptive statistics
Dear Professor Mean, I was reviewing your page on the interquartile range and was
wondering why the formula for the quartiles in particular and percentiles in general asks you
to select the p(n+1) observation. Why do you need to add one?
The glib answer is that we need to make up for the deficit that we created when we defined
the degrees of freedom for the standard deviation to be n-1.
Actually, there is more than one formula that works and there is no perfect consensus,
especially for the definition of quartiles.
One intuitive answer is that the average of the numbers 1 through n is not n/2 but rather
(n+1)/2. So this gives you a hint that simply using p*n would produce values that are
slightly too small.
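A quick check of that hint, as a minimal sketch: the sample size n = 5 and the variable names are illustrative assumptions, not part of the original question. For five values the median is the 3rd observation, which is what 0.5*(n+1) gives, while 0.5*n lands half a position too low.

```python
# Illustrative sample size (an assumption for this sketch).
n = 5

# Ranks of the sorted observations: 1, 2, ..., n.
ranks = list(range(1, n + 1))

# The average rank is (n + 1) / 2, not n / 2.
mean_rank = sum(ranks) / n

# Candidate positions for the median (p = 0.5) under each formula.
position_pn = 0.5 * n               # p * n
position_pn_plus_one = 0.5 * (n + 1)  # p * (n + 1)

print(mean_rank, position_pn, position_pn_plus_one)
```

With n = 5 the p(n+1) position agrees with the average rank, and the pn position does not.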
Another intuitive answer is that p(n+1) imposes some symmetry on the problem, so that the
percentiles from the upper end mirror the percentiles from the lower end. Suppose you wanted
to compute the 25th and 75th percentiles of a set of six numbers. If you used the formula pn,
this would produce positions of 6*0.25=1.5 and 6*0.75=4.5. So you would choose halfway between
the first and second values for the 25th percentile, and halfway between the 4th and 5th
values for the 75th percentile. This definition is lopsided: the 25th percentile uses the
smallest value as part of the calculation, but the 75th percentile does not use the largest
value. With p(n+1), the positions are 7*0.25=1.75 and 7*0.75=5.25, so the 25th percentile
falls between the first and second values and the 75th percentile falls between the 5th and
6th values, equally far from each end.
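The six-number example can be sketched in code. This is only an illustration under stated assumptions: the helper percentile_by_position, its plus_one flag, the linear interpolation between neighboring values, and the data 10 through 60 are all hypothetical choices made for this sketch. As noted above, there is more than one accepted formula.

```python
def percentile_by_position(sorted_values, p, plus_one=True):
    """Interpolate linearly at rank p*(n+1), or p*n if plus_one is False.

    Ranks are 1-based; sorted_values must already be in ascending order.
    """
    n = len(sorted_values)
    position = p * (n + 1) if plus_one else p * n
    # Keep the position inside the valid range of ranks, 1 through n.
    position = min(max(position, 1), n)
    lower = int(position)          # rank just below (or at) the position
    fraction = position - lower    # how far toward the next rank
    if lower >= n:
        return sorted_values[n - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Hypothetical, evenly spaced data so the asymmetry is easy to see.
data = [10, 20, 30, 40, 50, 60]

q1_pn = percentile_by_position(data, 0.25, plus_one=False)  # rank 1.5
q3_pn = percentile_by_position(data, 0.75, plus_one=False)  # rank 4.5
q1 = percentile_by_position(data, 0.25)                     # rank 1.75
q3 = percentile_by_position(data, 0.75)                     # rank 5.25

print(q1_pn, q3_pn)  # 15.0 45.0
print(q1, q3)        # 17.5 52.5
```

With pn, the 25th percentile sits 5 above the minimum while the 75th sits 15 below the maximum. With p(n+1), both sit 7.5 from their respective ends.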
There are more technical justifications for adding one, but on a Friday afternoon, I
prefer a less technical justification.
07/08/2008.