Stats
One-tailed p-values (April 12, 2004)
Someone asked me how to compute one-sided p-values in SPSS. The output from
SPSS always uses two-sided p-values. This was worth an explanation, so I
added a new question to the Ask Professor Mean page
on how to do this.
There is a fierce debate about when you should use one-sided tests. In an
editorial in the Journal of Clinical Epidemiology
[Medline], Knotttnerus and Bouter argue in favor of one-sided tests
when it is obvious that a statistically significant change in the opposite
direction would not influence practice any differently than a finding of no
statistical difference. The sample size for a one sided test is smaller and
this places a smaller burden on research subjects. Typically this would be
when one approach is more expensive, more risky, or otherwise more
troublesome. A finding of no difference between a new surgical approach an
existing non-surgical approach, would lead to a recommendation for the
non-surgical approach because it is less invasive. A comparison of coumarin
and aspirin in preventing stroke would recommend aspirin if there were no
demonstrated difference, because coumarin requires repeated blood testing.
Knottnerus and Bouter also recommend one-sided tests when the comparison
group is no treatment or an inactive placebo.
Lemuel Moye, in his fascinating book Statistical Reasoning in Medicine.
The Intuitive P-Value Primer, argues forcefully against one-sided tests.
Researchers should be open to the possibility that their proposed treatments
could do more harm than good. Moye offers a compromise where the alpha level
is allocated asymmetrically. For example, the "benefit" tail could be
allocated .03 of the total alpha level and the remaining .02 would be
allocated to the "harm" tail.
Another example of where a one-sided test is called for is when the
possibility of a change in the opposite direction would be immediately
disregarded. One example where you might think that there is an obvious need
for a one-tailed test is in the study of passive smoking. One writer did try
to argue that passive smoking has a beneficial effect, as described in the
1998 BMJ article, The hot air on passive smoking
[Medline], but most accept that if it has any effect at all, it would be
harmful.
And yet, a federal judge vacated a U.S. Environmental Protection Agency
report on passive smoking partly because of the use of one-sided tests. Judge
Osteen ruled that
Finally, when an agency conducts activities under an act authorizing
information collection and dissemination of findings, the agency has a duty
to disseminate the findings made. EPA did not disclose in the record or in
the Assessment: its inability to demonstrate a statistically significant
relationship under normal methodology; the reasoning behind adopting a
one-tailed test, or that only after adjusting the Agency's methodology
could a weak relative risk be demonstrated. Instead of disclosing
information, the Agency withheld significant portions of its findings and
reasoning in striving to confirm its a priori hypothesis.
So apparently, Judge Osteen believes it is normal for an enforcement agency
to examine the potential that some of the hazardous substances it evaluates
might actually be helpful. In my opinion, it would be absurd for any
scientist who claimed that exposure to passive smoke might have a protective
effect against lung cancer. You can find the
full text of Judge
Osteen's ruling at the Junk Science web site. The EPA has a
response to the general
criticisms of its report on passive smoking (though not a direct response
to Judge Osteen's ruling) on its web page.
Finally, there's a lot of controversy about whether p-values should be used
at all. I'd like to write a web page about this sometime, but for now, here
are some references and web pages.
- The Case Against Statistical Significance Testing. Carver RP.
Harvard Educational Review 1978: 48(3); 378-399.
- The Earth Is Round (p < .05). Cohen J. American Psychologist
1994: 49(12); 997 - 1003.
- p Values, hypothesis tests, and likelihood: implications for
epidemiology of a neglected historical debate. Goodman S. American
Journal of Epidemiology 1993: 137(5); 485-95.
[Medline]
- A Picture is Worth a Thousand p Values: On the Irrelevance of
Hypothesis Testing in the Microcomputer Age. Loftus GR. Behavior
Research Methods, Instruments & Computers 1993: 25(2); 250-256.
- Is statistical significance testing useful in interpreting data?
Savitz DA. Reprod Toxicol 1993: 7(2); 95-100.
[Medline]
- Sifting the evidence- what's wrong with significance tests?
Sterne JAC, Smith GD. BMJ 2001: 322; 226-231.
[Medline] [Full
text] [PDF]
-
Special Issue: Statistical Significance Testing. Roberts D,
Penn State University. Accessed on 2003-03-20. roberts.ed.psu.edu/users/droberts/sigtest.htm
-
Understanding P-values. Berger J, Duke University. Accessed on
2003-03-19. www.stat.duke.edu/~berger/p-values.html
-
326 Articles/Books Questioning the Indiscriminate Use of Statistical
Hypothesis Tests in Observational Studies. Thompson WL.
Accessed on 2003-03-19. www.cnr.colostate.edu/~anderson/thompson1.html
07/08/2008.
Category: Pvalues