Statistical Evidence. Overview.
There's an enormous mistrust of statistics in the real world. To the extent
that it makes people skeptical, that's good. To the extent it turns them
cynical, that's bad. There's a viewpoint, championed by too many people, that
statistics are worthless. I call this viewpoint statistical nihilism. Here's
an instructive example.
The paradigm of evidence-based medicine now being proposed is nothing
but the thinly disguised worship of statistical methods and techniques. The
value and worth of nearly all medications of proven effectiveness were
developed without the benefits of statistical tools, to wit, digitalis,
colchicine, aspirin, penicillin, and so on. Statistical analyses only
demonstrate levels of numeric association or, at best, impart a numeric
dimension to the level of confidence -- or lack thereof -- that chance
contributed to the shape and distribution of the data set under
consideration. Statistical association cannot replace causal
relation -- which, in the final analysis, is the bedrock on which good medical
practice must rest. -- (Boba 1998)
There are a lot more examples out there. Usually, people who adopt
statistical nihilism have an axe to grind. In their minds, there's a problem
with most of the research in a certain area, and rather than attack the
research directly, they try to undermine the research by citing all the flaws
in the statistical methodology. Of course, you can always find flaws in any
research including in the statistical methodology. The perfect statistical
analysis has yet to be performed.
What's missing among these statistical nihilists is a sense of proportion.
Some statistical flaws are so serious as to invalidate the research. Other
flaws raise enough concern that you should demand additional corroborating
evidence (such as replication of the study). Other flaws are mere trifles.
If you are a nihilist, life is easy. Just keep a list of statistical flaws
handy and one of them is bound to apply to the research study that you
dislike.
The real world, of course, is much more complex. Medical care givers do
indeed change their practices in response to the publication of well-designed
research studies. These changes follow extensive debate and careful review of
all the evidence*.
Research has also shown that adults who take a daily dose of aspirin can
reduce their risk of heart attacks and strokes (Physicians' Health Study
Research Group 1989). The Women's Health Initiative published findings
(Rossouw 2002) that indicated that hormone replacement therapy in
postmenopausal women may actually be harmful rather than helpful. This
followed a couple of other studies (Hulley 1998; Herrington 2000) that sowed
the seeds of doubt about this practice. Another spectacular failure that was
discovered through careful research was that drugs that suppress cardiac
arrhythmias may actually increase mortality (Epstein 1993).
On the other hand, it helps to recognize and be constantly vigilant for the
many limitations in medical research. A large number of review articles have
demonstrated that the publications in many medical disciplines have serious
limitations and leave much room for improvement. One of the best examples is
a large scale review by Ben Thornley and Clive Adams of research on
schizophrenia (Thornley 1998). You can find the full text of this article on
the web at bmj.com/cgi/content/full/317/7167/1181
and it is well worth reading. Thornley and Adams looked at the quality of
clinical trials for treating schizophrenia. Since they work for the Cochrane
Collaboration Group, a group that provides systematic reviews of the results
of medical trials, they are in a good position to write such an article.
Thornley and Adams actually identified over 2500 studies of schizophrenia,
but decided to summarize only the first 2000 that they uncovered. Perhaps
they reached the point of sheer exhaustion. I am very impressed at the amount
of work this must have taken.
The research covered fifty years, from 1948 through 1997, and a variety of
therapies: drug therapies, psychotherapy, policy or care packages, and
physical interventions like electroconvulsive therapy.
What did Thornley and Adams find? It wasn't a pretty picture. First,
researchers in schizophrenia studied the wrong patients. Most studies used
institutionalized patients, who are easier to recruit and follow up with, but
who do not provide a good representation of all patients with
schizophrenia. Readers would probably be at least as interested in
community-based studies, but only 14% of the studies were
community-based. From the perspective of the researchers, of course, it is a
whole lot easier to use institutionalized patients, because if they don't
show up for their six month evaluation, you know where to find them.
Second, the researchers also did not study enough patients. Thornley and
Adams estimated that a good study of schizophrenia should have at least 300
patients in each group. This would be based on rates of improvement that
might be expected for an active drug compared to placebo effects. Even though
the desired sample size was 300, it turns out that the average study had only
65 patients. Only 3% of the studies had 300 or more patients. From the
perspective of researchers, it is a whole lot easier to study a small number
of patients because you can finish the publication with less effort and money.
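To give a feel for how a target like 300 patients per group can arise, here is a minimal sketch in Python of a standard normal-approximation sample size calculation for comparing two proportions. The improvement rates (50% on placebo, 60% on an active drug), the 5% two-sided significance level, and the 80% power are illustrative assumptions, not the inputs that Thornley and Adams used, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def patients_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed in EACH group to detect a difference
    between two improvement rates (two-sided test, normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    difference = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)


def approximate_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate chance of detecting the difference with n_per_group
    patients in each group (same normal approximation as above)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    standard_error = sqrt(variance / n_per_group)
    return STD_NORMAL.cdf(abs(p_control - p_treated) / standard_error - z_alpha)


# Assumed rates for illustration only: 50% improve on placebo, 60% on drug.
print(patients_per_group(0.50, 0.60))
print(round(approximate_power(0.50, 0.60, n_per_group=32), 2))
```

Under these assumed rates, the calculation calls for several hundred patients per group. If a 65-patient study were split roughly evenly between two groups, it would have only a small chance of detecting the same difference.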
Third, the researchers did not study the patients long enough. A good study
of schizophrenia should last for six months or more; long term changes are
more important than short term changes. Unfortunately, more than half of the
studies lasted for six weeks or less. From the perspective of the
researchers, it is a whole lot easier to focus on short term outcomes because
you can finish the study a lot faster.
Finally, the researchers did not measure these patients consistently. In
the 2,000 studies, the researchers used 640 ways to measure the impact of the
interventions. Granted, there are a lot of dimensions to schizophrenia
and there were measures of symptoms, behavior, cognitive functioning, side
effects, social functioning, and so forth. Still, there is no justification
for using so many different measurements. Imagine how hard this makes it for
anyone to summarize the results of this research. Failure to use and re-use a
few standardized assessments has led to a very fragmentary (dare I say,
schizophrenic) picture about schizophrenia treatments.
Like all the previous problems, this can be explained from the perspective
of convenience. It is a whole lot easier to develop your own outcome measure
than to try to adapt somebody else's.
This publication suggests that a big problem with medical research is that
the researchers have a strong tendency to conduct research that is easy to
do. The research that is relevant to practicing clinicians is much harder.
This is hardly surprising. Research on schizophrenia is especially hard to do
well. Can you imagine trying to discuss an informed consent document with
patients who suffer from schizophrenia?
I don't want this example to turn you into a statistical nihilist, though.
The take home message from Thornley and Adams is that just because the
research is peer-reviewed does not mean that it is perfect. I hope it helps
you identify factors that limit the quality of peer-reviewed research.
If you practice medicine intelligently, you have to incorporate some
research studies into your clinical practice and disregard other studies.
Which studies do you incorporate? It depends on the quality of evidence in
the article. Was there a good comparison group? How were dropouts and
exclusions handled? Did they measure the outcome variable well? What other
corroborating evidence is there? Those are questions that I will address in
the rest of the book.
Footnotes
* The following examples are drawn mostly from a web site that Benjamin
Djulbegovic developed on randomized trials that changed medical practice based
on comments he received on the Evidence Based Health email discussion group.
You can find even more good examples at www.hsc.usf.edu/~bdjulbeg/oncology/RCT-practice-change.htm.
This webpage was written on 2005-06-03 and was last modified on 2008-07-08.