Statistical Evidence: Overview
This is an early draft of the overview for "Statistical Evidence."
"Still, it is an error to argue in front of your data. You find yourself insensibly twisting them around to fit your theories." Sherlock Holmes in The Adventure of Wisteria Lodge.
Reading medical research is hard work. I'm not talking about the medical terminology, though that is often quite bad (if I hear the word "emesis" one more time, I'm going to throw up!). The hard part is assessing the strength of the evidence. When you read a journal article, you have to decide if the authors present a case that is persuasive enough to get you to change your practice.
Some evidence is so strong that it stands on its own. Other evidence is weaker and requires support from other studies, from mechanistic arguments, and so forth. Still other evidence is so weak that you should not consider any changes in your practice until the study is replicated using a more rigorous approach.
What should you look for?
When you are assessing the quality of the evidence, it's not how the data are analyzed that's important. Far more important is HOW THE DATA ARE COLLECTED. Don't agonize over whether the researchers should have used a non-parametric test or whether a random effects meta-analysis is appropriate (just to cite two obscure examples). These are important issues and they generate a lot of debate. But in most cases, the use of one statistical analysis or another is unlikely to make a substantial difference in the conclusions.
The more common and more important threat to the validity of the study relates to how the data are collected, not how they are analyzed. After all, if you collect the wrong data, it doesn't matter how fancy the analysis is. This is good news, because you don't need a lot of statistical training or a lot of mathematical sophistication to assess how the data are collected.
In this presentation, I want to show you what to look for and why. I will also highlight real research articles and use them as examples. Although all of the examples represent good and valuable research, some of the examples represent a level of evidence that by itself is less persuasive. It is helpful to understand why these examples are less persuasive.
Schizophrenic Research
Unfortunately, there is a lot of less than persuasive research out there. You don't have to look very hard to find solid empirical evidence of this. One of the best reviews documenting research problems was conducted by Ben Thornley and Clive Adams (BMJ 1998; 317(7167): 1181-4). Thornley and Adams looked at the quality of clinical trials for treating schizophrenia. Since they work for the Cochrane Collaboration Group, a group that provides systematic reviews of the results of medical trials, they are in a good position to write such an article.
Thornley and Adams actually identified over 2500 studies of schizophrenia, but decided to summarize only the first 2000 that they uncovered. I still am very impressed at the amount of work this must have taken.
The research covered fifty years, from 1948 through 1997, and a variety of therapies: drug therapies, psychotherapy, policy or care packages, and physical interventions like electroconvulsive therapy.
What did Thornley and Adams find? It wasn't a pretty picture. First, researchers in schizophrenia studied the wrong patients. Most studies used institutionalized patients, who are easier to recruit and follow up with, but who do not provide a good representation of the typical patient with schizophrenia.
Second, the researchers also did not study enough patients. A good study of schizophrenia should have at least 300 patients in each group, but the average study had only 65.
Third, the researchers did not study the patients long enough. A good study of schizophrenia should last for six months or more. Unfortunately, more than half of the studies lasted for six weeks or less.
Finally, the researchers did not measure these patients properly. In the 2,000 studies, the researchers used 640 different outcome measures. This leads to a very fragmentary (dare I say, schizophrenic) picture and makes any rational summary of the research very difficult.
I don't wish to single out research in just this area. There are many reviews in other areas that also point out the flaws and shortcomings of research. Also keep in mind that research on schizophrenia is especially hard to do well. The take-home message from Thornley and Adams is that just because the research is peer-reviewed does not mean that it is perfect.
Healthy Skepticism
Please don't panic. Research studies have many flaws, but usually those flaws do not make the research wholly uninterpretable. These limitations should make you skeptical, perhaps, but not cynical.
The cynical attitude would be "you can prove anything with statistics" and leads to a nihilistic view that all research is garbage. The cynical attitude would lead you to nitpick a research paper, finding a flaw here and a flaw there, and then to use these flaws to disregard any research whose conclusions make you uncomfortable.
A skeptical attitude, on the other hand, would ask "how persuasive is this research" and would look at the strengths and the weaknesses of a research paper. It would place limits on how persuasive the research is. When the research was not sufficiently persuasive, a skeptical attitude would encourage you to think about what level of evidence would be enough to persuade you otherwise.
Who is this presentation for?
This presentation is for any health care professional who is making the effort to read and evaluate medical publications. Do you update and modify your clinical practice on the basis of what you read in the research journals? Then this presentation can help.
Non-medical professionals can also benefit from this presentation. I do use a few technical medical terms, but as long as words like "myocardial infarction" don't give you a heart attack, you will be just fine. Indeed, many of us who do not have specialized medical training will still read medical journals. We need to critically assess their advice about diet and lifestyle changes.
Journalists can also benefit. If you write about medical innovations, you need an appreciation for the quality of the research about recent medical developments.
And while the focus of this presentation is on medical examples, the general principles apply to other areas as well. This presentation will help professionals of any discipline who need to follow research published in peer-reviewed journals.
Who is this presentation not for?
This presentation cannot, however, be all things to all people. This presentation is not a substitute for Evidence-Based Medicine (EBM). The practice of EBM includes identifying the appropriate information sources, assessing those sources, and applying the results to your patients. This presentation does not tell you how to find the research articles, but it does tell you what to do once you have them. If the journal articles come as part of an EBM search, this presentation can help you. But the articles can come from outside an EBM search as well. A colleague or one of your patients may just drop a journal article in your lap and want your opinion.
Also, this presentation is tailored to medical studies of a new therapy or intervention. It is useful as well for looking at harmful exposures. But for other types of medical studies, such as diagnostic testing and meta-analysis, this presentation is less helpful. I am developing web pages on these topics at www.childrensmercy.org/stats/diagnostic.asp and www.childrensmercy.org/stats/journal/meta-analysis.asp.
This presentation is also not about how to conduct good research. This presentation is for consumers of research, not producers of research. Even so, when you plan your research you should try to use a research design that is most likely to be persuasive. To that extent, this presentation can help.
This presentation also does not discuss how to analyze research data. There are no formulas in this presentation because I want to focus on how the data were collected, not how they were analyzed.
How did this all get started?
The original inspiration for this book came from the students in an informal class I was teaching at Children's Mercy Hospital in 1997. In a survey, I asked the students why they were taking the class. My hope was that this information would help me select future topics for discussion. A common response was along the lines of "I want to understand the statistics used in medical journal articles." So I prepared a talk called "How to Read a Medical Journal Article." I expanded the talk into a web page (www.childrensmercy.org/stats/journal.asp). I had the good fortune of being invited to write a series of articles about research for the Lab Corner section of the Journal of Andrology. This allowed me to further refine these ideas.
Outline of this Presentation
This overview that you are reading is on the web at www.childrensmercy.org/stats/journal/overview.asp.
This presentation is divided into three major sections. Each section starts off with a publication that highlights some of the issues of interest. The first section, "Apples or Oranges?", examines the quality of the control group. How carefully the control group was selected and handled relates to the credibility of the research. If you want a technical term, this is often called the internal validity of the research. The most current version of this section is on the web at www.childrensmercy.org/stats/journal/apples.asp.
The second section: "Who Was Left Out?" examines exclusions before the study started, and exclusions during the study. If important segments of the population are left out, then you may have difficulty generalizing the results of the study. This is often called the external validity of the research. The most current version of this section is on the web at www.childrensmercy.org/stats/journal/leftout.asp.
The third section: "Mountain or Molehill?" examines the clinical relevance of the outcome. The outcome measure has to be properly collected and has to measure something of interest to your patients. The size of the study has to be large enough to produce reasonably precise estimates and the difference between the treatment and control group has to be large enough to have a clinical impact. The most current version of this section is on the web at www.childrensmercy.org/stats/journal/mountain.asp.
I am also working on closely related materials that discuss
- Blinding (www.childrensmercy.org/stats/journal/blinding.asp),
- Confidence intervals (www.childrensmercy.org/stats/journal/confidence.asp),
- Conflicts of interest (www.childrensmercy.org/stats/journal/conflict.asp), and
- Measures of risk (www.childrensmercy.org/stats/journal/oddsratio.asp).
Other Resources (I'm just getting this listed/started. Please be patient.)
There are a lot of good books, web pages, and research papers out there that can help you.
Statistics as Principled Argument. Robert P. Abelson (1995) Hillsdale, New Jersey: Lawrence Erlbaum Associates. ISBN: 0805805281. Description: There is a wealth of wisdom in this book. The basic theme is that Statistics provides basic principles to argue (debate might be a nicer word) about scientific claims. In the first chapter, Dr. Abelson argues that a persuasive argument has to have MAGIC--Magnitude, Articulation, Generality, Interestingness, and Credibility. Then he describes probability and randomness, illustrates common fallacies about probability, and shows how these principles can be applied to research findings. Chapter 5, On Suspecting Fishiness, describes some wonderful examples of strange numbers that might indicate fraud. This chapter is especially valuable because it is so rarely covered. The remaining chapters describe the MAGIC components of a persuasive argument with frequent citations of real research. This book is more conceptual than computational, which fits in with one of Abelson's Laws "Don't talk Greek if you don't know the English translation."
Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. Joel Best (2001) Berkeley, California: University of California Press. ISBN: 0520219783. Description: Joel Best captures your attention right from the start by describing the worst statistic ever published: a claim that "every year since 1950, the number of American children gunned down has doubled." If you look at what a yearly doubling over one or two decades implies, you will quickly see how inaccurate this claim has to be. Joel Best goes beyond this example, though, to show how there is a social need to wield statistics as "weapons in political struggles over social problems and social policy." Furthermore, these statistics, even when they start as mere guesses, tend to be repeated by different media sources and gain credibility with every repetition. When these statistics relate to controversial social policies, they are often defended not by any objective standard but "by challenging the motives of anyone who disputes the figure." Joel Best cites numerous statistics (the suicide rate, the poverty level, the number of homeless people, the illiteracy rate) and shows a remarkable level of even-handedness in describing how different political groups use and abuse these numbers. When you see a statistic, Joel Best suggests that you ask three questions: Who created this statistic? Why was this statistic created? How was this statistic created? You should be critical rather than naive or cynical: "The issue is whether a particular statistic's flaws are severe enough to damage its usefulness."
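The yearly doubling in Joel Best's example is easy to check with a few lines of arithmetic. The sketch below is in Python and is only an illustration: the starting value of one child in 1950 is an assumption, chosen as the smallest possible starting count, and is not a figure from the book. The function name is likewise made up for this example.

```python
# Assumption for illustration only: a single child gunned down in 1950.
# This starting count is hypothetical and does not come from the book.
START_YEAR = 1950
START_COUNT = 1


def doubled_count(year, start_year=START_YEAR, start_count=START_COUNT):
    """Return the count implied for `year` if the number doubles every year."""
    return start_count * 2 ** (year - start_year)


# Print the implied count at ten-year intervals.
for year in (1950, 1960, 1970, 1980):
    print(year, f"{doubled_count(year):,}")
```

Even from that minimal start, the implied count passes one thousand by 1960, one million by 1970, and one billion by 1980, which is several times the entire population of the United States.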
Evidence-Based Medicine: How to Practice and Teach EBM. David L. Sackett, W. Scott Richardson, William Rosenberg, R. Brian Haynes (1998) Edinburgh: Churchill Livingstone. ISBN: 0443056862. Description: There are many books on Evidence-Based Medicine (EBM), but this is the classic text, and it is hard to beat. It is succinct and to the point, and if that weren't enough, the authors summarize the most important points on plastic index cards that you can carry in your pocket. The authors provide a clear and understandable definition of EBM and make a compelling case for the need to use EBM in your practice. EBM starts with asking the right question. The right question, the authors tell us, should have four components: (1) the patient or problem, (2) the intervention, (3) the comparison, and (4) the outcome. The authors then describe how to search for the best evidence and explain some of the technical details of Medline, a database of medical publications from thousands of journals. The authors mention other resources like the ACP Journal Club on Disk and the Cochrane Database of Systematic Reviews. They then provide practical guidance on how to evaluate studies in six major areas: diagnosis, prognosis, therapy, harm, economic analysis, and quality of care. Everywhere you turn, the authors are addressing real problems and providing pragmatic advice. If you are already an expert on EBM, this book is still valuable, because it provides helpful advice on how to teach these methods. (There is a second edition of this book, published in February 2000, which I have not seen.)
Evaluating Research Articles from Start to Finish. Ellen R. Girden (2001) Thousand Oaks, CA: Sage Publications. ISBN: 0761922148. Description: This book offers pragmatic advice for people who read research articles and covers a tremendous range of studies, both qualitative and quantitative. Dr. Girden uses examples of case studies, narrative analysis, surveys, correlation studies, regression analysis studies, factor-analytic studies, discriminant analysis studies, two-condition experimental studies, single classification studies, factorial studies, and quasi-experimental studies. For each type of study, Dr. Girden offers some background on the methodology, provides a checklist of caution factors, and then critically reviews two research publications. The actual text of the research studies is included in the book itself.
Critical Appraisal of Epidemiological Studies and Clinical Trials. J. Mark Elwood (1998) Oxford: Oxford University Press. Description: Dr. Elwood describes intervention trials (both randomized and non-randomized), retrospective and prospective cohort studies, case-control studies, and cross-sectional studies. Dr. Elwood then discusses how to select subjects for the various research designs, how to identify sources of bias and error, how to avoid confounding or control for its effects, and how to assess statistical and practical significance. He then outlines nineteen questions you should ask relating to a description of the evidence, internal validity issues, external validity issues, and comparison of the results to other evidence. Then Dr. Elwood provides a critical review of a variety of published research, including excerpts from the research itself.
Interpreting the Medical Literature Third Edition. Stephen H. Gehlbach (1993) New York: McGraw-Hill. ISBN: 0071054510. Description: Dr. Gehlbach discusses case-control designs, cross-sectional designs, follow-up (cohort) studies, and experimental designs. Then he discusses measurement issues (reliability, validity, systematic error, and measurement error) and explains the basics of hypothesis testing and confidence intervals. He also defines terms used in diagnostic testing (sensitivity, specificity, and predictive value) and measures of risk (relative risk, odds ratios, and attributable risk). Then he discusses how to determine cause and effect (strength of the association, dose-response relationship, biological plausibility, and consistency of the observed evidence). Dr. Gehlbach lumps case series, editorials, and reviews together in a chapter as examples of less rigorous research, but he does not mention meta-analysis. (There is a fourth edition, published in May 2002, which I have not seen.)
Studying a Study and Testing a Test: How to Read the Health Science Literature Third Edition. Richard K. Riegelman, Robert P. Hirsch (1996) Boston, MA: Little, Brown and Company. ISBN: 0316745219. Description: The authors define case-control studies, cohort studies, randomized clinical trials, and meta-analysis and offer a series of questions to help you assess the possible flaws of each type of research. They then discuss how to evaluate diagnostic methods (testing a test) and measures of the frequency of disease (rating a rate). Finally, they review the statistical methods that you can choose for a research study (selecting a statistic). Almost all of the examples in this book are hypothetical. (There is a fourth edition, published in January 2000, which I have not seen.)
Statistical Reasoning in Medicine. The Intuitive P-Value Primer. Lemuel A. Moye (2000) New York: Springer-Verlag. ISBN: 0387989331. Description: This book provides an intuitive and conceptual understanding of p-values. Dr. Moye has chapters on observational data, effect sizes, power, one-tailed tests, multiple endpoints, Bayesian p-values, and subgroup analysis. He includes fascinating examples about the use and abuse of statistics in medicine and each example is backed up with the appropriate journal reference.
Users' Guides to Evidence-Based Practice. Center for Health Evidence. Accessed on 2003-09-02. "The following is the complete set of Users' Guides originally published as a series in the Journal of the American Medical Association (JAMA). The CHE continues to maintain the full text pre-publication version of this series on behalf of the Evidence-Based Medicine Working Group with permission from the journal. See the Disclaimer and Copyright for more information." www.cche.net/usersguides/main.asp
This webpage was written by Steve Simon on (unknown date), edited by Steve Simon and Linda Foland, and was last modified on 2008-07-08. This page needs minor revisions. Category: Statistical evidence