Statistical Evidence: Overview

This is a first draft of the overview for "Statistical Evidence."

"Still, it is an error to argue in front of your data. You find yourself insensibly twisting them around to fit your theories." Sherlock Holmes in The Adventure of Wisteria Lodge.

Reading medical research is hard work. I'm not talking about the medical terminology, though that is often quite bad (if I hear the word "emesis" one more time, I'm going to throw up!). The hard part is assessing the strength of the evidence. When you read a journal article, you have to decide if the authors present a case that is persuasive enough to get you to change your practice.

Some evidence is so strong that it stands on its own. Other evidence is weaker and requires support from other studies, from mechanistic arguments, and so forth. Still other evidence is so weak that you should not consider any changes in your practice until the study is replicated using a more rigorous approach. I hope to elaborate on the criteria that you should use when assessing the strength of the evidence.

0.1. What should you look for?

When you are assessing the quality of the evidence, it's not how the data are analyzed that's important. Far more important is HOW THE DATA ARE COLLECTED. Don't agonize over technical details about the statistical analysis. After all, if you collect the wrong data, it doesn't matter how fancy the analysis is.

This is good news, because you don't need a lot of statistical training or a lot of mathematical sophistication to assess how the data are collected.

In this book, I want to show you what to look for and why. I will also highlight real research articles and use them as examples. Although all of the examples represent good and valuable research, some of the examples represent a level of evidence that by itself is less persuasive. It is helpful to understand why these examples are less persuasive.

0.2. Schizophrenic Research

Unfortunately, there is a lot of less than persuasive research out there. You don't have to look very hard to find solid empirical evidence of this. One of my favorite examples is a study by Ben Thornley and Clive Adams that appeared in the British Medical Journal in 1998. You can find the full text of this article on the web at bmj.com/cgi/content/full/317/7167/1181 and it is well worth reading. Thornley and Adams looked at the quality of clinical trials for treating schizophrenia. Since they work for the Cochrane Collaboration, a group that provides systematic reviews of the results of medical trials, they are in a good position to write such an article.

Thornley and Adams actually identified over 2,500 studies of schizophrenia, but decided to summarize only the first 2,000 that they uncovered. Perhaps they reached the point of sheer exhaustion. I am very impressed by the amount of work this must have taken.

The research covered fifty years, from 1948 through 1997, and a variety of therapies: drug therapies, psychotherapy, policy or care packages, and physical interventions like electroconvulsive therapy.

What did Thornley and Adams find? It wasn't a pretty picture. First, researchers in schizophrenia studied the wrong patients. Most studies used institutionalized patients, who are easier to recruit and follow up with, but who do not provide a good representation of all patients with schizophrenia. Readers would probably be at least as interested in community-based studies, but only 14% of the studies were community based.

Second, the researchers did not study enough patients. Thornley and Adams estimated that a good study of schizophrenia should have at least 300 patients in each group. This figure is based on the rates of improvement that might be expected for an active drug compared to placebo. Even though the desired sample size was 300, it turns out that the average study had only 65. Only 3% of the studies had 300 or more patients.
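If you are curious where a number like 300 might come from, here is a minimal sketch of a standard sample size calculation for comparing two proportions. It is not the calculation that Thornley and Adams performed. The function name patients_per_group, the improvement rates (30% on placebo, 40% on active drug), the 5% significance level, and the 80% power are all my own assumptions, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per group to compare two proportions.

    Uses the common normal-approximation formula for a two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    difference = p_treated - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / difference ** 2)


# Hypothetical improvement rates, NOT taken from Thornley and Adams.
n_needed = patients_per_group(p_control=0.30, p_treated=0.40)
print(n_needed)
```

With these made-up rates, the formula asks for roughly 350 patients in each group. The required number grows quickly as the expected difference between the groups shrinks, which is why a study with 65 patients can only detect very large effects.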

Third, the researchers did not study the patients long enough. A good study of schizophrenia should last for six months or more, because long-term changes are more important than short-term changes. Unfortunately, more than half of the studies lasted for six weeks or less.

Finally, the researchers did not measure these patients consistently. In the 2,000 studies, the researchers used 640 ways to measure the impact of the interventions. Granted, there are a lot of dimensions to schizophrenia, and there were measures of symptoms, behavior, cognitive functioning, side effects, social functioning, and so forth. Still, there is no justification for using so many different measurements. Imagine how hard this makes it for anyone to summarize the results of this research. Failure to use and re-use a few standardized assessments has led to a very fragmentary (dare I say, schizophrenic) picture about schizophrenia treatments.

I don't wish to single out research in just this area. There are many reviews in other areas that also point out the flaws and shortcomings of research. Also keep in mind that research on schizophrenia is especially hard to do well. The take-home message from Thornley and Adams is that just because the research is peer-reviewed does not mean that it is perfect. I hope to help you identify factors that limit the quality of peer-reviewed research.

0.3. Healthy Skepticism

Please don't panic. Research studies have many flaws, but usually those flaws do not make the research wholly uninterpretable. These limitations should make you skeptical, perhaps, but not cynical.

The cynical attitude would be "you can prove anything with statistics" and leads to a nihilistic view that all research is garbage. The cynical attitude would lead you to nitpick a research paper, finding a flaw here and a flaw there, and then use these flaws to disregard any research whose conclusions make you uncomfortable.

A skeptical attitude, on the other hand, would ask "how persuasive is this research?" and would look at the strengths and the weaknesses of a research paper. It would place limits on how persuasive the research is. When the research is not sufficiently persuasive, a skeptical attitude would encourage you to think about what level of evidence would be enough to persuade you.

This webpage was written by Steve Simon on (unknown date), edited by Steve Simon and Linda Foland, and was last modified on 2008-07-08.