Stats #74: Use of diagnostic tests for making clinical decisions
Content: This training class will discuss the evaluation of diagnostic tests.
Teaching strategies: Didactic lectures and small group exercises.
Abstract: Not all diagnostic tests are created equal. Some are so bad that they cause more harm than good. After reviewing the general formulas for sensitivity and specificity, I will outline the five phases of research for development of a diagnostic test proposed by Margaret Pepe. I will then explain why research in the early phases provides an insufficient evidence base for making clinical decisions about the utility of a diagnostic test. Finally, I will illustrate how to apply a diagnostic test in a practical setting that incorporates clinical judgment and accounts for individual patient variation. In this talk, you will learn how to: describe the limitations of diagnostic tests, summarize the five phases of diagnostic test development, and apply diagnostic tests in a practical setting.
Objectives: In this seminar, you will learn how to:
- compute classic measures of diagnostic test performance,
- appraise the quality of research on a diagnostic test, and
- apply diagnostic test results to an individual patient.
Notes: There are no pre-requisites for this seminar. This class does not qualify for IRB Education Credits (IRBECs).
I've been asked to remove any personal details from this web site. That included this page. Sorry!
Information about my book, Statistical Evidence in Medical Trials
I recently published a book, Statistical Evidence in Medical Trials, What do the Data Really Tell Us? through Oxford University Press. A good summary of what this book is about appears on the back cover: "Statistical Evidence in Medical Trials is a lucid, well-written and entertaining text that addresses common pitfalls in evaluating medical research. Including extensive use of publications from the medical literature and a non-technical account of how to appraise the quality of evidence presented in these publications, this book is ideal for health care professionals, students in medical or nursing schools, researchers and students in statistics, and anyone needing to assess the evidence published in medical journals." A review by Rebecca Rooney in the International Journal of Epidemiology states: "This book is a clear, concise, and interesting read and should prove to be a useful guide. The examples and case studies make it easy to understand difficult concepts and the jokes and stories make it fun. There are some salient points and hopefully the reader will be enthused about looking at the published research and be more confident about distinguishing between the good and the bad." More information about the book (supporting materials, answers to the exercises, and other updates) can be found on the web at http://www.childrensmercy.org/stats/evidence.asp.
Where can you find this handout?
This handout and the handouts that I use for all of my seminars and training classes are a compilation of individual web pages at www.childrensmercy.org/stats. I use the "Include Page" feature of Microsoft FrontPage to combine these into a single page. You can always find the most recent version of this compilation by going to the web address listed at the bottom of this page. Links for the handouts for other seminars and classes appear at www.childrensmercy.org/stats/training.asp.
Why don't I use PowerPoint?
I stopped using PowerPoint for my presentations in the mid 1990's. This was based on Edward Tufte's advice that presenting information in a paper handout is more effective than presenting the information on a projected screen. I found this to be excellent guidance. I enjoy talking when I don't have to wrestle with a laptop computer. I look at my audience more and interact with them better. I elaborate on this in greater detail at www.childrensmercy.org/stats/weblog2004/powerpoint.asp.
What is a likelihood ratio?
The likelihood ratio incorporates both the sensitivity and specificity of the test and provides a direct estimate of how much a test result will change the odds of having a disease. The likelihood ratio for a positive result (LR+) tells you how much the odds of the disease increase when a test is positive. The likelihood ratio for a negative result (LR-) tells you how much the odds of the disease decrease when a test is negative.
You combine the likelihood ratio with information about
- the prevalence of the disease,
- characteristics of your patient pool, and
- information about this particular patient
to determine the post-test odds of disease.
If you want to quantify the effect of a diagnostic test, you have to first provide information about the patient. You need to specify the pre-test odds: the likelihood that the patient would have a specific disease prior to testing. The pre-test odds are usually related to the prevalence of the disease, though you might adjust them upwards or downwards depending on characteristics of your overall patient pool or of the individual patient.
You are probably more comfortable specifying a probability instead of an odds, and if so there are simple formulas for converting probabilities into odds. You also may have some uncertainty about the pre-test odds. In this case, you might propose a range of values that seem plausible.
You can summarize information about the diagnostic test itself using a measure called the likelihood ratio. The likelihood ratio combines information about the sensitivity and specificity. It tells you how much a positive or negative result changes the likelihood that a patient would have the disease.
The likelihood ratio of a positive test result (LR+) is sensitivity divided by (1 - specificity).
The likelihood ratio of a negative test result (LR-) is (1 - sensitivity) divided by specificity.
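The two formulas above are easy to put into code. Here is a minimal Python sketch; the function name and the 90% / 80% figures are hypothetical and chosen only for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

Note that both ratios blow up or collapse when specificity is exactly 1 or 0, so a real implementation would need to guard against division by zero.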
Once you have specified the pre-test odds, you multiply them by the likelihood ratio. This gives you the post-test odds.
The post-test odds represent the chances that your patient has a disease. They incorporate information about the disease prevalence, the patient pool, and specific patient risk factors (pre-test odds) and information about the diagnostic test itself (the likelihood ratio).
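If you prefer to think in probabilities, the whole chain (probability to odds, multiply by the likelihood ratio, odds back to probability) can be sketched in a few lines of Python. The function names and the likelihood ratio of 5 are hypothetical; the loop shows how you might try a plausible range of pre-test probabilities when you are uncertain.

```python
def probability_to_odds(p):
    """Convert a probability (0 <= p < 1) into odds in favor."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Convert odds in favor back into a probability."""
    return odds / (1 + odds)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = pre-test odds * likelihood ratio, reported as a probability."""
    pre_odds = probability_to_odds(pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return odds_to_probability(post_odds)


# Uncertain about the pre-test probability? Try a plausible range.
for p in (0.10, 0.20, 0.30):
    print(f"pre-test {p:.0%} -> post-test {post_test_probability(p, 5.0):.0%}")
```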
Example
An early test for developmental dysplasia of the hip. The test has 92% sensitivity and 86% specificity in boys (AJPH 1998; 88(2): 285-288). The likelihood ratio for a positive result from this test is 0.92 / (1-0.86) = 6.6 for boys. The likelihood ratio for a negative result from this test is (1-0.92) / 0.86 = 0.09 (or roughly 1/11).
Suppose one of our patients is a boy with no special risk factors. The diagnostic test is positive. What can we say about the chances that this boy will develop hip dysplasia? The prevalence of this condition is 1.5% in boys. This corresponds to an odds of one to 66. Multiply the odds by the likelihood ratio and you get 6.6 to 66 or roughly 1 to 10. The post-test odds of having the disease are 1 to 10, which corresponds to a probability of 9%.
Suppose we had a negative result, but it was with a boy who had a family history of hip dysplasia. Suppose the family history would change the pre-test probability to 25%. How likely is hip dysplasia, factoring in both the family history and the negative test result? A probability of 25% corresponds to an odds of 1 to 3. The likelihood ratio for a negative result is 0.09 or 1/11. So the post-test odds would be roughly 1 to 33, which corresponds to a probability of 3%.
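You can check both hip dysplasia calculations without the rounding to "1 to 66" and "1/11". This is only a sketch; the variable names are hypothetical, and the inputs are the sensitivity, specificity, and pre-test probabilities quoted above.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


sensitivity, specificity = 0.92, 0.86          # values for boys quoted above
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

# Boy with no special risk factors (prevalence 1.5%), positive test.
positive_no_risk = post_test_probability(0.015, lr_positive)

# Boy with a family history (pre-test probability 25%), negative test.
negative_family_history = post_test_probability(0.25, lr_negative)

print(f"{positive_no_risk:.1%}  {negative_family_history:.1%}")
```

The unrounded results agree with the hand calculation: about 9% after the positive test and about 3% after the negative test.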
Notice that a negative test seems to change things more than a positive test. There are two factors at work here. First, a positive result multiplies the pre-test odds by a factor of only seven whereas a negative result divides the pre-test odds by 11. This means that the test is better at ruling out a condition than ruling it in.
Second, the impact of a test is usually greatest for mid-sized probabilities. If a condition is either very rare, or very common, then only a very definitive test is likely to change things much. But mid-sized probabilities (say between 20% and 80%) will change greatly on the basis of even a moderately precise test.
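To see the second point in numbers, apply the same likelihood ratio to a rare, a mid-sized, and a very common pre-test probability. The likelihood ratio of 3 and the three probabilities are hypothetical values chosen for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


likelihood_ratio = 3.0   # hypothetical, moderately informative positive result
changes = {}
for pre in (0.02, 0.50, 0.98):
    post = post_test_probability(pre, likelihood_ratio)
    changes[pre] = post - pre
    print(f"pre-test {pre:.0%} -> post-test {post:.1%} (change of {post - pre:+.1%})")
```

The shift in probability is far larger at 50% than at either extreme, even though the odds are multiplied by the same factor in all three cases.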
Summary
The likelihood ratio, which combines information from sensitivity and specificity, gives an indication of how much the odds of disease change based on a positive or a negative result. You need to know the pre-test odds, which incorporate information about prevalence of the disease, characteristics of your patient pool, and specific information about this patient. You then multiply the pre-test odds by the likelihood ratio to get the post-test odds.
This webpage was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2008-07-14. This page needs minor revisions. Category: Definitions, Category: Diagnostic testing.
Likelihood ratio slide rule (October 24, 2002) Category: Diagnostic testing
The use of likelihood ratios requires a bit of tedious calculations. I have developed a simple slide rule that will do likelihood ratio calculations for you.
Note: I am developing a special handout (PDF format) that explains the mathematics behind diagnostic testing and which illustrates many of the important points using the likelihood ratio slide rule. I distributed this handout in a talk for the American College of Allergy, Asthma & Immunology on Sunday, November 11, but ran out very quickly.
Assembly instructions
Please print out this graphic image of the likelihood ratio slide rule (PDF format). An earlier version of this slide rule is also available.
Cut out the bottom piece (the sleeve) and the top piece (the insert). Also cut out the two rectangles in the middle of the sleeve. Fold the left and right portions of the sleeve behind and tape them together. Double sided tape works very well for this. Slip the insert into the sleeve. You may need to trim a tiny amount off the left and right sides of the insert to get it to fit well. You want the insert to fit not too snugly and not too loosely inside the sleeve.
For a more durable slide rule
If you print this to a regular sheet of paper, the slide rule will be okay but a bit flimsy and easy to bend. For a more durable slide rule, print out the image on a thick piece of paper or tape/glue the image to a thin piece of cardboard. You can also print the image on a full sheet adhesive label (like Avery 5165) and then attach the label to a thick piece of paper or a thin piece of cardboard.
How to use the slide rule
Slide the insert up or down until the pre-test probability in the left window lines up with the likelihood ratio. Read the post-test probability in the right window.
Examples
In Watkins et al 2001, a single question diagnostic test (the Yale-Brown obsessive-compulsive scale) was compared to a "gold standard" measure of depression, the Montgomery Asberg depression rating scale (MADRS).
On the MADRS 43 (54%) were classified as clinically depressed; 37 answered "yes" to the Yale single question and six answered "no." Of the 36 classified as not depressed, eight answered "yes" and 28 "no." The values (95% confidence intervals) for the Yale test were sensitivity 86% (75% to 97%), specificity 78% (65% to 91%), positive predictive value 82% (71% to 93%), negative predictive value 82% (69% to 95%); 82% (73% to 91%) of cases were classified correctly.
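The point estimates in this paragraph follow directly from the four counts. Here is a short Python sketch that recomputes them; the variable names are hypothetical, and the confidence intervals are not reproduced.

```python
# Counts quoted above, with the MADRS as the reference standard.
true_positive, false_negative = 37, 6    # depressed: answered "yes", "no"
false_positive, true_negative = 8, 28    # not depressed: answered "yes", "no"

total = true_positive + false_negative + false_positive + true_negative

sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)
ppv = true_positive / (true_positive + false_positive)
npv = true_negative / (true_negative + false_negative)
accuracy = (true_positive + true_negative) / total
prevalence = (true_positive + false_negative) / total

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("accuracy", accuracy),
                    ("prevalence", prevalence)]:
    print(f"{name}: {value:.0%}")
```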
The prevalence of depression in this population was unusually high, so the authors presented additional positive predictive values (PPV) and negative predictive values (NPV) for prevalence values ranging from 10% to 90%. An abridged version of their table appears below.
| Prevalence | PPV | NPV |
|---|---|---|
| 90% | 97% | 38% |
| 80% | 94% | 58% |
| 70% | 90% | 70% |
| 60% | 85% | 79% |
| 50% | 80% | 85% |
| 40% | 72% | 89% |
| 30% | 63% | 93% |
| 20% | 49% | 96% |
| 10% | 30% | 98% |

Since the PPV is simply the post-test probability after a positive test, we can use the likelihood ratio slide rule to re-create their calculations. First, we need to compute the likelihood ratio for a positive test (LR+). The formula is
LR+ = Sn / (1-Sp) = 0.86 / (1-0.78) = 3.9
where Sn and Sp are sensitivity and specificity, respectively. We will round this value to 4.
To compute the positive predictive value when the prevalence of the disease is 10%, line up the 10% pre-test probability with the likelihood ratio of 4 (the unlabelled tick mark between 3 and 5). In the right side window, the post-test probability should be slightly more than 30%, which matches the value computed by Watkins.
Slide the insert up so the 20% pre-test probability lines up with the likelihood ratio of 4. The post-test probability should be around 50% which also matches the value in Watkins.
Now slide the insert up so the 30% pre-test probability lines up with the likelihood ratio of 4. The post-test probability should be slightly more than 60%.
Repeat this for 40%, through 90% and see if you can estimate the remaining PPV values.
To compute NPV, we need to calculate the likelihood ratio for a negative test (LR-). The formula is
LR- = (1-Sn) / Sp = (1-0.86) / 0.78 = 0.18.
There is no tick mark for 0.18, so we will use a point about halfway between the 0.15 and 0.2 tick marks. Line up the prevalence of 10% with the likelihood ratio of 0.18 and read off the post-test probability of 2% in the right side window. Since there is only a 2% chance of having the disease, there is a 98% chance of being healthy, which matches the NPV computed by Watkins.
Line up a prevalence of 20% with the likelihood ratio of 0.18 to get a post-test probability of 4% and an NPV of 96%.
Now line up a prevalence of 30% with the likelihood ratio of 0.18 to get a post-test probability of 7% and an NPV of 93%.
Repeat this for 40% through 90% and estimate the remaining NPV values.
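If you want to check your slide rule readings, the same PPV and NPV values can be computed directly. This sketch assumes the rounded sensitivity (0.86) and specificity (0.78) quoted above and uses the unrounded likelihood ratios; the names are hypothetical.

```python
sensitivity, specificity = 0.86, 0.78
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity


def post_test_probability(pre_test_probability, likelihood_ratio):
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


table = []
for percent in range(90, 0, -10):
    prevalence = percent / 100
    # PPV is the post-test probability of disease after a positive test.
    ppv = post_test_probability(prevalence, lr_positive)
    # NPV is the probability of being healthy after a negative test.
    npv = 1 - post_test_probability(prevalence, lr_negative)
    table.append((percent, round(ppv * 100), round(npv * 100)))
    print(f"{percent:3d}%  PPV {ppv:.0%}  NPV {npv:.0%}")
```

The rounded values should agree with the abridged table from Watkins et al, which is a useful check that the slide rule and the formulas are doing the same thing.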
Second example
A letter to the editor in BMJ commented on how the use of likelihood ratios could have simplified the interpretation of results of a rapid whole blood test for diagnosing Helicobacter pylori infection.
In that study the likelihood ratio for a positive test result was 9.8. The advantage of knowing this is that it can be applied to similar patients in other populations to estimate the predictive value of the test, provided that the pre-test probability of disease can be estimated. For example, H pylori is found in 48% of dyspeptic patients in the community (the pre-test probability), so therefore a positive rapid blood test with a likelihood ratio of 9.8 applied to this population would give a post-test probability (or predictive value) of 90% (this can be estimated using a simple calculation or a nomogram). --BMJ 1997; 314: 1688.
We have to round a bit here. Line up a pre-test probability of 50% with a likelihood ratio of 10. Read the post-test probability of slightly more than 90% in the right side window.
Third example
Buchsbaum et al examined the sensitivity, specificity, and likelihood ratio for the CAGE score, a series of yes/no answers to four questions (Ann Intern Med 1991; 115(10): 774-777). The four item scale was very good at detecting alcohol abuse or dependence.
| Score | Abuse or dependence | No abuse or dependence | Likelihood ratio |
|---|---|---|---|
| 0 | 33 | 428 | 0.14 |
| 1 | 45 | 54 | 1.5 |
| 2 | 86 | 34 | 4.5 |
| 3 | 74 | 10 | 13 |
| 4 | 56 | 1 | 100 |

In this paper, the authors noted a prevalence of alcohol abuse and dependence of 36%. Find this value in the pre-test probability and line it up successively with each of the likelihood ratios listed above. You should get a post-test probability of 7%, 45%, 70%, 90% and 98% for the scores of 0 through 4, which matches up nicely with the values given in the paper. The likelihood ratio slide rule computations are shown below for the first three of these cases.
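The likelihood ratios in the table can be recovered from the counts. The sketch below assumes the usual definition for a multi-level test: the proportion of diseased patients with a given score divided by the proportion of non-diseased patients with that score. That definition is an assumption on my part rather than something stated in the text, though it does reproduce the tabled values. The names are hypothetical.

```python
# CAGE score -> (abuse or dependence, no abuse or dependence), from the table above.
counts = {0: (33, 428), 1: (45, 54), 2: (86, 34), 3: (74, 10), 4: (56, 1)}

total_diseased = sum(diseased for diseased, _ in counts.values())
total_healthy = sum(healthy for _, healthy in counts.values())
prevalence = total_diseased / (total_diseased + total_healthy)
pre_odds = prevalence / (1 - prevalence)

results = {}
for score, (diseased, healthy) in counts.items():
    # Likelihood ratio for this score level.
    lr = (diseased / total_diseased) / (healthy / total_healthy)
    post_odds = pre_odds * lr
    post_probability = post_odds / (1 + post_odds)
    results[score] = (lr, post_probability)
    print(f"score {score}: LR = {lr:.2f}, post-test probability = {post_probability:.0%}")
```

Because the pre-test probability here is the study's own prevalence, each post-test probability is simply the fraction of patients with that score who had the condition.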
Grant et al tabulated the prevalence of alcohol abuse or dependence for demographic groups. This rate varies by age (higher among younger people), by gender (higher among males) and race (higher among non-blacks). Among non-black males, for example, the prevalence is 23%, 11%, 6%, and 1% for 18-29, 30-44, 45-64, and 65+ years of age, respectively (Alcohol Health & Research World 1994; 18(3):243-248, as quoted in alcoholism.about.com/library/nabdep4.htm).
The prevalence would be roughly twice as high among ambulatory patients as in the general population and four times as high among hospitalized patients as in the general population (Postgraduate Medicine Online 1996; 100(1), www.postgradmed.com/issues/1996/07_96/blondell.htm).
Suppose you apply the CAGE score to a 70 year old hospitalized white male. This person scores 3 on CAGE. Line up a pre-test probability of 4% with a likelihood ratio of 13. The post test probability is slightly more than 30%.
Suppose you give the same test to a 35 year old white male who visits your clinic and he scores 0 on CAGE. Line up a pre-test probability of 22% with a likelihood ratio of 0.14. The post-test probability is 4%.
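These two patients can be worked through in code as well. The sketch below takes the age-specific prevalences for non-black males, the rough multipliers of 2 and 4 for ambulatory and hospitalized patients, and the CAGE likelihood ratios quoted above. The dictionary and function names are hypothetical, and simple multiplication of the prevalence is only the rough adjustment described in the text.

```python
general_prevalence = {"18-29": 0.23, "30-44": 0.11, "45-64": 0.06, "65+": 0.01}
setting_multiplier = {"general": 1, "ambulatory": 2, "hospitalized": 4}
cage_likelihood_ratio = {0: 0.14, 1: 1.5, 2: 4.5, 3: 13, 4: 100}


def cage_post_test_probability(age_group, setting, score):
    """Rough post-test probability of alcohol abuse or dependence."""
    pre = general_prevalence[age_group] * setting_multiplier[setting]
    pre_odds = pre / (1 - pre)
    post_odds = pre_odds * cage_likelihood_ratio[score]
    return post_odds / (1 + post_odds)


older_inpatient = cage_post_test_probability("65+", "hospitalized", 3)
younger_outpatient = cage_post_test_probability("30-44", "ambulatory", 0)
print(f"{older_inpatient:.0%}  {younger_outpatient:.0%}")
```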
How does it work?
The likelihood ratio slide rule works on the same principle as a regular slide rule. The logarithms on a slide rule allow you to multiply simply by adding. It uses the simple formula
log (a*b) = log (a) + log (b).
There's an old joke well known among mathematicians about logarithms. After the flood waters receded, Noah commanded the animals to go forth and multiply. The snakes went up to Noah and told him they couldn't multiply because they were adders. So Noah built them a piece of wooden furniture with a flat top and four legs. The adders could now multiply because they had a log table.
The formula for computing post-test odds is
post-test odds = likelihood ratio * pre-test odds.
By taking logarithms of both sides of the equation, we get
log (post-test odds) = log (likelihood ratio) + log (pre-test odds)
Sliding the insert up or down will add a pre-test log odds value to a log likelihood ratio to get a post-test log odds value. The tick marks are labeled using probability rather than odds to simplify things further.
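Here is a small Python sketch of the same idea: adding on the log odds scale gives the same answer as multiplying on the odds scale. The pre-test probability of 20% and likelihood ratio of 4 echo the first example; the function names are hypothetical.

```python
import math


def log_odds(p):
    """Position of a probability on the logarithmic (log odds) scale."""
    return math.log(p / (1 - p))


def from_log_odds(x):
    """Convert a log odds value back to a probability."""
    return 1 / (1 + math.exp(-x))


pre_test_probability = 0.20
likelihood_ratio = 4.0

# What the slide rule does: add two distances on a log scale.
via_logs = from_log_odds(log_odds(pre_test_probability) + math.log(likelihood_ratio))

# What the formula does: multiply the odds directly.
pre_odds = pre_test_probability / (1 - pre_test_probability)
post_odds = pre_odds * likelihood_ratio
direct = post_odds / (1 + post_odds)

print(f"{via_logs:.3f}  {direct:.3f}")
```

Both routes give a post-test probability of 50%, which is what the slide rule shows when 20% is lined up with a likelihood ratio of 4.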
The likelihood ratio slide rule that I developed was inspired by the Fagan nomogram which also uses logarithms. In the Fagan nomogram, you draw a line connecting the pre-test probability with the likelihood ratio. Extend the line further to the right to compute the post-test probability.
Summary
The likelihood ratio slide rule allows you to compute the post-test probability of a disease given the pre-test probability and the likelihood ratio of a diagnostic test. Simply line up the pre-test probability in the left side window with the likelihood ratio. Then read the post-test probability in the right side window.
This webpage was written by Steve Simon and was last modified on 07/08/2008.
Recommendations from Sackett et al for evaluating a diagnostic test (July 2, 2007). Category: Diagnostic testing
There is a lot of controversy about diagnostic testing, and I have mentioned some of these controversies in other weblog entries. I wanted to review what the experts say about diagnostic testing. The definitive resource for evaluating any medical controversy is
- Evidence-based Medicine: How to Practice and Teach EBM. David L. Sackett, W. Scott Richardson, William Rosenberg, R. Brian Haynes (1998) Edinburgh: Churchill Livingstone.
There's a newer edition, published in 2005, but I don't think the material I am quoting has changed all that much. The material in Sackett et al was published earlier as
- Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? Evidence-Based Medicine Working Group. R. Jaeschke, G. Guyatt, D. L. Sackett. JAMA 1994; 271(5): 389-391.
- Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. R. Jaeschke, G. Guyatt, D. L. Sackett. JAMA 1994; 271(9): 703-707.
and is available on the web at
The guidance is still quite relevant today.
Suppose you are reviewing a research paper that touts a new diagnostic test. Before you decide whether to use this diagnostic test, you have to assess whether the research findings are valid. You need to ask yourself three questions:
- Was there an independent, blind comparison with a reference standard?
- Did the patient sample include an appropriate spectrum of patients to whom the diagnostic test will be applied in clinical practice?
- Did the results of the test being evaluated influence the decision to perform the reference standard?
If the research findings are valid, then you have to assess whether the diagnostic test is clinically significant.
If the diagnostic test is valid and clinically significant, you have to assess whether you can extrapolate the results of the study to the particular patient who is in your office right now. You need to ask whether the results in the particular study are applicable to the patients that you normally see.
Finally, you need to know if you have enough information to apply the results in your particular setting. You need to ask yourself three more questions.
- Is the diagnostic test available, affordable, accurate, and precise in your setting?
- Can you generate a clinically sensible estimate of your patient's pre-test probability?
- Will the resulting post-test probabilities affect your management and help your patient?
Let's consider this advice in detail.
Was there an independent, blind comparison? Any research study evaluating a diagnostic test is going to compare it to a more expensive or invasive test that produces a definitive diagnosis of disease. The test that provides a definitive diagnosis is referred to as the "gold standard." Blinding is important in any research study, but it is especially important when there is subjectivity in the interpretation of results. Most diagnostic tests require some level of judgment and if the person applying the diagnostic test is aware of the results of the gold standard or vice versa, that can influence the results. Usually lack of blinding will produce overly optimistic results for the diagnostic test. If the diagnostic test and the gold standard are produced by an automated system with little or no operator intervention and with little or no ambiguity in the reading of results, then blinding is less critical.
Did the study have an appropriate spectrum of patients? Some research designs will include only patients with obvious and overt manifestations of disease. By excluding the milder cases (the shades of gray), the resulting black versus white comparison will produce overly optimistic results for the diagnostic test. An appropriate spectrum of patients is also important in ensuring that the research results can be extrapolated to your patients (see below).
Did the diagnostic test results influence the decision to perform the reference standard? The gold standard is by definition more expensive or more invasive, so there is a natural reluctance to apply the reference standard. The ideal research study would require every patient to endure both the diagnostic test and the gold standard, but sometimes this is difficult. Suppose the gold standard involves surgery. What do you tell the patients who test negative on the diagnostic test? ("We suspect that everything is okay, but we want you to submit to this surgery to preserve the credibility of our research findings.") If patients who test negative are less likely to receive the gold standard, though, the estimates of sensitivity and specificity are likely to be biased.
Are the results for the diagnostic test clinically significant? A diagnostic test is clinically significant if knowledge of the results of the diagnostic test can substantially alter your belief about whether your patient has a particular disease. The likelihood ratio will help you answer this question. A positive result with a likelihood ratio smaller than 2, or a negative result with a likelihood ratio larger than 0.5, is pretty much worthless.
Can you extrapolate the results? Medical research is often conducted in an idealized setting that makes the research easier to run but which makes it difficult to generalize the results to your particular patients. Look at the inclusion and exclusion criteria in the study and see if the research population is drawn more narrowly than your patients. Also examine the table of demographics to see if they are comparable to the demographics of your patients (e.g., comparable ages and comparable mixes of race, ethnicity, and gender).
Is the diagnostic test available, affordable, accurate, and precise in your setting? Does the diagnostic test require special skills in its application? Does it require equipment that you do not have? Does the mix of patients that you see raise special issues? For example, do your patients experience developmental problems that make communication difficult?
Can you generate a clinically sensible estimate of your patient's pre-test probability? To apply a diagnostic test, you first need an estimate of the pre-test probability. Do you have records in your practice regarding how often patients who come to you complaining of a particular problem actually have the disease that you are testing for? Are there regional or national surveys that estimate prevalence of the disease? You'd have to adjust this estimate, of course, because the patients who come to see you are more likely to have the disease than the typical probability you'd get by an "on the street" survey. If your patients are similar to the research studies, then the prevalence of disease in that study might be a reasonable estimate. If your patients are dissimilar, but in a way that leads to a predictable increase or decrease in the pre-test probability, make the appropriate adjustment. If you have personal experience through many years of practice, you might be able to provide a "seat of the pants" estimate. Just be sure that your estimate is not colored by your most recent case or your most embarrassing case.
Will the resulting post-test probabilities affect your management and help your patient? A diagnostic test is useless if the likelihood ratio does not shift the probability by a sufficient amount to cause you to cross a treatment threshold. You don't have to do a formal likelihood ratio calculation for every patient that you see, however. Just run a few examples that are typical for a reasonable range of patients (e.g., calculate the results using pre-test probabilities from 45 year old, 65 year old, and 85 year old patients, both smokers and non-smokers).
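One way to "run a few examples" is sketched below. Everything in it is hypothetical: the likelihood ratios, the 30% treatment threshold, and the three pre-test probabilities are made-up values for illustration, and the rule "treat when the probability is at or above the threshold" is a simplifying assumption.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def changes_management(pre, lr_positive, lr_negative, threshold):
    """True if either test result would move the patient across the threshold."""
    treat_before = pre >= threshold
    treat_if_positive = post_test_probability(pre, lr_positive) >= threshold
    treat_if_negative = post_test_probability(pre, lr_negative) >= threshold
    return treat_if_positive != treat_before or treat_if_negative != treat_before


# Hypothetical test and hypothetical treatment threshold.
lr_positive, lr_negative, threshold = 6.6, 0.09, 0.30

decisions = {}
for pre in (0.02, 0.10, 0.40):   # hypothetical typical patients
    decisions[pre] = changes_management(pre, lr_positive, lr_negative, threshold)
    print(f"pre-test {pre:.0%}: test could change management? {decisions[pre]}")
```

In this made-up setting the test is not worth ordering for the patient at 2%, because neither result crosses the threshold, but it could change management for the patients at 10% and 40%.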
This webpage was written by Steve Simon and was last modified on 07/08/2008.