Statistical Evidence. Material from/for various sections of the book
Below are some topics that I might want to discuss in future web pages or in a second edition of the book.
What sample size is needed to allow randomization to prevent covariate imbalance with high probability?
Describe patient preference trials: http://bmj.bmjjournals.com/cgi/content/full/316/7128/360
Review Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Mann CJ. Emerg Med J 2003: 20(1); 54-60. [Medline] [Abstract] [Full text] [PDF]
Attrition and bias in the MRC cognitive function and ageing study: an epidemiological investigation. Matthews FE, Chatfield M, Freeman C, McCracken C, Brayne C. BMC Public Health 2004: 4(1); 12. [Medline] [Abstract] [Full text] [PDF]
The Oslo Health Study: The impact of self-selection in a large, population-based survey. Sogaard AJ, Selmer R, Bjertness E, Thelle D. Int J Equity Health 2004: 3(1); 3. [Medline] [Abstract] [Full text] [PDF]
The requirement for prior consent to participate on survey response rates: a population-based survey in Grampian. Angus VC, Entwistle VA, Emslie MJ, Walker KA, Andrew JE. BMC Health Serv Res 2003: 3(1); 21. [Medline] [Abstract] [Full text] [PDF]
The fallacy of enrolling only high-risk subjects in cancer prevention trials: is there a "free lunch"? Baker SG, Kramer BS, Corle D. BMC Med Res Methodol 2004: 4(1); 24. [Medline] [Abstract] [Full text] [PDF]
Discuss particularization. Finasteride in the treatment of clinical benign prostatic hyperplasia: a systematic review of randomised trials. Edwards JE, Moore RA. BMC Urol 2002: 2(1); 14. [Medline] [Abstract] [Full text] [PDF]
Discuss Jurendini 2004 reference
http://www.clinicalmolecularallergy.com/content/3/1/2
http://arthritis-research.com/content/7/2/R333
http://www.biomedcentral.com/1471-2318/5/2/abstract
http://ccforum.com/content/9/2/R83/abstract
http://www.biomedcentral.com/1472-6963/5/4/abstract
http://www.biomedcentral.com/1471-2261/5/2/abstract
http://ccforum.com/content/9/2/R74/abstract
http://www.reproductive-health-journal.com/content/2/1/1/abstract
http://ccforum.com/content/9/2/R60
http://breast-cancer-research.com/content/7/2/R184/abstract
http://www.translational-medicine.com/content/2/1/46/abstract
http://www.biomedcentral.com/1471-2431/4/24/abstract
http://www.rbej.com/content/2/1/82/abstract
http://www.biomedcentral.com/1471-230X/4/32/abstract
http://bmj.bmjjournals.com/cgi/content/full/318/7194/1324
http://www.cmaj.ca/cgi/content/full/168/7/835
http://bmj.bmjjournals.com/cgi/content/full/327/7424/1159
http://bmj.bmjjournals.com/cgi/content/full/317/7155/405
http://bmj.bmjjournals.com/cgi/content/full/325/7358/269
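The sample size question raised at the top of this list (how large a trial must be before randomization prevents covariate imbalance with high probability) can be explored by simulation. What follows is a minimal sketch, not a result from the book. It assumes equal allocation to two arms, a single binary covariate, a hypothetical prevalence of 30%, and an arbitrary definition of "imbalance" as a difference of more than 10 percentage points between arms. The function name and all parameter values are illustrative assumptions.

```python
import random

def imbalance_probability(n, prevalence=0.3, threshold=0.10, reps=2000, seed=1):
    """Monte Carlo estimate of P(|p_treat - p_control| > threshold) for one
    binary covariate when n subjects are split equally between two arms.
    All defaults are hypothetical illustration values."""
    rng = random.Random(seed)
    half = n // 2
    count = 0
    for _ in range(reps):
        # The covariate is independent of the random assignment, so each arm
        # can be drawn separately with the same prevalence.
        treat = sum(rng.random() < prevalence for _ in range(half))
        control = sum(rng.random() < prevalence for _ in range(half))
        if abs(treat - control) / half > threshold:
            count += 1
    return count / reps

for n in (20, 50, 100, 200, 500, 1000):
    print(n, imbalance_probability(n))
```

The estimated probability of imbalance falls as n grows. Where "high probability" of balance is reached depends entirely on the prevalence, the threshold, and the number of covariates considered, all of which are assumptions here.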
How well is the clinical importance of study results reported? An assessment of randomized controlled trials. Chan KB, Man-Son-Hing M, Molnar FJ, Laupacis A. CMAJ 2001: 165(9); 1197-202. [Abstract] [Full text] [PDF] BACKGROUND: The interpretation of the results of randomized controlled trials (RCTs) has traditionally emphasized statistical significance rather than clinical importance. Our aim was to assess the quality of reporting of factors related to clinical importance in a sample of published RCTs. METHODS: A random sample of 27 (of a total of 266) RCTs published in 5 major medical journals over a 1-year period were reviewed by 4 independent reviewers for factors considered important in the interpretation of the clinical importance of study results: identification of a clearly defined primary outcome, reporting of the expected difference between groups used in the calculation of sample size (the delta value) and whether it was based on the minimal clinically important difference of the intervention, the statistical significance of the results, presentation of pertinent confidence intervals, and the authors' interpretation of the clinical importance of the results. RESULTS: Twenty-two of 27 (81%) articles explicitly reported a single primary outcome. Of the 20 articles that included a sample size calculation, 18 (90%) reported a delta value. Two of the 18 (11%) articles explicitly stated that the delta value was chosen to reflect the minimal clinically important difference of the intervention. For the primary outcomes, confidence intervals surrounding the point estimates of the efficacy of the interventions were reported in 11 of 27 (41%) studies. The study results were interpreted from the perspective of clinical importance in 20 of 27 (74%) of the articles. Of these 20 reports, 5 (25%) provided justification for their clinical interpretation of the results.
INTERPRETATION: Authors of RCTs published in major general medical and internal medicine journals do not consistently provide their own interpretation of the clinical importance of their results, and they often do not provide sufficient information to allow readers to make their own interpretation.
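The delta value mentioned in this abstract enters the sample size calculation directly, which is why the choice of delta matters so much. Here is a minimal sketch, assuming a two-arm comparison of means with equal group sizes, a two-sided 5% significance level, 80% power, and the usual normal-approximation formula. The 10-unit delta and 25-unit standard deviation are hypothetical inputs chosen for illustration and are not figures from Chan et al.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate number of subjects per arm needed to detect a true mean
    difference `delta`, given a common standard deviation `sd`
    (normal approximation, equal allocation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# Hypothetical inputs: delta set to a minimal clinically important difference.
print(n_per_group(delta=10, sd=25))
# Halving delta roughly quadruples the required sample size.
print(n_per_group(delta=5, sd=25))
```

Because the required sample size scales with the inverse square of delta, a delta chosen for convenience rather than for clinical importance can make a trial look adequately powered when it is not.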
Audit and feedback: effects on professional practice and health care outcomes. Thomson OB, Oxman AD, Davis DA, Haynes RB, Freemantle N, Harvey EL. Cochrane 2000: (2); CD000259. [Medline] BACKGROUND: Audit and feedback has been identified as having the potential to change the practice of health care professionals. OBJECTIVES: To assess the effects of audit and feedback on the practice of health professionals and patient outcomes. SEARCH STRATEGY: We searched MEDLINE up to June 1997, the Research and Development Resource Base in Continuing Medical Education, and reference lists of related systematic reviews and articles. SELECTION CRITERIA: Randomised trials of audit and feedback (defined as any summary of clinical performance of health care over a specified period of time). The participants were health care professionals responsible for patient care. DATA COLLECTION AND ANALYSIS: Two reviewers independently extracted data and assessed study quality. MAIN RESULTS: Thirty-seven studies were included, involving more than 4977 physicians. The reporting of study methods was inadequate for almost all studies. In 31 out of 37 studies the randomisation process could not be determined. Information regarding data analysis was also lacking. For example, power calculations were not mentioned in 27 out of 37 studies. A variety of behaviours were targeted including the reduction of diagnostic test ordering, prescribing practices, preventive care, and the general management of a problem, for example hypertension. Twenty-eight studies measured physician performance, one study targeted patient outcomes in diabetes and the remaining eight studies measured both physician performance and patient outcomes. The relative percentage differences ranged from -16% to 152%. The clinical importance of the changes was not always clear. 
REVIEWER'S CONCLUSIONS: Audit and feedback can sometimes be effective in improving the practice of health care professionals, in particular prescribing and diagnostic test ordering. When it is effective, the effects appear to be small to moderate but potentially worthwhile. Those attempting to enhance professional behaviour should not rely solely on this approach.
Interpreting thresholds for a clinically significant change in health status in asthma and COPD. Jones PW. Eur Respir J 2002: 19(3); 398-404. [Medline] Health status (or Health-Related Quality of Life) measurement is an established method for assessing the overall efficacy of treatments for asthma and chronic obstructive pulmonary disease (COPD). Such measurements can indicate the potential clinical significance of a treatment's effect. This paper is concerned with methods of estimating the threshold of clinical significance for three widely used health status questionnaires for asthma and COPD: the Asthma Quality of Life Questionnaire, Chronic Respiratory Questionnaire and St George's Respiratory Questionnaire. It discusses the methodology used to obtain such estimates and shows that the estimates appear to be fairly reliable; i.e., for a given questionnaire, similar estimates may be obtained in different studies. These empirically derived thresholds are all mean estimates with confidence intervals around them. The presence of these confidence intervals affects the way in which the thresholds may be used to draw inferences concerning the clinical relevance of clinical trial results. A new system of judging the magnitude of clinically significant results is proposed. Finally, an attempt is made to translate these thresholds into scenarios that illustrate what a clinically significant change with treatment may mean to an individual patient.
Recombinant or urinary follicle-stimulating hormone? A cost-effectiveness analysis derived by particularizing the number needed to treat from a published meta-analysis. Ola B, Papaioannou S, Afnan MA, Hammadieh N, Gimba S. Fertil Steril 2001: 75(6); 1106-10. [Medline] OBJECTIVE: To demonstrate that particularizing pooled results of a meta-analysis can derive incremental cost effectiveness of superovulation with recombinant follicle-stimulating hormones (rFSH) vs. the highly purified urinary form (uFSH) for assisted conception. DESIGN: A retrospective study. SETTING: An assisted conception unit in the United Kingdom. PATIENT(S): One hundred forty-five fresh in vitro fertilization (IVF) and 58 fresh intracytoplasmic sperm injection (ICSI) cycles. INTERVENTION(S): rFSH vs. uFSH. MAIN OUTCOME MEASURE(S): Incremental cost-effectiveness (i.e., cost needed to treat, or CNT) and budget-impact analyses of rFSH vs. uFSH. RESULT(S): In women less than 30 years old, the clinical pregnancy rate was 37.7% (95% CI 24.8%-52.1%), the particularized number needed to treat (pNNT) was -19, and the cost needed to treat was 5070.51 pounds sterling (3660.53 pounds sterling to 7619.32 pounds sterling). For the 30- to 35-year-old age group, the clinical pregnancy rate was 29.9% (95% CI 20.0%-41.4%), the particularized number needed to treat was -24, and CNT was 7335.59 pounds sterling (5284.11 pounds sterling to 10,941.22 pounds sterling). For the 36- to 40-year-old age group, the clinical pregnancy rate was 30.6% (95% CI 19.6%-43.7%), the particularized number needed to treat was -23.0, and the CNT was 8569.67 pounds sterling (5998.70 pounds sterling to 13,413.24 pounds sterling). CONCLUSION(S): The CNT and thus the budget impact analyses (the extra number of cycles that can be funded by the CNT) both increase directly with age of the patient, and inversely with the clinical pregnancy rate.
Making consent patient centred. Bridson J, Hammond C, Leach A, Chester MR. BMJ 2003: 327(7424); 1159-1161. [Full text] [PDF]
Changes in clinical trials mandated by the advent of meta-analysis. Chalmers TC, Lau J. Stat Med 1996: 15(12); 1263-8; discussion 1269-72. [Medline] Service on the Data Monitoring Committee of the CPEP (Calcium for Pre-eclampsia Prevention) has led us to four conclusions about clinical trials which we should like to present to this gathering of biostatisticians for their reactions: (i) meta-analyses of the pertinent published trials of the same therapy should always be undertaken before the start of a new trial, and the results examined to help determine the design of a new trial or determine if a trial should be undertaken at all; (ii) assuming that a decision is made to go ahead, the results of the past trials should be used in sizing the new one; (iii) in the course of the new one, regardless of the size estimates, stopping early should be considered if the trends conform to the results of the meta-analysis; and (iv) heterogeneity of patients entering clinical trials is desirable and should be specifically studied, and it should never be concluded that an average outcome is applicable to all future patients.
Applying the results of trials and systematic reviews to individual patients. Glasziou P, Guyatt GH, Dans AL, Dans LF, Straus S, Sackett DL. ACP Journal Club 1998: 129(3); A15-6. [Medline] Your patient is a 60-year-old hypertensive, alcoholic woman whose symptomless atrial fibrillation was first documented 3 months ago. An echocardiogram shows an enlarged left atrium, rendering successful cardioversion unlikely. She tells you that both of her parents had severe strokes that made the last years of their lives horrible, and she is terrified of having a stroke. You know that a meta-analysis of 5 randomized trials of warfarin in nonvalvular atrial fibrillation demonstrated a 68% relative risk reduction (RRR) in stroke (1). You consider prescribing warfarin for this patient but know that she would not have qualified for the study because alcoholism increases her risk for major hemorrhage (2).
Can treatment that is helpful on average be harmful to some patients? A study of the conflicting information needs of clinical inquiry and drug regulation. Horwitz RI, Singer BH, Makuch RW, Viscoli CM. J Clin Epidemiol 1996: 49(4); 395-400. [Medline] Randomized controlled trials are conducted with heterogeneous groups of patients, and the trial results represent an estimate of the average difference in the responses of the treatment groups. Clinicians, however, engage in a process of clinical inquiry, assembling data that will allow an assessment of the appropriate choice of treatment according to more narrowly defined clinical features. We describe a method of clinical inquiry within RCTs that can enhance the applicability of results to clinical decision making. Our methods included the use of data from the Beta-Blocker Heart Attack Trial, which enrolled 3837 subjects in 31 clinical centers. The 31 centers were divided into 21 dominant centers (mortality rates higher for placebo than propranolol) and 10 divergent centers (higher mortality rates for patients randomized to propranolol). Overall, compared to placebo, propranolol reduced the risk of dying for the "average" patient from 9.8 to 7.2%. Results for patients in dominant centers (RR = 0.50) were significantly different from those in divergent centers (RR = 1.33). We identified two cotherapies--aspirin use and coronary artery surgery--that subsequently affected the benefits of propranolol in divergent centers. For patients in divergent centers, propranolol reduced the risk of dying for patients treated with aspirin and/or coronary surgery (RR = 0.39), but not for patients not receiving these therapies (RR = 1.42). We conclude that differences in results across centers of a multicenter RCT may reflect important distinctions in the clinical conditions of enrolled subjects. 
These distinctions help to identify subgroups of patients in which treatment that has an average overall benefit may be harmful for some patients.
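The overall mortality figures quoted in this abstract (9.8% on placebo, 7.2% on propranolol) can be turned into the usual effect measures. This is a minimal sketch with an illustrative function name. The second call applies the divergent-center relative risk of 1.42 to the same 9.8% control risk, which is purely an assumption, because the abstract does not report baseline risks for the subgroups.

```python
def effect_measures(control_risk, treated_risk):
    """Return relative risk (RR), relative risk reduction (RRR), absolute
    risk reduction (ARR), and number needed to treat (NNT)."""
    rr = treated_risk / control_risk
    arr = control_risk - treated_risk
    return {
        "RR": rr,
        "RRR": 1 - rr,
        "ARR": arr,
        # A negative NNT signals harm; its absolute value is the number
        # needed to harm.
        "NNT": 1 / arr if arr != 0 else float("inf"),
    }

# Overall figures from the abstract: 9.8% vs 7.2% mortality.
overall = effect_measures(0.098, 0.072)
print({k: round(v, 3) for k, v in overall.items()})

# Hypothetical scenario: RR = 1.42 applied to an assumed 9.8% control risk.
harmed = effect_measures(0.098, 0.098 * 1.42)
print({k: round(v, 3) for k, v in harmed.items()})
```

For the overall comparison the relative risk is about 0.73 and the number needed to treat is about 38. Under the assumed subgroup scenario the absolute risk reduction turns negative, which is the arithmetic form of "helpful on average, harmful to some."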
Decision analysis and the implementation of research findings. Lilford RJ, Pauker SG, Braunholtz DA, Chard J. British Medical Journal 1998: 317(7155); 405-9. [Medline] [Full text]
Variation in patient utilities for outcomes of the management of chronic stable angina. Implications for clinical practice guidelines. Ischemic Heart Disease Patient Outcomes Research Team. Nease RF, Kneeland T, O'Connor GT, Sumner W, Lumpkins C, Shaw L, Pryor D, Sox HC. JAMA 1995: 273(15); 1185-90. [Medline] OBJECTIVE--Although practice guidelines sometimes make recommendations based on symptom severity, they rarely account for how patients feel about their symptoms. To investigate the possible importance of patient preferences in treatment of ischemic heart disease, we assessed attitudes toward symptoms in patients with angina pectoris. DESIGN--Case series. SETTING--Ambulatory cardiology clinics at two tertiary care medical centers. PATIENTS--A total of 220 subjects were selected from 589 patients with chronic stable angina referred from cardiologists to achieve patient samples balanced for sex, race, and angina severity. MAIN OUTCOME MEASURES--We measured patients' attitudes toward their angina using the rating scale, time trade-off, and standard gamble utility metrics. Reliability of measurements was evaluated by repeating the assessments 2 weeks later on 50 willing patients. RESULTS--While the mean responses followed the expected patterns (those with more severe Canadian Cardiovascular Society scores chose lower utilities), attitudes toward symptoms varied substantially among patients with similarly severe angina. For example, there was a 33% chance that a patient with class II angina had a time trade-off utility that was lower (ie, more bothered by symptoms) than a patient with more severe angina (class III/IV). This variation in utilities was not due to random error in the assessments. CONCLUSIONS--Angina patients with similar functional limitation vary considerably in their tolerance for their symptoms, as measured by utilities.
Our findings suggest that guidelines for the management of ischemic heart disease should be based on the preferences of the individual patient rather than on symptom severity alone.
Pronouncements about the need for "generalizability" of randomized controlled trial results are humbug. Sackett DL. Controlled Clinical Trials 2000: 21; 82S. Abstract not available.
Conflicting clinical trials and the uncertainty of treating mild hypertension. Toth PJ, Horwitz RI. Am J Med 1983: 75(3); 482-8. [Medline] Recommendations to treat patients with mild hypertension are based principally on six randomized clinical trials conducted in three countries between 1964 and 1979. To determine whether the methods and results of these randomized clinical trials justify the current therapeutic policy, a clinical epidemiologic analysis of the data was performed focusing on (1) clinical versus statistical significance, (2) clinical heterogeneity of patients' baseline state, (3) suitable management of the untreated control patients, and (4) choice of outcome events. This analysis suggested that the results of available studies are better suited to public health decisions (number of cardiovascular deaths prevented nationwide) than personal health decisions (whether treatment does more good than harm for individual patients), and that current evidence does not justify a uniform policy of treating all asymptomatic patients with mild hypertension.
The visual analog scale for pain: clinical significance in postoperative patients. Bodian CA, Freedman G, Hossain S, Eisenkraft JB, Beilin Y. Anesthesiology 2001: 95(6); 1356-61. [Medline] BACKGROUND: The visual analog scale is widely used in research studies, but its connection with clinical experience outside the research setting and the best way to administer the VAS forms are not well established. This study defines changes in dosing of intravenous patient-controlled analgesia as a clinically relevant outcome and compares it with VAS measures of postoperative pain. METHODS: Visual analog scale measurements were obtained from 150 patients on the morning after intraabdominal surgery. On the same afternoon, 50 of the patients provided a VAS score on the same form used in the morning, 50 on a new form, and 50 were not asked for a second VAS measurement. RESULTS: Visual analog scale values and changes in value were similar for patients who were given a new VAS form in the afternoon and those who used the form that showed the morning value. The proportions of patients requesting additional analgesia were 4, 43, and 80%, corresponding to afternoon VAS scores of 30 or less, 31-70, and greater than 70, respectively. Change from morning VAS score had no apparent influence on patient-controlled analgesic dosing for patients with afternoon values of 30 or less or greater than 70, but changes in VAS scores of at least 10 did discriminate among patients whose afternoon values were between 31 and 70. CONCLUSIONS: When pain is an outcome measure in research studies, grouping final VAS scores into a small number of categories provides greater clinical relevance for comparisons than using the full spectrum of measured values or changes in value. Seeing an earlier VAS form has no apparent influence on later values.
Determining the minimum clinically significant difference in visual analog pain score for children. Powell CV, Kelly AM, Williams A. Ann Emerg Med 2001: 37(1); 28-31. [Medline] STUDY OBJECTIVE: We sought to determine the minimum clinically significant difference in visual analog scale (VAS) pain score for children. METHODS: We performed a prospective, single-group, repeated-measures study of children between 8 and 15 years presenting to an urban pediatric emergency department with acute pain. On presentation to the ED, patients marked the level of their pain on a 100-mm nonhatched VAS scale. At 20-minute intervals thereafter, they were asked to give a verbal categoric rating of their pain as "heaps better," "a bit better," "much the same," "a bit worse," or "heaps worse" and to mark the level of pain on a VAS scale of the same type as used previously. A maximum of 3 comparisons was recorded for each child. The minimum clinically significant difference in VAS pain score was defined as the mean difference between current and preceding scores when the subject reported "a bit worse" or "a bit better" pain. RESULTS: Seventy-three children were enrolled in the study, yielding 103 evaluable comparisons in which pain was rated as "a bit better" or "a bit worse." The minimum clinically significant difference in VAS score was 10 mm (95% confidence interval 7 to 12 mm). CONCLUSION: This study found the minimum clinically significant difference in VAS pain score for children aged 8 to 15 years (on a 100-mm VAS scale) to be 10 mm (95% confidence interval 7 to 12 mm). In studies of populations, differences of less than this amount, even if statistically significant, are unlikely to be of clinical significance.
Clinically significant changes in pain along the visual analog scale. Bird SB, Dickson EW. Ann Emerg Med 2001: 38(6); 639-43. [Medline] STUDY OBJECTIVE: We sought to test the hypothesis that the change in visual analog scale (VAS) associated with a clinically significant change in pain is related to the initial VAS score. METHODS: A convenience sample of adults with isolated extremity trauma was enrolled. A VAS score was obtained on entry into the study. Descriptions of change in pain ("lot less," "little less," "about the same," "little more," or "lot more") and VAS scores were then obtained every 30 minutes until the patient was free of pain or discharged or a total of 2 hours had passed. Patients were divided into 3 cohorts on the basis of the initial VAS score: VAS score of less than 34, VAS score of 34 to 66, and VAS score of 67 or greater. The absolute values of VAS changes associated with pain descriptions of a "little less" or "little more" (defined as clinically significant), "about the same" (defined as clinically insignificant), and "lot less" or "lot more" were calculated. RESULTS: The change in VAS associated with clinically significant changes in pain in the cohort with VAS scores of less than 34 was 13±14 (mean±SD), which was significantly lower than that of the cohort with VAS scores of 67 or greater (28±21). There was no statistically significant difference in clinically significant changes in pain between the middle cohort and either the upper or lower cohorts (P = .07 and P = .29, respectively). There was no significant change in VAS for clinically insignificant changes in pain among the 3 cohorts (3±4, 6±6, and 8±16, respectively). CONCLUSION: Patients with greater pain require a greater change in VAS score to achieve clinically significant pain relief.
Does the clinically significant difference in visual analog scale pain scores vary with gender, age, or cause of pain? Kelly AM. Acad Emerg Med 1998: 5(11); 1086-90. [Medline] OBJECTIVES: To determine the minimum clinically significant difference in visual analog scale (VAS) pain scores for acute pain in the ED setting and to determine whether this difference varies with gender, age, or cause of pain. METHODS: A prospective, descriptive study of 152 adult patients presenting to the ED with acute pain. At presentation and at 20-minute intervals to a maximum of three measurements, patients marked the level of their pain on a 100-mm, nonhatched VAS. At each follow-up they also gave a verbal rating of their pain as "a lot better," "a little better," "much the same," "a little worse," or "much worse." The minimum clinically significant difference in VAS pain scores was defined as the mean difference between current and preceding scores when pain was reported as a little worse or a little better. Data were compared based on gender, age more than or less than 50 years, and traumatic vs nontraumatic causes of pain. RESULTS: The minimum clinically significant difference in VAS pain scores is 9 mm (95% CI, 6 to 13 mm). There is no statistically significant difference between the minimum clinically significant differences in VAS pain scores based on gender (p=0.172), age (p=0.782), or cause of pain (p=0.84). CONCLUSIONS: The minimum clinically significant difference in VAS pain scores was found to be 9 mm. Differences of less than this amount, even if statistically significant, are unlikely to be of clinical significance. No significant difference in minimum significant VAS scores was found between gender, age, and cause-of-pain groups.
A proposal to use confidence intervals for visual analog scale data for pain measurement to determine clinical significance. Mantha S, Thisted R, Foss J, Ellis JE, Roizen MF. Anesth Analg 1993: 77(5); 1041-7. [Medline] Visual analog scales (VAS) ranging from 0 cm (no pain) to 10 cm (worst imaginable pain) are used widely for pain measurement, but various investigators have not treated these data consistently. Conventional statistical tests of such data, although evaluating the "statistical significance," may obscure the clinical value of a treatment. On the other hand, confidence intervals (CIs) can illuminate both statistical and clinical importance. CIs give a range of values based on the observed data which contain, with a specified probability, a true but unknown variable typifying a population. We reviewed 112 articles published recently in anesthesia journals for statistical reporting of VAS data. Of the 112 articles, only two used CIs to report mean pain scores and one used CIs to report differences in median pain scores between the study groups. Only two articles presented 95% CI for the mean pain scores graphically. Analgesic techniques that produce VAS values in the range of 0-3 have been reported to represent adequate analgesia. A graphical method using CIs is proposed that allows ready interpretation of VAS data. With this approach, one evaluates whether the 95% CI for the mean pain score in a group during a particular period lies entirely within the zone defined as "analgesic success" (0-3). Such an analysis allows a visual assessment of whether a particular technique would produce clinically important effects in the population at large. This approach seems to provide more information than the use of conventional hypothesis testing in the interpretation of VAS data for pain measurement.
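The rule described in this abstract (check whether the 95% CI for the mean pain score lies entirely within the 0-3 "analgesic success" zone) is easy to sketch numerically. The version below is only an illustration: it uses a normal approximation for the interval rather than whatever method the authors used, and the function names and both sets of VAS scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(scores, level=0.95):
    """Normal-approximation confidence interval for the mean (an assumption;
    a t-based interval would be slightly wider for small samples)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))
    return m - z * se, m + z * se

def classify(scores, zone=(0.0, 3.0)):
    """Compare the CI for the mean with the 'analgesic success' zone."""
    lo, hi = mean_ci(scores)
    if zone[0] <= lo and hi <= zone[1]:
        return "within"
    if lo > zone[1] or hi < zone[0]:
        return "outside"
    return "straddles"

# Hypothetical VAS scores (0-10 cm scale) for two treatment groups.
group_a = [1.0, 2.5, 0.5, 3.0, 2.0, 1.5, 2.0, 1.0, 2.5, 1.0, 1.5, 2.0]
group_b = [2.5, 3.5, 4.0, 3.0, 2.0, 4.5, 3.5, 3.0, 2.5, 4.0]
for name, scores in (("A", group_a), ("B", group_b)):
    lo, hi = mean_ci(scores)
    print(name, round(lo, 2), round(hi, 2), classify(scores))
```

A group whose interval straddles the upper boundary of the zone has not been shown to achieve adequate analgesia, even if its mean differs significantly from a comparison group.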
A randomized controlled trial of fentanyl for abortion pain. Rawling MJ, Wiebe ER. Am J Obstet Gynecol 2001: 185(1); 103-7. [Medline] OBJECTIVE: Our aim was to find out whether intravenous fentanyl was effective in reducing the pain of first-trimester abortion. STUDY DESIGN: This randomized controlled trial included 825 women attending a nonhospital abortion facility. Some women chose standard care. Women who did not choose standard care were randomly assigned to receive either 50 to 100 microg of fentanyl, a placebo, or no intervention. With SAS software and a mixed effects analysis of variance model with covariates, we compared mean pain scores of the fentanyl and placebo groups to detect a difference of at least 1 point on an 11-point pain scale. RESULTS: The mean pain score of the fentanyl group was 1.0 point less than that of the placebo group (95% confidence interval, 3.7-4.3) and 0.9 point less than that of the observational group (95% confidence interval, 4.7-5.1). This pain reduction was statistically significant, but the women who were studied wanted a 2-point reduction from fentanyl. CONCLUSION: Fentanyl, when compared with the placebo, reduced abortion pain by 1.0 point on an 11-point scale. This reduction was of questionable clinical significance and was less than desired by the women included in the study.
Clinical utility and clinical significance in the assessment and management of pain in vulnerable infants. Stevens B, Gibbins S. Clin Perinatol 2002: 29(3); 459-68. [Medline] Pain in vulnerable populations unable to provide verbal report is challenging in terms of measurement and treatment. Clinicians strive to provide the best possible pain management for infants in the NICU, yet they are often hindered due to paucity of measures that are not only reliable and valid but also clinically useful. Clinical utility of measures is difficult to establish due to a lack of consistent definition of the construct, varied methods of determination, and the secondary importance afforded to this issue in relation to the establishment of reliability and utility. Without clinically useful pain measures, however, clinicians are unable and unlikely to assess the infant's pain or the effectiveness of pain-relieving interventions. Furthermore, even when the clinician is able to assess pain using a valid measure with a minimum of time, cost, and instruction, the clinical significance of any reduction in pain scores needs to be interpreted in terms of the infant and his/her care provider. The issue of defining the extent of change in pain scores that is clinically significant or important remains unclear. Clarity will involve assigning meaning to particular changes in pain scores for vulnerable infants across a broad array of situations and severities of pain. Although research on this topic in children and adults provides some guidance to this dilemma, only through innovative and creative methods will we be able to address these issues.
Minimum clinically significant VAS differences for simultaneous (paired) interval serial pain assessments. Yamamoto LG, Nomura JT, Sato RL, Ahern RM, Snow JL, Kuwaye TT. Am J Emerg Med 2003: 21(3); 176-9. [Medline] [Abstract] We conducted two studies to determine whether the minimum clinically significant difference in the visual analog scale (VAS) for nearly simultaneous and brief-interval serial assessments of pain is less than that for pain assessment at 20- to 30-minute intervals, using a 10-cm VAS. The first study was a blinded, randomized, placebo-controlled paired trial comparing the pain of intravenous cannulation in both hands (20-minute application of a eutectic mixture of local anesthetics v placebo) of study subjects. The second study was a non-blinded, randomized, paired trial of different treatments for jellyfish stings. In the first study, 37 of 40 subjects indicated that one hand experienced more pain than the other. Eleven of these 37 subjects (30%) indicated differences in VAS values of 1.0 cm or less, with a minimum value of 0.5 cm. In the second study, for all the VAS-based pain comparisons, VAS differences of ≤0.5 cm (other than zero) occurred 183 times, and in 171 of these instances (93%) subjects were able to recognize that there was a difference. On the basis of these findings, the minimum clinically significant VAS difference for paired comparisons that are simultaneous or occur within 5 minutes of each other is about 0.5 cm or less. This value is less than the 1.3-cm value determined for serial 20- to 30-minute pain comparisons. It is likely that other types of pain comparisons may have different minimum clinically significant VAS differences.
Is it clinically significant? Erill S. Lancet 2002: 359(9318); 1708. [Full text] [PDF] [Excerpt] We worship statistics. No wonder. After all, the aim of clinical research seems increasingly to be centred on the detection of small differences and what seems to count is the statistical significance of what is found. A paper well seasoned with probability values lower than the customary 5% might seem respectable whatever the size of the difference detected or even the relevance of the variable under study. So much for statistical significance. It is well served, but what we need are tests of clinical significance and, believe me, they exist even if they are not usually found in textbooks.
Efficacy, safety, and cost of new anticancer drugs. Garattini S, Bertele V. British Medical Journal 2002: 325(7358); 269-71. [Medline] [Full text] [PDF]
Clinical versus statistical considerations in the design and analysis of clinical research. Horwitz RI, Singer BH, Makuch RW, Viscoli CM. J Clin Epidemiol 1998: 51(4); 305-7. [Medline]
Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). Sackett DL. CMAJ 2001: 165(9); 1226-37. [Medline] [Full text] [PDF]
Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Dimick JB, Diener-West M, Lipsett PA. Arch Surg 2001: 136(7); 796-800. [Medline] HYPOTHESIS: We hypothesized that review of randomized controlled clinical trials (RCTs) with nonstatistically significant or "negative" results published in the surgical literature do not have appropriate statistical power to demonstrate equivalency between treatment arms. DATA SOURCES AND STUDY SELECTION: The MEDLINE database was searched to obtain reports of all RCTs with negative results published in 3 surgical journals from 1988 to 1998. Manual review of one year (1997) of publications for each journal was performed to validate our search strategy. Equivalency was evaluated using the Two One-Sided Tests Procedure and post hoc power calculations. DATA SYNTHESIS: Ninety reports of RCTs with negative results were identified in the surgical literature between 1988 and 1998. The manual review of 1997 showed a 100% retrieval rate for our search strategy. After applying the Two One-Sided Tests Procedure, 35 reports (39%) met the criteria for demonstrating equivalency. The other 55 reports (61%) contained at least a 10% absolute difference in the 90% confidence interval of Delta. Using the power calculation method, only 22 (24%) articles had a power greater than 0.80 to detect a 50% difference in therapeutic effect. Only 29% of the reports included a formal sample size calculation and these studies were more likely to demonstrate equivalency than those without a sample size estimate (P<.01). CONCLUSIONS: Many reports from negative RCTs published in the surgical literature lack sufficient statistical power to establish that clinically important differences are not present. Surgeons should perform appropriate sample size calculations when designing RCTs and recognize the utility of confidence intervals when reporting negative results.
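The equivalence check described in this abstract (a 90% confidence interval for the treatment difference that stays inside a 10% absolute margin) can be sketched as below. This is only an illustration under stated assumptions: it uses a Wald (normal approximation) interval for a difference in proportions, the function name and the trial counts are hypothetical, and the authors' exact computation may well differ.

```python
from math import sqrt
from statistics import NormalDist


def tost_risk_difference(x1, n1, x2, n2, margin=0.10, alpha=0.05):
    """Two One-Sided Tests sketch for a difference in proportions.

    Declares equivalence when the (1 - 2*alpha) Wald confidence interval
    for p1 - p2 lies entirely inside (-margin, +margin).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha)  # alpha = 0.05 gives a 90% interval
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, (-margin < lower and upper < margin)


# Hypothetical counts: the same observed rates in a small and a large trial.
small = tost_risk_difference(20, 50, 18, 50)
large = tost_risk_difference(400, 1000, 380, 1000)
print("small trial:", small)
print("large trial:", large)
```

With these made-up numbers the small trial cannot rule out a 10% difference even though its observed difference is modest, while the large trial can. That is the distinction the abstract draws between a "negative" result and a demonstration of equivalency.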
Survey of claims of no effect in abstracts of Cochrane reviews. Alderson P, Chalmers I. BMJ 2003: 326(7387); 475. [Medline] [Full text] [PDF]
Absence of evidence is not evidence of absence. Altman DG, Bland JM. British Medical Journal 1995: 311(7003); 485. [Medline] [Full text] PIP: Randomized controlled clinical trials are conducted to determine whether differences of clinical importance exist between selected treatment regimens. When statistical analysis of the study data finds a P value greater than 5%, it is convention to deem the assessed difference nonsignificant. Just because convention dictates that such study findings be termed nonsignificant, or negative, however, it does not necessarily follow that the study found nothing of clinical importance. Subject samples used in controlled trials tend to be too small. The studies therefore lack the necessary power to detect real, and clinically worthwhile, differences in treatment. Freiman et al. found that only 30% of a sample of 71 trials published in the New England Journal of Medicine in 1978-79 with a P value greater than 10% were large enough to have a 90% chance of detecting even a 50% difference in the effectiveness of the treatments being compared, and they found no improvement in a similar sample of trials published in 1988. It is therefore wrong and unwise to interpret so many negative trials as providing evidence of the ineffectiveness of new treatments. One must instead seriously question whether the absence of evidence is a valid justification for inaction. Efforts must be made to look for quantification of an association rather than just a P value, especially when the risks under investigation are small. The authors cite a recent trial comparing octreotide and sclerotherapy in patients with variceal bleeding, as well as the overview of clinical trials evaluating fibrinolytic treatment for preventing reinfarction after acute myocardial infarction as examples.
Underpowered clinical trials of antiretroviral treatment. Arribas JR, Pulido F. JAMA 2002: 288(17); 2120; author reply 2120-1. [Medline]
The prevalence of negative studies with inadequate statistical power: an analysis of the plastic surgery literature. Chung KC, Kalliainen LK, Spilson SV, Walters MR, Kim HM. Plast Reconstr Surg 2002: 109(1); 1-6; discussion 7-8. [Medline] Studies published in the medical literature often neglect to consider the statistical power needed to detect a meaningful difference between study groups. Small sample sizes tend to produce negative results because of low statistical power. Studies that cannot make conclusive statements about their hypotheses can waste resources, deter further research, and impede advances in clinical treatment. The current study reviewed three of the most frequently read plastic surgery journals from 1976 to 1996 to determine the prevalence of inadequately (<80 percent) powered clinical trials and experimental studies that found no difference (negative studies) in the response variable of interest between comparison groups. The statistical power of 54 negative studies using continuous response variables was calculated to detect a difference of 1 SD (±1 SD) in means between the comparative groups. The power of another 57 negative studies with dichotomous response (yes/no) variables was calculated to detect a relative change in proportions of 25 percent and 50 percent from the experimental to the control group. It was found that 85 percent of the studies with continuous response variables had inadequate power to detect the desired mean difference of ±1 SD. In studies with dichotomous response variables, 98 percent had inadequate power to detect a desired 25 percent relative change in proportions, and 74 percent had inadequate power to detect a desired 50 percent relative change in proportions. These results indicate that many of the studies in the plastic surgery literature lack adequate power to detect a moderate-to-large difference between groups. The lack of power makes the interpretation of the studies with negative findings inconclusive. Proper study design dictates that investigators consider a priori the difference between groups that is of clinical interest, and the sample size per group that is needed to provide adequate statistical power to detect the desired difference.
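To make the 80 percent power threshold concrete, here is a rough sketch of the power of a two-group comparison of means to detect a difference of 1 SD. It assumes equal group sizes, a two-sided 5% test, and a normal approximation in place of the t distribution (so it is slightly optimistic for small samples); the function name and the sample sizes tried are hypothetical, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(effect_in_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_in_sd is the true difference in means divided by the common SD.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_in_sd * sqrt(n_per_group / 2)  # expected z statistic
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (5, 10, 20, 40):
    print(n, round(power_two_means(1.0, n), 3))
```

Under this approximation, groups of about ten fall short of 80 percent power even for a difference as large as 1 SD, which is consistent with the abstract's finding that most of the negative studies were underpowered.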
Putting trials on trial--the costs and consequences of small trials in depression: a systematic review of methodology. Hotopf M, Lewis G, Normand C. J Epidemiol Community Health 1997: 51(4); 354-8. STUDY OBJECTIVE: To determine why, despite 122 randomised controlled trials, there is no consensus about whether the selective serotonin reuptake inhibitors or tricyclic and related antidepressants should be used as first line treatment of depression. DESIGN: Systematic review of all RCTs comparing selective serotonin reuptake inhibitors and tricyclic or heterocyclic antidepressants. MAIN RESULTS: The shortcomings identified in the 122 trials were as follows: (1) there was inadequate description of randomisation, (2) the outcomes used were mainly observer rated measurements of depression, and studies failed to use quality of life measures or perform economic evaluations, (3) doses of tricyclic antidepressants were inadequate, (4) generalisability of studies was poor (including a reliance on secondary care settings and inadequate follow up), and (5) there were statistical shortcomings such as low statistical power, failure to use intention to treat analyses, and the tendency to make multiple comparisons. CONCLUSIONS: Future RCTs should be designed to inform policy makers and address these methodological shortcomings.
"Evidence of absence" can be important. Joffe M. Bmj 2003: 326(7401); 1267. [Medline] [Full text]
Epidemiological appraisal of studies of residential exposure to power frequency magnetic fields and adult cancers. Li CY, Theriault G, Lin RS. Occup Environ Med 1996: 53(8); 505-10. [Medline] OBJECTIVES: To appraise epidemiological evidence of the purported association between residential exposure to power frequency magnetic fields and adult cancers. METHODS: Literature review and epidemiological evaluation. RESULTS: Seven epidemiological studies have been conducted on the risk of cancer among adults in relation to residential exposure to power frequency magnetic fields. Leukaemia was positively associated with magnetic fields in three case-control studies. The other two case-control studies and two cohort studies did not show such a link. Brain tumours and breast cancer have rarely been examined by these studies. Based on the epidemiological results, the analysis of the role of chance and bias, and the criteria for causal inferences, it seems that the evidence is not strong enough to support the putative causal relation between residential exposure to magnetic fields and adult leukaemia, brain tumours, or breast cancer. Inadequate statistical power is far more a concern than selection bias, information bias, and confounding in interpreting the results from these studies, and in explaining inconsistencies between studies. CONCLUSIONS: Our reviews suggested that the only way to answer whether residential exposure to magnetic fields is capable of increasing the risks of adult cancers is to conduct more studies carefully avoiding methodological flaws, in particular small sample size. We also suggested that the risk of female breast cancer should be the object of additional investigations, and that future studies should attempt to include information on exposure to magnetic fields from workplaces as well as residential exposure to estimate the effects of overall exposure to magnetic fields.
Thirst, interdialytic weight gain, and thirst-interventions in hemodialysis patients: a literature review. Mistiaen P. Nephrol Nurs J 2001: 28(6); 601-4, 610-3; quiz 614-5. [Medline] A literature search completed over the period of 1980-1999 identified studies on the prevalence of thirst in hemodialysis (HD) patients and the relationship between thirst and interdialytic weight gain, as well as intervention studies in which thirst was used as an outcome variable. Twenty-three studies fulfilled the selection criteria and were included in the analysis. The prevalence of thirst varied between 6% and 95% across studies. In most studies more thirst was related to more weight gain. However, the studies were difficult to compare due to methodological differences. Three types of interventions were found: technical interventions in the dialysis mechanisms (increasing the frequency of dialysis sessions and varying the concentration of sodium in the dialysate), pharmaceutical interventions (ACE-inhibitors), and a dietetic intervention. Almost no conclusions could be drawn with regard to the effectiveness of these interventions due to methodological differences and weaknesses and due to the small sample sizes.
MR findings in humeral epicondylitis. A systematic review. Pasternack I, Tuovinen EM, Lohman M, Vehmas T, Malmivaara A. Acta Radiol 2001: 42(5); 434-40. [Medline] PURPOSE: To highlight the importance of meta-analysis in diagnostic imaging by presenting a systematic search of the literature on the accuracy of MR imaging in epicondylitis. MATERIAL AND METHODS: The literature was comprehensively reviewed to identify studies on MR findings in epicondylitis. Reviewers blind to the clinical diagnoses screened the data according to predetermined inclusion criteria. Data were collected and validity and relevance were assessed on structured forms. RESULTS: Seven studies including 148 patients with epicondylitis were accepted for the analysis. Eleven asymptomatic contralateral elbows and 29 elbows of healthy volunteers served as controls. The volunteers were distinctly younger than the patients. The MR technique was divergent, and the observed pathological changes also varied. The most frequent alteration was a change in the common extensor tendon signal (90%, 95% confidence interval 84-94%); 14% of the healthy volunteers and 50% of the contralateral elbows displayed the similar alteration. CONCLUSION: Small sample size and methodological shortcomings in the original studies make the assessment of MR findings in epicondylitis questionable. There is a need for well-designed studies in which clinical features and occupational backgrounds as well as imaging parameters are carefully documented.
The ethics of tiny trials. Phillips B. Arch Dis Child 2002: 87(3); 258. [Medline]
Distinguishing between "no evidence of effect" and "evidence of no effect" in randomised controlled trials and other comparisons. Tarnow-Mordi WO, Healy M. Arch Dis Child 1999: 80(3); 210-213. [Full text] [PDF]
Cost effectiveness calculations and sample size. Torgerson DJ, Campbell MK. BMJ 2000: 321; 697. [Full text] [PDF]
Elevated blood lead levels in children of construction workers. Whelan E, Piacitelli G, Gerwel B, Schnorr T, Mueller C, Gittleman J, Matte T. American Journal of Public Health 1997: 87(8); 1352-55. ABSTRACT: OBJECTIVES: This study examined whether children of lead-exposed construction workers had higher blood lead levels than neighborhood control children. METHODS: Twenty-nine construction workers were identified from the New Jersey Adult Blood Lead Epidemiology and Surveillance (ABLES) registry. Eighteen control families were referred by workers. Venous blood samples were collected from 50 children (31 exposed, 19 control subjects) under age 6. RESULTS: Twenty-six percent of workers' children had blood lead levels at or over the Centers for Disease Control and Prevention action level of 0.48 µmol/L (10 micrograms/dL), compared with 5% of control children (unadjusted odds ratio = 6.1; 95% confidence interval = 0.9, 147.2). CONCLUSIONS: Children of construction workers may be at risk for excessive lead exposure. Health care providers should assess parental occupation as a possible pathway for lead exposure of young children.
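The very wide interval in this abstract is easy to reproduce in spirit. The sketch below back-calculates counts from the reported percentages (8 of 31 exposed children and 1 of 19 controls, which is my inference and not stated in the abstract) and computes a sample odds ratio with a Wald interval on the log scale. The published 6.1 and 0.9 to 147.2 presumably come from an exact method, so the numbers here will not match them; the point is only that a single control case leaves the interval spanning 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, level=0.95):
    """Sample odds ratio with a Wald interval on the log scale.

    a, b: exposed with / without the outcome
    c, d: controls with / without the outcome
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Assumed counts: 8/31 exposed (about 26%) and 1/19 controls (about 5%).
estimate, lower, upper = odds_ratio_wald(8, 23, 1, 18)
print(estimate, lower, upper)
```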
What is the chance that this study is clinically significant? A proposal for Q values. Froehlich GW. Eff Clin Pract 1999: 2(5); 234-9. [Medline] [Full text] CONTEXT: Clinicians who use the medical literature to guide their practice need to make judgments about the clinical significance of medical interventions. GENERAL QUESTION: How likely is an intervention to be clinically worthwhile? SPECIFIC RESEARCH CHALLENGE: Given the results of a study, determining the probability that the true effect of an intervention is at least as great as some minimum worthwhile effect. CURRENT APPROACH: P values are widely used to convey the probability of observed effects arising by chance if there truly is no effect. By convention, P values less than 0.05 are interpreted as being "statistically significant." POTENTIAL DIFFICULTIES: Statistical significance is often confused with clinical significance. ALTERNATE APPROACH: A different probability could be reported, a probability I call a Q value. A Q value is the probability that the true effect of an intervention is at least as great as some minimum worthwhile effect. Q values are calculated in a manner analogous to that used for P values, except that the null hypothesis becomes a minimum worthwhile effect instead of no effect. Q values encourage researchers and clinicians to be explicit about what they think a worthwhile effect is and could help shift the focus of study interpretation away from arbitrary statistical conventions.
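One way to read the Q value proposal is as a one-sided calculation in which the reference point is the minimum worthwhile effect instead of zero. The sketch below follows that reading under a normal approximation; it is my interpretation of the abstract, not necessarily the formula in the paper, and the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def q_value(estimate, std_error, min_worthwhile_effect):
    """Sketch of a Q value under a normal approximation.

    Returns the approximate probability that the true effect is at least
    min_worthwhile_effect, given the observed estimate and its standard error.
    """
    z = (estimate - min_worthwhile_effect) / std_error
    return NormalDist().cdf(z)


# Hypothetical trial: observed absolute risk reduction 5%, standard error 2%.
print(q_value(0.05, 0.02, 0.00))  # reference point of no effect
print(q_value(0.05, 0.02, 0.02))  # minimum worthwhile effect of 2%
print(q_value(0.05, 0.02, 0.05))  # minimum worthwhile effect of 5%
```

The same result looks convincing against "no effect" and much less so as the minimum worthwhile effect rises, which is why the proposal forces an explicit choice of that threshold.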
Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery. Lokken P, Straumsheim PA, Tveiten D, Skjelbred P, Borchgrevink CF. British Medical Journal 1995: 310(6992); 1439-42. [Medline] [Abstract] [Full text] OBJECTIVE--To examine whether homoeopathy has any effect on pain and other inflammatory events after surgery. DESIGN--Randomised double blind, placebo controlled crossover trial with "identical" oral surgical procedures performed on two separate occasions in 24 patients. INTERVENTIONS--Treatment started 3 hours after surgery with either homoeopathy or placebo. MAIN OUTCOME MEASURES--Postoperative pain and preference for postoperative course assessed by patients on visual analogue scales. Measurements of postoperative swelling and reduction in ability to open mouth. Assessment of bleeding after surgery. RESULTS--Pain after surgery was essentially the same whether treated with homoeopathy or placebo. Postoperative swelling was not significantly affected by homoeopathy, but treatment tended to give less reduction in ability to open mouth. No noticeable difference was seen in postoperative bleeding, side effects, or complaints. Thirteen of the 24 patients preferred the postoperative course with placebo. CONCLUSIONS--No positive evidence was found for efficacy of homoeopathic treatment on pain and other inflammatory events after an acute soft tissue and bone injury inflicted by a surgical intervention. Differences in the order of 30% to 40% would have been needed to show significant effects.
Interventions for promoting smoking cessation during pregnancy. Lumley J, Oliver S, Waters E. Cochrane Database Syst Rev 2000: (2); CD001055. [Abstract] BACKGROUND: Smoking remains one of the few potentially preventable factors associated with low birthweight, very preterm birth and perinatal death. OBJECTIVES: The objective of this review was to assess the effects of smoking cessation programs implemented during pregnancy on the health of the fetus and infant, on the mother and on the family. SEARCH STRATEGY: We searched the Cochrane Pregnancy and Childbirth Group trials register and the Cochrane Tobacco Addiction Group trials register. SELECTION CRITERIA: Randomised and quasi-randomised trials of smoking cessation programs implemented during pregnancy. DATA COLLECTION AND ANALYSIS: Trial quality was assessed and data were extracted independently by two reviewers. MAIN RESULTS: Forty-four trials were identified: 37 trials including 16,916 women provided data on smoking cessation and/or perinatal outcomes, as did one cluster-randomised trial including 3000 women. Over 800 women were included in trials of smoking relapse prevention. There was substantial variation in the intensity of the intervention and the extent of reminders and reinforcement through pregnancy. Based on 34 trials there was a significant reduction in smoking in the intervention groups (odds ratio 0.53, 95% confidence interval 0.47 to 0.60), an absolute difference of 6.4% women continuing to smoke. The eight trials with validated smoking cessation, a high intensity intervention and a high quality score had an odds ratio of 0.53, 95% confidence interval 0.44 to 0.63 and an absolute difference in continued smoking of 8.1%. The subset of trials with information on fetal outcome revealed a reduction in low birthweight (odds ratio 0.80, 95% confidence interval 0.67 to 0.95), a reduction in preterm birth (odds ratio 0.83, 95% confidence interval 0.69 to 0.99) and an increase in mean birthweight of 28g (95% confidence interval 9 to 49). There were no differences in very low birthweight or perinatal mortality. Five trials of smoking relapse prevention showed no significant difference. The single large cluster-randomised trial showed no evidence of a decrease in continued smoking or adjusted mean birthweight. REVIEWER'S CONCLUSIONS: Smoking cessation programs in pregnancy appear to reduce smoking, low birthweight and preterm birth, but no effect was detected for very low birthweight or perinatal mortality.
The association of nonsteroidal anti-inflammatory drugs with upper gastrointestinal tract bleeding. Carson JL, Strom BL, Soper KA, West SL, Morse ML. Arch Intern Med 1987: 147(1); 85-8. [Medline] To evaluate the risk of developing upper gastrointestinal (UGI) bleeding from nonsteroidal anti-inflammatory drugs (NSAIDs), a retrospective (historical) cohort study was performed, using a computerized data base including 1980 billing data from all Medicaid patients in the states of Michigan and Minnesota. Comparing 47,136 exposed patients to 44,634 unexposed patients, the unadjusted relative risk for developing UGI bleeding 30 days after exposure to a NSAID was 1.5 (95% confidence interval 1.2 to 2.0). Univariate analyses demonstrated associations between UGI bleeding and age, sex, state, alcohol-related diagnoses, preexisting abdominal conditions, and use of anticoagulants. This association between NSAIDs and UGI bleeding was unchanged after adjusting for these potential confounding variables using logistic regression. A linear dose-response relationship and a quadratic duration-response relationship were demonstrated. Non-steroidal anti-inflammatory drugs are associated with UGI bleeding, although the magnitude of the increased risk is reassuringly small.
Grapefruits and drugs: when is statistically significant clinically significant? Abernethy DR. J Clin Invest 1997: 99(10); 2297-8. [Medline] [Full text] [PDF]
Size and quality of randomised controlled trials in head injury: review of published studies. Dickinson K, Bunn F, Wentz R, Edwards P, Roberts I. British Medical Journal 2000: 320; 1308-1311. [Medline] [Abstract] [Full text] [PDF] Objective: To assess whether trials in head injury are large enough to avoid moderate random errors and designed to avoid moderate biases. Design: All randomised controlled trials on the treatment and rehabilitation of patients with head injury published before December 1998 were surveyed. Trials were identified from electronic databases, by hand searching journals and conference proceedings, and by contacting researchers. Data were extracted on the number of participants, quality of concealment of allocation, use of blinding, loss to follow up, and types of participants, interventions, and outcome measures. Results: 279 reports were identified, containing information on 208 separate trials. The average number of participants per trial was 82, with no evidence of increasing size over time. The total number of randomised participants in the 203 trials in which size was reported was 16 613. No trials were large enough to detect reliably a 5% absolute reduction in the risk of death or disability, and only 4% were large enough to detect an absolute reduction of 10%. Concealment of allocation was adequate in 22 and inadequate or unclear in 25 of the 47 (23%) in which it was reported. Of 126 trials assessing disability, 111 reported the number of patients followed up, and average loss to follow up was 19%. Of trials measuring disability, 26 (21%) reported that outcome assessors were blinded. Conclusions: Randomised trials in head injury are too small and poorly designed to detect or refute reliably moderate but clinically important benefits or hazards of treatment. Limited funding for injury research and unfamiliarity with issues of consent may have been important obstacles.
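For a sense of scale, the sketch below applies the usual normal-approximation sample size formula for comparing two proportions. The baseline risk of death or disability (50%), the 80% power, and the two-sided 5% significance level are all my assumptions for illustration; the review does not state them in this abstract and may have used a stricter definition of "detect reliably".

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to compare two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Assumed baseline risk of 50%, reduced by 5 or 10 percentage points.
print(n_per_group(0.50, 0.45))
print(n_per_group(0.50, 0.40))
```

Under these assumptions a 5% absolute reduction needs well over a thousand participants per arm and a 10% reduction needs several hundred, against an average of 82 participants per trial in the review.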
Quality of randomised controlled trials in head injury. Trials in head injury are more complex than review suggests. Murray GD, Teasdale GM. British Medical Journal 2000: 321(7270); 1223. [Medline] [Full text]
Assessing clinically significant change: application to the SCL-90-R. Schmitz N, Hartkamp N, Franke GH. Psychol Rep 2000: 86(1); 263-74. [Medline] A Symptom Checklist (SCL-90-R) is a potentially useful measure of psychological distress; it is frequently used in psychotherapy research and clinical practice. The purpose of this study was to illustrate the use of the SCL-90-R for determining statistically reliable change and clinical significance outlined by Jacobson and Truax in 1991. This paper describes the concepts of statistical and clinical significance of change. A proposal for obtaining and characterizing samples is made. Then a clinician's perspective is taken. Reliable change estimates and cut-off scores are chosen based on outcome data. Selected data from a single psychotherapeutic process and outcome study then were used to test the estimates of change and cut-off scores.
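The two Jacobson and Truax quantities mentioned here, the reliable change index and a cut-off between the clinical and functional populations, can be sketched as follows. The formulas are the standard ones from the 1991 method; the scores, standard deviations, and reliability below are hypothetical and are not SCL-90-R norms.

```python
from math import sqrt


def reliable_change_index(pre, post, sd_pre, reliability):
    """Change score divided by the standard error of a difference."""
    se_measurement = sd_pre * sqrt(1 - reliability)
    se_difference = sqrt(2) * se_measurement
    return (post - pre) / se_difference


def cutoff_score(mean_clinical, sd_clinical, mean_functional, sd_functional):
    """Cut-off between the two populations, weighted by their spreads."""
    return (sd_clinical * mean_functional + sd_functional * mean_clinical) / (
        sd_clinical + sd_functional
    )


# Hypothetical patient and norms (lower scores mean less distress).
rci = reliable_change_index(pre=1.5, post=0.8, sd_pre=0.6, reliability=0.9)
cutoff = cutoff_score(1.2, 0.6, 0.4, 0.3)
print(rci, cutoff)
```

A change is usually called statistically reliable when the absolute index exceeds 1.96, and clinically significant when the patient also crosses the cut-off into the functional range.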
The Crack Baby Epidemic That Wasn't. What Statistics Mean, and Don't Mean. Schwartzberg NS. Accessed on 2005-03-11 (link broken). Statistics form the basis of scientific findings. While researchers are responsible for experimental design and quantification, journalists must understand the limitations of statistical methods. Reporters need to provide real world context to the results and differentiate between significant and meaningful differences. www.biomednet.com/hmsbeagle/50/people/op_ed.htm
Multiple doses of secretin in the treatment of autism: a controlled study. Sponheim E, Oftedal G, Helverschou SB. Acta Paediatr 2002: 91(5); 540-5. [Medline] Dramatic effects on autistic behaviour after repeated injections of the gastrointestinal hormone secretin have been referred in a number of case reports. In the absence of curative and effective treatments for this disabling condition, this information has created new hope among parents. Although controlled studies on the effect of mainly one single dose have not documented any effect, many children still continue to receive secretin. Six children enrolled in a double-blind, placebo-controlled crossover study in which each child was its own control. Human synthetic secretin, mean dose 3.4 clinical units, and placebo were administered intravenously in randomized order every 4th wk, on three occasions each. The measurement instruments were the visual analogue scale (VAS) and the aberrant behaviour checklist (ABC). Statistically significant differences were found for placebo in 3 out of 6 children and for secretin in one child, using parental ratings only (VAS scores). Differences were small and lacked clinical significance, which was in accordance with the overall impression of the parents and teachers and visual inspection of graphs. Conclusion: In this placebo-controlled study, multiple doses of secretin did not produce any symptomatic improvement.
What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Walters SJ, Brazier JE. Health Qual Life Outcomes 2003: 1(1); 4. [Medline] BACKGROUND: The SF-6D is a new single summary preference-based measure of health derived from the SF-36. Empirical work is required to determine what is the smallest change in SF-6D scores that can be regarded as important and meaningful for health professionals, patients and other stakeholders. OBJECTIVES: To use anchor-based methods to determine the minimally important difference (MID) for the SF-6D for various datasets. METHODS: All responders to the original SF-36 questionnaire can be assigned an SF-6D score provided the 11 items used in the SF-6D have been completed. The SF-6D can be regarded as a continuous outcome scored on a 0.29 to 1.00 scale, with 1.00 indicating "full health". Anchor-based methods examine the relationship between a health-related quality of life (HRQoL) measure and an independent measure (or anchor) to elucidate the meaning of a particular degree of change. One anchor-based approach uses an estimate of the MID, the difference in the QoL scale corresponding to a self-reported small but important change on a global scale. Patients were followed for a period of time, then asked, using question 2 of the SF-36 as our global rating scale (which is not part of the SF-6D), if their general health is much better (5), somewhat better (4), stayed the same (3), somewhat worse (2) or much worse (1) compared to the last time they were assessed. We considered patients whose global rating score was 4 or 2 as having experienced some change equivalent to the MID. In patients who reported a worsening of health (global change of 1 or 2) the sign of the change in the SF-6D score was reversed (i.e. multiplied by minus one). The MID was then taken as the mean change on the SF-6D scale of the patients who scored (2 or 4).
RESULTS: This paper describes the MID for the SF-6D from seven longitudinal studies that had previously used the SF-36. CONCLUSIONS: From the seven reviewed studies (with nine patient groups) the MID for the SF-6D ranged from 0.010 to 0.048, with a weighted mean estimate of 0.033 (95% CI: 0.029 to 0.037). The corresponding Standardised Response Means (SRMs) ranged from 0.11 to 0.48, with a mean of 0.30 and were mainly in the "small to moderate" range using Cohen's criteria, supporting the MID results. Using the half-standard deviation (of change) approach the mean effect size was 0.051 (range 0.033 to 0.066). Further empirical work is required to see whether or not this holds true for other patient groups and populations.
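The anchor-based calculation described in the methods can be written out in a few lines. This is a sketch of the procedure as the abstract states it (keep patients with a global rating of 2 or 4, reverse the sign of the change for those who got worse, and average); the function name and the patient records are invented for illustration.

```python
from statistics import mean


def minimally_important_difference(records):
    """Mean sign-adjusted change for patients rating themselves 2 or 4.

    records: iterable of (global_rating, change_in_score) pairs, where
    global_rating runs from 1 (much worse) to 5 (much better).
    """
    adjusted = []
    for rating, change in records:
        if rating == 4:      # somewhat better: keep the change as is
            adjusted.append(change)
        elif rating == 2:    # somewhat worse: reverse the sign
            adjusted.append(-change)
    return mean(adjusted)


# Hypothetical (rating, SF-6D change) pairs.
patients = [(4, 0.05), (2, -0.03), (3, 0.01), (5, 0.10), (2, -0.02), (4, 0.02)]
mid = minimally_important_difference(patients)
print(mid)
```

Patients who report no change or a large change (ratings 3, 1, and 5) are deliberately left out, because the MID is meant to capture the smallest change that patients still notice.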
The meaning of 6.8: numeracy and normality in health information talks. Adelsward V, Sachs L. Soc Sci Med 1996: 43(8); 1179-87. [Medline] The ambiguities of risk which stem from its translation from epidemiological findings into clinical knowledge and practice and thus to lay experiences of health and illness is a clear dilemma. How are risks expressed statistically, or otherwise mathematically, to be interpreted and communicated within the discourse of medico-science, and how within the discourse of an individual's everyday life? An important tool in all risk discourses and in preventive practices such as health information is testing and test results. Test results--presented in mathematical terms as points on a scale, or as a number--are in fact fundamental to preventive practice. But what do we know about how people involved in these tests understand them and how the results are used in the construction of ideas about risk and normalcy? This article attempts to answer part of that question by drawing on an empirical study of the use of numbers as metaphors in talks between a nurse and her potential patients in a directed health survey.
Completeness of reporting trial results: effect on physicians' willingness to prescribe. Bobbio M, Demichelis B, Giustetto G. Lancet 1994: 343(8907); 1209-11. [Medline] Clinical trials may lead to conflicting results. We studied how different ways of reporting results affected physicians' recommendations. A questionnaire distributed to 148 general practitioners presented results of a clinical trial where a reduction of cardiac events and an increase of mortality was reported. Results were shown in four different ways--relative risk reduction, absolute risk reduction, percentages of event-free patients, number needing to be treated to prevent an event--as if they derived from different trials. A fifth presentation was the reduced rate of cardiac events along with the increased rate of mortality. Physicians were asked to estimate how much they would be willing to prescribe each drug. The mean agreement of physicians' decisions was 77 (28)% for relative risk reduction, 24 (28)% for absolute risk reduction, 37 (37)% for different percentages of event-free patients, 34 (34)% for number needed to treat, and 23 (28)% for the reduction in events together with the increase in mortality (p < 0.001 relative risk vs others). The method of reporting trial results and the completeness of information in the case of controversial results affects physicians' willingness to prescribe.
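The four presentations compared in this study are all arithmetic on the same two event rates, which a short sketch makes plain. The function name and the event rates below are hypothetical and are not the rates from the trial used in the questionnaire.

```python
def effect_measures(control_risk, treated_risk):
    """Express one pair of event rates in the four formats from the study."""
    absolute_reduction = control_risk - treated_risk
    return {
        "relative_risk_reduction": absolute_reduction / control_risk,
        "absolute_risk_reduction": absolute_reduction,
        "event_free_control": 1 - control_risk,
        "event_free_treated": 1 - treated_risk,
        "number_needed_to_treat": 1 / absolute_reduction,
    }


# Hypothetical rates: 4% of controls and 3% of treated patients have an event.
measures = effect_measures(0.04, 0.03)
for name, value in measures.items():
    print(name, round(value, 4))
```

With these made-up rates the same result reads as a 25% relative reduction, a one percentage point absolute reduction, 97% versus 96% event-free, or about 100 patients treated per event prevented, which helps explain why the relative format draws the strongest response.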
General practice registrar responses to the use of different risk communication tools in simulated consultations: a focus group study. Edwards A. British Medical Journal 1999: 319(7212); 749-752. ABSTRACT: OBJECTIVES: To pilot the use of a range of complementary risk communication tools in simulated general practice consultations; to gauge the responses of general practitioners in training to these new consultation aids. DESIGN: Qualitative study based on focus group discussions. SETTING: General practice vocational training schemes in South Wales. PARTICIPANTS: 39 general practice registrars and eight course organisers attended four sessions; three simulated patients attended each time. METHOD: Registrars consulting with simulated patients used verbal or "qualitative" descriptions of risks, then numerical data, and finally graphical presentations of the same data. Responses of doctors and patients were explored by semistructured discussions that had been audiotaped for transcription and analysis. RESULTS: The process of using risk communication tools in simulated consultations was acceptable to general practitioner registrars. Providing doctors with information about risks and benefits of treatment options was generally well received. Both doctors and patients found it helped communication. There were concerns about the lack of available, unbiased, and applicable evidence and a shortage of time in the consultation to discuss treatment options adequately. Graphical presentation of information was often favoured, an approach that also has the potential to save consultation time. CONCLUSIONS: A range of risk communication "tools" with which to discuss treatment options is likely to be more applicable than a single new strategy. These tools should include both absolute and relative risk information formats, presented in an unbiased way. Using risk communication tools in simulated consultations provides a model for training in risk communication for professional groups.
Explaining risks: turning numerical data into meaningful pictures. Edwards A, Elwyn G, Mulley A. BMJ 2002: 324(7341); 827-30. [Medline] [Full text] [PDF]
Evidence based purchasing: understanding results of clinical trials and systematic reviews. Fahey T, Griffiths S, Peters TJ. British Medical Journal 1995: 311(7012); 1056-9; discussion 1059-60. [Medline] [Abstract] [Full text] OBJECTIVE--To assess whether the way in which the results of a randomised controlled trial and a systematic review are presented influences health policy decisions. DESIGN--A postal questionnaire to all members of a health authority within one regional health authority. SETTING--Anglia and Oxford regional health authorities. SUBJECTS--182 executive and non-executive members of 13 health authorities, family health services authorities, or health commissions. MAIN OUTCOME MEASURES--The average score from all health authority members in terms of their willingness to fund a mammography programme or cardiac rehabilitation programme according to four different ways of presenting the same results of research evidence--namely, as a relative risk reduction, absolute risk reduction, proportion of event free patients, or as the number of patients needed to be treated to prevent an adverse event. RESULTS--The willingness to fund either programme was significantly influenced by the way in which data were presented. Results of both programmes when expressed as relative risk reductions produced significantly higher scores when compared with other methods (P < 0.05). The difference was more extreme for mammography, for which the outcome condition is rarer. CONCLUSIONS--The method of reporting trial results has a considerable influence on the health policy decisions made by health authority members.
Absolutely relative: how research results are summarized can affect treatment decisions. Forrow L, Taylor W, Arnold R. The American Journal of Medicine 1992: 92(2); 121-24. ABSTRACT: PURPOSE: To determine whether alternative methods of presenting a contrast between the same two quantities in descriptions of research results could lead to different treatment decisions by physicians. SUBJECTS AND METHODS: We conducted a survey of practicing physicians and of faculty and fellows in training programs in clinical epidemiology and social science research methods. Each questionnaire presented results from a published study of either hypertension or hypercholesterolemia in two different ways: once as the relative change in the outcome rate and once as the absolute change in the outcome rate. We asked respondents to read each summary and indicate how the information contained in the summary would influence decisions about treatment. RESULTS: Of the 235 physicians who completed the questionnaire, 108 (46%) gave different responses to the same results presented in different ways. Of these, 97 (89.8%) indicated a stronger inclination to treat patients after reading of the relative change in the outcome rate (p less than 0.0001). CONCLUSION: The manner of presentation of results can influence physicians' judgments about the treatment of patients.
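The percentages in the Forrow abstract can be reproduced from the counts it quotes. The sketch below also runs an exact two-sided sign test on the 108 physicians who changed their answer. The choice of a sign test is an assumption made here for illustration; the abstract does not say which test produced its p-value.

```python
from math import comb

def sign_test_two_sided_p(k_favoring, n_discordant):
    """Exact two-sided binomial (sign) test against a 50:50 split."""
    k_extreme = max(k_favoring, n_discordant - k_favoring)
    upper_tail = sum(comb(n_discordant, i)
                     for i in range(k_extreme, n_discordant + 1))
    # Double the upper tail for a two-sided test; cap at 1.
    return min(1.0, 2 * upper_tail / 2 ** n_discordant)

# Counts quoted in the abstract.
respondents = 235        # physicians who completed the questionnaire
changed = 108            # gave different responses to the two formats
favored_relative = 97    # of those, more inclined to treat after the relative format

share_changed = changed / respondents
share_relative = favored_relative / changed
p_value = sign_test_two_sided_p(favored_relative, changed)

print(f"{share_changed:.0%} changed their response")
print(f"{share_relative:.1%} of those leaned toward treatment under the relative format")
print(f"sign-test p = {p_value:.2e}")
```

Under this assumed test the p-value falls far below the 0.0001 bound reported in the abstract.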
Communicating the benefits of chronic preventive therapy: does the format of efficacy data determine patients' acceptance of treatment? Hux J, Naylor C. Medical Decision Making 1995: 15(2); 152-7. ABSTRACT: Patients' informed acceptance of chronic medical therapy hinges on communicating the potential benefits of drugs in quantitative terms. In a hypothetical scenario of treatment initiation, the authors assessed how three different formats of the same data affected the willingness of 100 outpatients to take what were implied to be three different lipid-lowering drugs. Side-effects were declared negligible and costs insured. Subjects made a "yes-no" decision about taking such a medication, and graded the decision on a certainty scale. Advised of a relative risk reduction--"34% reduction in heart attacks"--88% of the patients assented to therapy. All other formats elicited significantly more refusals (p < 0.0001): for absolute risk difference--"1.4% fewer patients had heart attacks"--42% assented; for inverted absolute risk--"treat 71 persons for 5 years to prevent one heart attack"--only 31% accepted treatment. When the data were extrapolated to disease-free survival--"average gain of 15 weeks"--40% consented. Similar responses were obtained for descriptions of an antihypertensive drug: 89% assented to therapy when given relative risk reduction but only 46% when given absolute risk reduction. The subjects were confident in both acceptance and refusal: 93% of the decisions were rated "somewhat certain" to "completely certain." The authors conclude that patients' views of medical therapy are shaped by the formats in which potential benefits are presented. Multiple complementary formats may be most appropriate. The results imply that many patients may decline treatment if briefed on the likelihood or extent of benefit.
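The three formats in the Hux and Naylor abstract describe the same effect. A minimal sketch of the arithmetic follows, using only the 34% relative reduction and 1.4% absolute reduction quoted above. The control and treated event rates are back-calculated here and do not appear in the abstract; the function name is made up for illustration.

```python
def risk_summaries(control_risk, treated_risk):
    """Return absolute risk reduction, relative risk reduction, and NNT."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    nnt = 1.0 / arr                     # number needed to treat
    return {"arr": arr, "rrr": rrr, "nnt": nnt}

# Figures quoted in the abstract.
arr_quoted = 0.014   # "1.4% fewer patients had heart attacks"
rrr_quoted = 0.34    # "34% reduction in heart attacks"

# Back-calculated event rates (an inference, not stated in the abstract).
implied_control = arr_quoted / rrr_quoted
implied_treated = implied_control - arr_quoted

summary = risk_summaries(implied_control, implied_treated)
print(f"implied control risk: {implied_control:.1%}")
print(f"implied treated risk: {implied_treated:.1%}")
print(f"NNT: {summary['nnt']:.1f}")
```

The reciprocal of 0.014 is about 71.4, which matches the "treat 71 persons" wording once rounded.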
Absolute and relative truth in clinical trials. Julian D. Lancet 2002(June): 359(9321); 1945-1946. Abstract not available.
Quality of life questionnaires: does statistically significant = clinically important? Juniper EF. J Allergy Clin Immunol 1998: 102(1); 16-7. [Medline] [Full text] [PDF]
An assessment of clinically useful measures of the consequences of treatment. Laupacis A, Sackett D, Roberts R. New England Journal of Medicine 1988: 318(26); 1728-1733. [Medline]
Consider absolute risks in SIDS prevention. Logan S. Arch Dis Child 2000: 83(5); 457. Abstract not available yet.
Who benefits from medical interventions? Smith GD, Egger M. BMJ 1994: 308(6921); 72-4. [Medline] [Full text] Abstract not available.
"Absolute" is inappropriate for quantitative risk estimation. Tunstall-Pedoe H. BMJ 2000: 320(7236); 723-. [Full text]
Interpreting treatment effects in randomised trials. Guyatt GH, Juniper E, Walter S, Griffith L, Goldstein R. British Medical Journal 1998: 316(7132); 690-693. [Medline] [Full text] [PDF] [Excerpt] The need to measure the impact of treatments on health related quality of life has led to a rapid increase in the variety of instruments available and in their use as measures of outcome in clinical trials. One limitation of instruments that purport to measure health related quality of life is difficulty interpreting their results. In the past decade, investigators have progressed in making these questionnaire results interpretable. For example, we have shown that when questionnaires present response options in the form of seven point scales with verbal descriptions for each option (see box), the smallest difference that patients consider important is often approximately 0.5 per question. A moderate difference corresponds to a change of approximately 1.0 per question, and changes of greater than 1.5 can be considered large. Thus, for example, in a domain with four items, patients will consider a 1 point change in two or more items as important. This finding applies across different areas of function, including dyspnoea, fatigue, and emotional function in patients with chronic airflow limitation1; and symptoms, emotional function, and activity limitations in adults2 and children3 with asthma, parents of children with asthma,4 and adults with rhinoconjunctivitis.5 Initially, we used comparisons in the same patient to establish this difference, but more recently we have replicated this finding using differences between patients.
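The per-question thresholds in the Guyatt excerpt can be applied to a domain score by dividing the total change by the number of items. The sketch below is only an illustration: the excerpt gives approximate anchor values (0.5, 1.0, 1.5), and the exact cut-offs between categories, the labels, and the function name are assumptions introduced here.

```python
def classify_change(total_change, n_items):
    """Classify a change in a seven-point-scale domain by mean change per item.

    Anchors follow the excerpt (about 0.5 smallest important, about 1.0
    moderate, above 1.5 large); the boundaries between bands are assumed.
    """
    per_item = abs(total_change) / n_items
    if per_item < 0.5:
        return "below the smallest important difference"
    if per_item < 1.0:
        return "small but important"
    if per_item <= 1.5:
        return "moderate"
    return "large"

# The excerpt's example: a 1 point change on two of four items,
# i.e. a total change of 2 over 4 items (0.5 per item).
print(classify_change(total_change=2, n_items=4))
```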
Can there be a more patient-centred approach to determining clinically important effect sizes for randomized treatment trials? Naylor CD. J Clin Epidemiol 1994: 47(7); 787-95. [Medline] Sample sizes for treatment trials with categorical outcomes are conventionally derived by balancing three elements: a difference between alternative treatments in the event rates for the outcomes of interest (commonly termed the clinically important difference), the alpha error tolerance (false positive risk) and the beta error tolerance (false negative risk). Clinically important differences used to plan trials are chosen in part based on earlier experience with similar interventions (i.e. biological or clinical plausibility). Methodological conventions and clinicians' perceptions will also affect choices. Lastly, practical concerns about the feasibility of accruing large numbers of subjects may drive trialists to specify bigger differences as clinically important, with a view to containing sample size requirements. We suggest that patients or other members of the public be given an active role in determining the magnitude of the clinically important treatment effect for trial planning. Probability trade-offs could be constructed to enable patients and/or healthy volunteers to indicate the degree of benefit they would want from a "new" treatment, given the potential side-effects of the same treatment. This method has the advantage of respecting patient autonomy and principles of informed consent. It provides an additional consideration when plausible effect sizes and error tolerances on hypothesis tests are balanced against feasibility of accruing various sample sizes. Its primary disadvantage is inconvenience, as it adds another step to trial design. 
On the other hand, if patient-based clinically important differences are generated for a variety of disease states and types of treatments, specific trade-off exercises may be needed only for unusual trials. (ABSTRACT TRUNCATED AT 250 WORDS)
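Naylor's point that specifying a bigger clinically important difference contains the sample size can be seen in the usual normal-approximation formula for comparing two proportions. The sketch below is a textbook approximation, not a method taken from the abstract, and the event rates in the example are made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # false positive tolerance
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - false negative risk
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated                   # clinically important difference
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical event rates: 20% on control, two candidate treatment effects.
n_small_difference = n_per_group(0.20, 0.15)   # 5 percentage point difference
n_large_difference = n_per_group(0.20, 0.10)   # 10 percentage point difference
print(n_small_difference, n_large_difference)
```

Doubling the difference cuts the requirement to well under a quarter, which is the feasibility pressure the abstract describes.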
Measurement of fatigue: determining minimally important clinical differences. Schwartz AL, Meek PM, Nail LM, Fargo J, Lundquist M, Donofrio M, Grainger M, Throckmorton T, Mateo M. J Clin Epidemiol 2002: 55(3); 239-44. [Medline] The purpose was to determine the minimally important clinical difference (MICD) in fatigue as measured by the Profile of Mood States, Schwartz Cancer Fatigue Scale (SCFS), General Fatigue Scale, and a 10-point single-item fatigue measure. The MICD is the smallest amount of change in a symptom (e.g., fatigue) measure that signifies an important change in that symptom. Subjects rated the degree of change in their fatigue over 2 days on a Global Rating Scale. 103 patients were enrolled on this multisite prospective repeated measures design. MICD was determined following established procedures at two time points. Statistically significant changes were observed for moderate and large changes in fatigue, but not for small changes. The scales were sensitive to increases in fatigue over time. The MICD, presented as mean change, for each scale and per item on each scale is: POMS = 5.6, per item = 1.1, SCFS = 5.0, per item = 0.8, GFS = 9.7, per item = 1.0, and the single item measure of fatigue was 2.4 points. This information may be useful in interpreting scale scores and planning studies using these measures.
Here are some results that may or may not be important.
Traumatic Brain Injury: Patterns of Failure of Nonoperative Management. Patel NY. The Journal of Trauma 2000: 48(3); 367-373. [Medline] ABSTRACT: OBJECTIVE: The circumstances of failure for nonoperative management of blunt traumatic brain injury have been poorly defined. In this study, all trauma patients identified over a 12-year period with progression of neurologic injury requiring craniotomy were retrospectively reviewed. METHODS: Data collected included demographic information, mechanism of injury, field and admission vital signs, and Glasgow Coma Scale score, medications, associated injuries, and coagulopathy. Head computed tomographic scans were reviewed, and anatomic findings were correlated with clinical changes (change in mental status or elevation of intracranial pressure) that led to subsequent CT scan and craniotomy. RESULTS: Of 20,100 patients, there were 852 who had computed tomographic scans with acute intracranial injuries on admission; 462 patients were managed nonoperatively. Fifty-seven patients had progression of neurologic injury (34 < 24 hours = early; 23 > 24 hours = late) that required surgery. CONCLUSION: Of the variables investigated, only anatomic location of injury was found to be predictive of early failure of nonoperative management. Frontal intraparenchymal hematomas are particularly prone to early failure. Clinical examination and intracranial pressure monitoring are equally important in detecting failure and should be an integral part of nonoperative management.
Overview of health-related quality-of-life measures for pediatric patients: application in the assessment of pharmacotherapeutic and pharmacoeconomic outcomes. Marra CA, Levine M, McKerrow R, Carleton BC. Pharmacotherapy 1996: 16(5); 879-88. Health-related quality of life (HRQOL) is an important dimension in assessing health care. Several methodologic considerations are related to the manner in which these data are obtained in children. Few multidimensional generic measures of quality of life (QOL) have been developed for children and adolescents. Most published research concerns the development of tools to be used in a disease-specific manner for clinical trials. Although several authors point out numerous advantages in assessing HRQOL in clinical practice, several barriers must be overcome for this to occur. In the current era of economic restraint, HRQOL measures must be integrated into pharmaco-economic analyses to assess fully the impact of a drug on health care resources and outcomes.
Views of practicing physicians and the public on medical errors. Blendon RJ, DesRoches CM, Brodie M, Benson JM, Rosen AB, Schneider E, Altman DE, Zapert K, Herrmann MJ, Steffenson AE. N Engl J Med 2002: 347(24); 1933-40. [Medline] BACKGROUND: In response to the report by the Institute of Medicine on medical errors, national groups have recommended actions to reduce the occurrence of preventable medical errors. What is not known is the level of support for these proposed changes among practicing physicians and the public. METHODS: We conducted parallel national surveys of 831 practicing physicians, who responded to mailed questionnaires, and 1207 members of the public, who were interviewed by telephone after selection with the use of random-digit dialing. Respondents were asked about the causes of and solutions to the problem of preventable medical errors and, on the basis of a clinical vignette, were asked what the consequences of an error should be. RESULTS: Many physicians (35 percent) and members of the public (42 percent) reported errors in their own or a family member's care, but neither group viewed medical errors as one of the most important problems in health care today. A majority of both groups believed that the number of in-hospital deaths due to preventable errors is lower than that reported by the Institute of Medicine. Physicians and the public disagreed on many of the underlying causes of errors and on effective strategies for reducing errors. Neither group believed that moving patients to high-volume centers would be a very effective strategy. The public and many physicians supported the use of sanctions against individual health professionals perceived as responsible for serious errors. CONCLUSIONS: Though substantial proportions of the public and practicing physicians report that they have had personal experience with medical errors, neither group has the sense of urgency expressed by many national organizations. 
To advance their agenda, national groups need to convince physicians, in particular, that the current proposals for reducing errors will be very effective.
The measurement and monitoring of surgical adverse events. Bruce J, Russell EM, Mollison J, Krukowski ZH. Accessed on 2003-08-15. BACKGROUND: Surgical adverse events contribute significantly to postoperative morbidity, yet the measurement and monitoring of events is often imprecise and of uncertain validity. Given the trend of decreasing length of hospital stay and the increase in use of innovative surgical techniques--particularly minimally invasive and endoscopic procedures--accurate measurement and monitoring of adverse events is crucial. OBJECTIVES: The aim of this methodological review was to identify a selection of common and potentially avoidable surgical adverse events and to assess whether they could be reliably and validly measured, to review methods for monitoring their occurrence and to identify examples of effective monitoring systems for selected events. This review is a comprehensive attempt to examine the quality of the definition, measurement, reporting and monitoring of selected events that are known to cause significant postoperative morbidity and mortality. METHODS - SELECTION OF SURGICAL ADVERSE EVENTS: Four adverse events were selected on the basis of their frequency of occurrence and likelihood of evidence of measurement and monitoring: (1) surgical wound infection; (2) anastomotic leak; (3) deep vein thrombosis (DVT); (4) surgical mortality. Surgical wound infection and DVT are common events that cause significant postoperative morbidity. Anastomotic leak is a less common event, but risk of fatality is associated with delay in recognition, detection and investigation. Surgical mortality was selected because of the effort known to have been invested in developing systems for monitoring surgical death, both in the UK and internationally. Systems for monitoring surgical wound infection were also included in the review. 
METHODS - LITERATURE SEARCH: Thirty separate, systematic literature searches of core health and biomedical bibliographic databases (MEDLINE, EMBASE, CINAHL, HealthSTAR and the Cochrane Library) were conducted. The reference lists of retrieved articles were reviewed to locate additional articles. A matrix was developed whereby different literature and study designs were reviewed for each of the surgical adverse events. Each article eligible for inclusion was independently reviewed by two assessors. METHODS - CRITICAL APPRAISAL: Studies were appraised according to predetermined assessment criteria. Definitions and grading scales were assessed for: content, criterion and construct validity; repeatability; reproducibility; and practicality (surgical wound infection and anastomotic leak). Monitoring systems for surgical wound infection and surgical mortality were assessed on the following criteria: (1) coverage of the system; (2) whether or not denominator data were collected; (3) whether standard and agreed definitions were used; (4) inclusion of risk adjustment; (5) issues related to data collection; (6) postdischarge surveillance; (7) output in terms of feedback and wider dissemination. RESULTS - SURGICAL WOUND INFECTION: A total of 41 different definitions and 13 grading scales of surgical wound infection were identified from 82 studies. Definitions of surgical wound infection varied from presence of pus to complex definitions such as those proposed by the Centres for Disease Control in the USA. A small body of literature has been published on the content, criterion and construct validity of different definitions, and comparisons have been made against wound assessment scales and multidimensional indices. There are examples of comprehensive hospital-based monitoring systems of surgical wound infection, mainly under the auspices of nosocomial surveillance. 
To date, however, there is little evidence of systematic measurement and monitoring of surgical wound infection after hospital discharge. RESULTS - ANASTOMOTIC LEAK: Over 40 definitions of anastomotic leak were extracted from 107 studies of upper gastrointestinal, hepatopancreaticobiliary and lower gastrointestinal surgery. No formal evaluations were found that assessed the validity or reliability of definitions or severity scales of anastomotic leak. One definition was proposed during a national consensus workshop, but no evidence of its use was found in the surgical literature. The lack of a single definition or gold standard hampers comparison of postoperative anastomotic leak rates between studies and institutions. RESULTS - DEEP VEIN THROMBOSIS: Although a critical review of the DVT literature could not be completed within the realms of this review, it was evident that a number of new techniques for the detection and diagnosis of DVT have emerged in the last 20 years. The group recommends a separate review be undertaken of the different diagnostic tests to detect DVT. RESULTS - SURGICAL MORTALITY MONITORING SYSTEMS: The definition of surgical mortality is relatively consistent between monitoring systems, but duration of follow-up of death postdischarge varies considerably. The majority of systems report in-hospital mortality rates; only some have the potential to link deaths to national death registers. Risk assessment is an important factor and there should be a distinction between recording pre-intervention factors and postoperative complications. A variety of risk scoring systems was identified in the review. Factors associated with accurate and complete data collection include the employment of local, dedicated personnel, simple and structured prompts to ensure that clinical input is complete, and accurate and automated data capture and transfer. 
CONCLUSIONS: The use of standardised, valid and reliable definitions is fundamental to the accurate measurement and monitoring of surgical adverse events. This review found inconsistency in the quality of reporting of postoperative adverse events, limiting accurate comparison of rates over time and between institutions. The duration of follow-up for individual events will vary according to their natural history and epidemiology. Although risk-adjusted aggregated rates can act as screening or warning systems for adverse events, attribution of whether events are avoidable or preventable will invariably require further investigation at the level of the individual, unit or department. CONCLUSIONS - RECOMMENDATIONS FOR RESEARCH: (1) A single, standard definition of surgical wound infection is needed so that comparisons over time and between departments and institutions are valid, accurate and useful. Surgeons and other healthcare professionals should consider adopting the 1992 Centers for Disease Control (CDC) definition for superficial incisional, deep incisional and organ/space surgical site infection for hospital monitoring programmes and surgical audits. There is a need for further methodological research into the performance of the CDC definition in the UK setting. (2) There is a need to formally assess the reliability of self-diagnosis of surgical wound infection by patients. (3) There is a need to assess formally the reliability of case ascertainment by infection control staff. (4) Work is needed to create and agree a standard, valid and reliable definition of anastomotic leak which is acceptable to surgeons. (5) A systematic review is needed of the different diagnostic tests for the diagnosis of DVT. 
(6) The following variables should be considered in any future DVT review: anatomical region (lower limb, upper limb, pelvis); patient presentation (symptomatic, asymptomatic); outcome of diagnostic test (successfully completed, inconclusive, technically inadequate, negative); length of follow-up; cost of test; whether or not serial screening was conducted; and recording of laboratory cut-off values for fibrinogen equivalent units. (7) A critical review is needed of the surgical risk scoring used in monitoring systems. (8) In the absence of automated linkage there is a need to explore the benefits and costs of monitoring in primary care. (9) The growing potential for automated linkage of data from different sources (including primary care, the private sector and death registers) needs to be explored as a means of improving the ascertainment of surgical complications, including death. This linkage needs to be within the terms of data protection, privacy and human rights legislation. (10) A review is needed of the extent of the use and efficiency of routine hospital data versus special collections or voluntary reporting. www.ncchta.org/fullmono/mon522.pdf
Problems for clinical judgement: 4. Surviving in the report card era. Tu JV, Schull MJ, Ferris LE, Hux JE, Redelmeier DA. CMAJ 2001: 164(12); 1709-12. [Medline] [Abstract] [Full text] [PDF] Health care report cards involve comparisons of health care systems, hospitals or clinicians on performance measures. They are going to be an important feature of medical care in Canada in the new millennium as patients demand more information about their medical care. Although many clinicians are aware of this growing trend, they may not be prepared for all of its implications. In this article, we provide some historical background on health care report cards and describe a number of strategies to help clinicians survive and thrive in the report card era. We offer a number of tips ranging from knowing your outcomes first to proactively getting involved in developing report cards.
The caffeine metabolic ratio as an index of xanthine oxidase activity in clinically active and silent celiac patients. Boda M, Nemeth I, Boda D. Journal of Pediatric Gastroenterology and Nutrition 1999: 29(5); 546-50. [Medline] BACKGROUND: The xanthine oxidoreductase system has been identified as one of the main sources of free radicals responsible for various forms of tissue injury. Because the intestinal villi are an important location of this enzyme, it was of interest to study the role of xanthine oxidase in gluten-sensitive celiac enteropathy, associated with characteristic villous atrophy. Measured by a noninvasive method, the ratio of caffeine metabolites excreted in the urine after a caffeine challenge had previously been shown to be indicative of the total xanthine oxidase activity of the patient. METHODS: The study involved 22 children with gluten-challenged celiac disease, exhibiting subtotal villous atrophy in specimens from the third intestinal biopsy in accordance with ESPGHAN criteria. Ten of the patients displayed overt clinical symptoms (active form), whereas 12 had no symptoms (silent form). Urinary caffeine metabolites were determined by high-pressure liquid chromatography. The total in vivo xanthine oxidase activity was expressed as the caffeine metabolite index. RESULTS: In patients with active celiac disease the xanthine oxidase activity index was considerably higher, whereas in those with silent disease it was significantly lower than the control value. A significant negative correlation was shown between the index indicative of xanthine oxidase activity and the serum iron level of the patients. CONCLUSIONS: Activation of xanthine oxidase may play a role in the pathogenesis of active celiac disease with definite malabsorption, gastrointestinal symptoms, and anemia. The caffeine test reflects the difference in the pathogenetic mechanism leading to the mucosal lesion and clinical symptoms of active and silent forms of celiac disease.
Drug interactions with newer antidepressants: role of human cytochromes P450. Greenblatt DJ, von Moltke LL, Harmatz JS, Shader RI. J Clin Psychiatry 1998: 59(Suppl 15); 19-27. Selective serotonin reuptake inhibitors and related antidepressant compounds have the secondary pharmacologic property of inhibiting the activity of human cytochrome P450 enzymes responsible for the oxidative metabolism of many drugs. A number of clinically important pharmacokinetic drug interactions are a consequence of these cytochrome inhibiting effects. This review evaluates the clinical implications of the metabolic profiles of the newer antidepressants, the relative activities of various new antidepressants as inhibitors of human cytochrome P450, and the various in vivo and in vitro methodologies that can be used for identification and quantification of drug interactions.
Cytochrome P450 involvement in the biotransformation of cisapride and racemic norcisapride in vitro: differential activity of individual human CYP3A isoforms. Pearce RE, Gotschall RR, Kearns GL, Leeder JS. Drug Metab Dispos 2001: 29(12); 1548-1554. Identification of the human cytochrome P450 (P450) enzymes involved in the metabolism of cisapride and racemic norcisapride [(+/-)-norcisapride] was investigated at 0.1 and 1 microM, concentrations that span the mean plasma C(max) for cisapride. Formation of norcisapride (Nor), 3-fluoro-4-hydroxycisapride (3F), and 4-fluoro-2-hydroxycisapride (4F) from cisapride and an uncharacterized metabolite (UNK) from (+/-)-norcisapride in human liver microsomes (HLMs) were consistent with Michaelis-Menten kinetics for a single enzyme (K(m), 6.0, 14.3, 13.9, and 107 microM; V(max), 1350, 696, 568, and 25 pmol/mg of protein, respectively). HLMs converted cisapride to Nor at rates that were at least 3 orders of magnitude greater than those observed for (+/-)-norcisapride conversion to UNK. The sample-to-sample variation in the rates of Nor, 3F, 4F, and UNK formation correlated strongly (r(2) > 0.796) with CYP3A4/5 activity in a panel of HLMs (n = 7) and was markedly reduced by ketoconazole, a potent CYP3A inhibitor. Ketoconazole virtually eliminated (+/-)-norcisapride conversion to UNK (94 +/- 0.5%). Studies with 10 cDNA-expressed enzymes revealed that CYP3A4 catalyzed the formation of Nor and 4F at rates >100 times those of non-CYP3A enzymes and >100- and 50-fold higher than CYP3A5 and CYP3A7, respectively. CYP3A4 was the only P450 capable of UNK formation. Therefore, CYP3A4 is the principal P450 enzyme responsible for the conversion of cisapride to Nor, 3F, and 4F and of (+/-)-norcisapride to UNK.
Compared with cisapride, factors related to CYP3A4-mediated (+/-)-norcisapride metabolism (e.g., ontogeny of drug-metabolizing enzymes, inhibition, and induction) should be clinically unimportant due to the apparent lack of dependence on cytochromes P450 for elimination.
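The Michaelis-Menten parameters quoted in the cisapride abstract can be turned into predicted formation rates at the two substrate concentrations studied. This is a sketch of the standard single-enzyme rate equation only; the dictionary layout and function name are assumptions, the V(max) units are kept exactly as the abstract prints them, and the outputs are model predictions, not the measured rates the abstract refers to.

```python
def michaelis_menten_rate(v_max, k_m, substrate):
    """Single-enzyme Michaelis-Menten rate: v = Vmax * S / (Km + S)."""
    return v_max * substrate / (k_m + substrate)

# (Km in microM, Vmax as quoted in the abstract), in the order given there.
parameters = {
    "Nor": (6.0, 1350.0),
    "3F": (14.3, 696.0),
    "4F": (13.9, 568.0),
    "UNK": (107.0, 25.0),
}

concentrations = (0.1, 1.0)  # microM, the two concentrations studied

predicted = {
    name: [michaelis_menten_rate(v_max, k_m, s) for s in concentrations]
    for name, (k_m, v_max) in parameters.items()
}

for name, rates in predicted.items():
    print(name, [f"{rate:.3g}" for rate in rates])
```

Because both concentrations sit well below every K(m), the predicted rates are close to linear in substrate concentration, roughly V(max)/K(m) times the concentration.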
Cytochrome P450 2D6 variants in a Caucasian population: allele frequencies and phenotypic consequences. Sachse C, Brockmoller J, Bauer S, Roots I. American Journal of Human Genetics 1997: 60(2); 284-95. [Medline] Cytochrome P450 2D6 (CYP2D6) metabolizes many important drugs. CYP2D6 activity ranges from complete deficiency to ultrafast metabolism, depending on at least 16 different known alleles. Their frequencies were determined in 589 unrelated German volunteers and correlated with enzyme activity measured by phenotyping with dextromethorphan or debrisoquine. For genotyping, nested PCR-RFLP tests from a PCR amplificate of the entire CYP2D6 gene were developed. The frequency of the CYP2D6*1 allele coding for extensive metabolizer (EM) phenotype was .364. The alleles coding for slightly (CYP2D6*2) or moderately (*9 and *10) reduced activity (intermediate metabolizer phenotype [IM]) showed frequencies of .324, .018, and .015, respectively. By use of novel PCR tests for discrimination, CYP2D6 gene duplication alleles were found with frequencies of .005 (*1x2), .013 (*2x2), and .001 (*4x2). Frequencies of alleles with complete deficiency (poor metabolizer phenotype [PM]) were .207 (*4), .020 (*3 and *5), .009 (*6), and .001 (*7, *15, and *16). The defective CYP2D6 alleles *8, *11, *12, *13, and *14 were not found. All 41 PMs (7.0%) in this sample were explained by five mutations detected by four PCR-RFLP tests, which may suffice, together with the gene duplication test, for clinical prediction of CYP2D6 capacity. Three novel variants of known CYP2D6 alleles were discovered: *1C (T1957C), *2B (additional C2558T), and *4E (additional C2938T). Analysis of variance showed significant differences in enzymatic activity measured by the dextromethorphan metabolic ratio (MR) between carriers of EM/PM (mean MR = .006) and IM/PM (mean MR = .014) alleles and between carriers of one (mean MR = .009) and two (mean MR = .003) functional alleles.
The results of this study provide a solid basis for prediction of CYP2D6 capacity, as required in drug research and routine drug treatment.
Developmental expression of CYP2C and CYP2C-dependent activities in the human liver: in-vivo/in-vitro correlation and inducibility. Treluyer JM, Gueret G, Cheron G, Sonnier M, Cresteil T. Pharmacogenetics 1997: 7(6); 441-52. Experiments were performed in vivo and in vitro to date the onset of hepatic CYP2C isoforms and CYP2C-dependent activities during the perinatal period in humans. Proteins were not detected by immunoblotting in fetal livers and developed in the first few weeks after birth, irrespective of the gestational age at birth. Similarly, the hydroxylation of tolbutamide, a marker for CYP2C9, was undetected in fetal liver microsomes and rose in the first month after birth. In adult liver preparations, the hydroxylation of diazepam correlated well with the CYP3A content of microsomes (r = 0.858, p < 0.01) and with the 6 beta hydroxylation of testosterone (r = 0.830, p < 0.005), whereas demethylation was related to the bulk of CYP2C proteins (r = 0.865, p < 0.005). In fetal liver microsomes, hydroxylation and demethylation activities accounted for less than 5% of the adult activities and both increased immediately after birth to reach adult activities at 1 year of age. When diazepam was given for sedative purposes in neonates and infants, the in-vivo urinary excretion of desmethyl diazepam, temazepam and oxazepam was extremely low in 1-2 day old newborns (less than 5 nmol metabolites excreted in 24 h per kg body weight) and developed in the first week after birth. In newborns, barbiturates and, to a lesser extent, steroids acted as inducers of CYP2C isoforms and increased tolbutamide hydroxylation, diazepam demethylation and diazepam hydroxylation by 2 to 10-fold. The surge of CYP2C proteins was caused by an accumulation of RNAs occurring in the first week after birth. The hepatic content in CYP2C8, 2C9 and 2C18 RNA displayed the same profile of evolution, which suggested a coregulation of their synthesis during the neonatal period.
Taken together, these biochemical and clinical data enable dating of the onset of CYP2C proteins to the first weeks after birth, which is of considerable clinical importance in pediatric pharmacology.
Medical Genetics: 2. The Diagnostic Approach to the Child with Dysmorphic Signs. Hunter AGW. Canadian Medical Association 2002: 166(4); 367-372. [Medline] [Abstract] [Full text] [PDF] Dysmorphology is the branch of clinical genetics in which clinicians and researchers study and attempt to interpret the patterns of human growth and structural defects. Reaching an accurate diagnosis for children with dysmorphic signs is important to their families, because it makes available all the accumulated knowledge about the relevant condition and may provide the family with the opportunity for interaction with patient or parent support groups. I show in this review that reaching a diagnosis in dysmorphology involves an approach that is not fundamentally different from that of other medical disciplines. Cytogenetic and molecular techniques continue to improve our ability to make precise syndrome diagnoses; however, these tests are expensive and should be used selectively.
Evidence-based disease management. Ellrodt G, Cook DJ, Lee J, Cho M, Hunt D, Weingarten S. JAMA 1997: 278(20); 1687-92. [Medline] Disease management is an approach to patient care that emphasizes coordinated, comprehensive care along the continuum of disease and across health care delivery systems. Evidence-based medicine is an approach to practice and teaching that integrates pathophysiological rationale, caregiver experience, and patient preferences with valid and current clinical research evidence. Using diabetes mellitus as an example, we describe the importance of evidence-based medicine to the development of disease management programs. We present a method for developing and implementing evidence-based clinical guidelines, clinical pathways, and algorithms and describe the creation of systems to measure and report processes and outcomes that could drive quality improvement in diabetes care. Multidisciplinary teams are ideally suited to develop, lead, and implement evidence-based disease management programs, since they play an essential role in the preventive, diagnostic, and therapeutic decisions for patients with diabetes throughout the course of their disease.
Reducing medication errors: potential benefits of bolus thrombolytic agents. Richards CF, Cannon CP. Acad Emerg Med 2000: 7(11); 1285-9. [Medline] A recent Institute of Medicine report highlighted the high incidence of medical errors in clinical practice, and the important fact that errors are associated with increased mortality. The administration of thrombolytic therapy for acute myocardial infarction is a particularly high-risk situation for emergency physicians. The combination of extreme time pressure with a narrow "therapeutic window" increases the potential for adverse outcomes due to dosing errors. Numerous trials have found that the dose of thrombolytic therapy is closely related to outcomes, with too low a dose associated with lower rates of infarct-related artery patency and higher doses associated with increased bleeding and intracranial hemorrhage. In the GUSTO-I trial, 13.5% of patients treated with streptokinase and 11.5% of patients treated with tissue plasminogen activator (t-PA) had a medication error (i.e., incorrect dose or infusion length). Most importantly, 30-day mortality was significantly higher in patients with medication errors: for t-PA dosing errors mortality was 7.7% vs 5.5% for patients who received the correct t-PA dose (p < 0.001), with similar findings for streptokinase. More recent data from the InTIME2 trial and other studies showed that use of a bolus thrombolytic agent reduced the rate of medication errors. Thus, use of the simpler bolus thrombolytic agents may reduce emergency department medication errors, and thus improve overall clinical outcome.
Chronic asthma and chiropractic spinal manipulation: a randomized clinical trial. Nielsen N, Bronfort G, Bendix T, Madsen F, Weeke B. Clin Exp Allergy 1995: 25(1); 80-8. [Medline] The purpose of this randomized patient- and observer-blinded cross-over trial was to evaluate the efficacy of chiropractic treatment in the management of chronic asthma when combined with pharmaceutical maintenance therapy. The trial was conducted at the National University Hospital's Out-patient Clinic in Copenhagen, Denmark. Thirty-one patients aged 18-44 years participated, all suffering from chronic asthma controlled by bronchodilators and/or inhaled steroids. Patients who had received chiropractic treatment for asthma within the last 5 years, or who received oral steroids and immunotherapy, were not eligible. Patients were randomized to receive either active chiropractic spinal manipulative treatment or sham chiropractic spinal manipulative treatment twice weekly for 4 weeks, and then crossed over to the alternative treatment for another 4 weeks. Both phases were preceded and followed by a 2-week period without chiropractic treatment. The main outcome measurements were forced expiratory volume in the first second (FEV1), forced vital capacity (FVC), daily use of inhaled bronchodilators, patient-rated asthma severity and non-specific bronchial reactivity (n-BR). Using the cross-over analysis, no clinically important or statistically significant differences were found between the active and sham chiropractic interventions on any of the main or secondary outcome measures. Objective lung function did not change during the study, but over the course of the study, non-specific bronchial hyperreactivity (n-BR) improved by 36% (P = 0.01) and patient-rated asthma severity decreased by 34% (P = 0.0002) compared with the baseline values. (ABSTRACT TRUNCATED AT 250 WORDS)
Recent Advances: Complementary medicine. Vickers A. BMJ 2000: 321; 683-686. [Medline] [Full text] [PDF]
Results of the national cooperative inner-city asthma study (NCICAS) environmental intervention to reduce cockroach allergen exp