Statistical Evidence. Material from/for various sections of the book
Some topics that I might want to discuss in future web pages and/or in a second edition of the book.
What sample size is needed to allow randomization to prevent covariate imbalance with high probability?
Describe patient preference trials: http://bmj.bmjjournals.com/cgi/content/full/316/7128/360
Review Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Mann CJ. Emerg Med J 2003: 20(1); 54-60. [Medline] [Abstract] [Full text] [PDF]
Attrition and bias in the MRC cognitive function and ageing study: an epidemiological investigation. Matthews FE, Chatfield M, Freeman C, McCracken C, Brayne C. BMC Public Health 2004: 4(1); 12. [Medline] [Abstract] [Full text] [PDF]
The Oslo Health Study: The impact of self-selection in a large, population-based survey. Sogaard AJ, Selmer R, Bjertness E, Thelle D. Int J Equity Health 2004: 3(1); 3. [Medline] [Abstract] [Full text] [PDF]
The requirement for prior consent to participate on survey response rates: a population-based survey in Grampian. Angus VC, Entwistle VA, Emslie MJ, Walker KA, Andrew JE. BMC Health Serv Res 2003: 3(1); 21. [Medline] [Abstract] [Full text] [PDF]
The fallacy of enrolling only high-risk subjects in cancer prevention trials: is there a "free lunch"? Baker SG, Kramer BS, Corle D. BMC Med Res Methodol 2004: 4(1); 24. [Medline] [Abstract] [Full text] [PDF]
Discuss particularization. Finasteride in the treatment of clinical benign prostatic hyperplasia: a systematic review of randomised trials. Edwards JE, Moore RA. BMC Urol 2002: 2(1); 14. [Medline] [Abstract] [Full text] [PDF]
Discuss Jureidini 2004 reference
http://www.clinicalmolecularallergy.com/content/3/1/2
http://arthritis-research.com/content/7/2/R333
http://www.biomedcentral.com/1471-2318/5/2/abstract
http://ccforum.com/content/9/2/R83/abstract
http://www.biomedcentral.com/1472-6963/5/4/abstract
http://www.biomedcentral.com/1471-2261/5/2/abstract
http://ccforum.com/content/9/2/R74/abstract
http://www.reproductive-health-journal.com/content/2/1/1/abstract
http://ccforum.com/content/9/2/R60
http://breast-cancer-research.com/content/7/2/R184/abstract
http://www.translational-medicine.com/content/2/1/46/abstract
http://www.biomedcentral.com/1471-2431/4/24/abstract
http://www.rbej.com/content/2/1/82/abstract
http://www.biomedcentral.com/1471-230X/4/32/abstract
http://bmj.bmjjournals.com/cgi/content/full/318/7194/1324
http://www.cmaj.ca/cgi/content/full/168/7/835
http://bmj.bmjjournals.com/cgi/content/full/327/7424/1159
http://bmj.bmjjournals.com/cgi/content/full/317/7155/405
http://bmj.bmjjournals.com/cgi/content/full/325/7358/269
How well is the clinical importance of study results reported? An assessment of randomized controlled trials. Chan KB, Man-Son-Hing M, Molnar FJ, Laupacis A. CMAJ 2001: 165(9); 1197-202. [Abstract] [Full text] [PDF] BACKGROUND: The interpretation of the results of randomized controlled trials (RCTs) has traditionally emphasized statistical significance rather than clinical importance. Our aim was to assess the quality of reporting of factors related to clinical importance in a sample of published RCTs. METHODS: A random sample of 27 (of a total of 266) RCTs published in 5 major medical journals over a 1-year period were reviewed by 4 independent reviewers for factors considered important in the interpretation of the clinical importance of study results: identification of a clearly defined primary outcome, reporting of the expected difference between groups used in the calculation of sample size (the delta value) and whether it was based on the minimal clinically important difference of the intervention, the statistical significance of the results, presentation of pertinent confidence intervals, and the authors' interpretation of the clinical importance of the results. RESULTS: Twenty-two of 27 (81%) articles explicitly reported a single primary outcome. Of the 20 articles that included a sample size calculation, 18 (90%) reported a delta value. Two of the 18 (11%) articles explicitly stated that the delta value was chosen to reflect the minimal clinically important difference of the intervention. For the primary outcomes, confidence intervals surrounding the point estimates of the efficacy of the interventions were reported in 11 of 27 (41%) studies. The study results were interpreted from the perspective of clinical importance in 20 of 27 (74%) of the articles. Of these 20 reports, 5 (25%) provided justification for their clinical interpretation of the results.
INTERPRETATION: Authors of RCTs published in major general medical and internal medicine journals do not consistently provide their own interpretation of the clinical importance of their results, and they often do not provide sufficient information to allow readers to make their own interpretation.
Audit and feedback: effects on professional practice and health care outcomes. Thomson OB, Oxman AD, Davis DA, Haynes RB, Freemantle N, Harvey EL. Cochrane 2000: (2); CD000259. [Medline] BACKGROUND: Audit and feedback has been identified as having the potential to change the practice of health care professionals. OBJECTIVES: To assess the effects of audit and feedback on the practice of health professionals and patient outcomes. SEARCH STRATEGY: We searched MEDLINE up to June 1997, the Research and Development Resource Base in Continuing Medical Education, and reference lists of related systematic reviews and articles. SELECTION CRITERIA: Randomised trials of audit and feedback (defined as any summary of clinical performance of health care over a specified period of time). The participants were health care professionals responsible for patient care. DATA COLLECTION AND ANALYSIS: Two reviewers independently extracted data and assessed study quality. MAIN RESULTS: Thirty-seven studies were included, involving more than 4977 physicians. The reporting of study methods was inadequate for almost all studies. In 31 out of 37 studies the randomisation process could not be determined. Information regarding data analysis was also lacking. For example, power calculations were not mentioned in 27 out of 37 studies. A variety of behaviours were targeted including the reduction of diagnostic test ordering, prescribing practices, preventive care, and the general management of a problem, for example hypertension. Twenty-eight studies measured physician performance, one study targeted patient outcomes in diabetes and the remaining eight studies measured both physician performance and patient outcomes. The relative percentage differences ranged from -16% to 152%. The clinical importance of the changes was not always clear. 
REVIEWER'S CONCLUSIONS: Audit and feedback can sometimes be effective in improving the practice of health care professionals, in particular prescribing and diagnostic test ordering. When it is effective, the effects appear to be small to moderate but potentially worthwhile. Those attempting to enhance professional behaviour should not rely solely on this approach.
Interpreting thresholds for a clinically significant change in health status in asthma and COPD. Jones PW. Eur Respir J 2002: 19(3); 398-404. [Medline] Health status (or Health-Related Quality of Life) measurement is an established method for assessing the overall efficacy of treatments for asthma and chronic obstructive pulmonary disease (COPD). Such measurements can indicate the potential clinical significance of a treatment's effect. This paper is concerned with methods of estimating the threshold of clinical significance for three widely used health status questionnaires for asthma and COPD: the Asthma Quality of Life Questionnaire, Chronic Respiratory Questionnaire and St George's Respiratory Questionnaire. It discusses the methodology used to obtain such estimates and shows that the estimates appear to be fairly reliable; i.e., for a given questionnaire, similar estimates may be obtained in different studies. These empirically derived thresholds are all mean estimates with confidence intervals around them. The presence of these confidence intervals affects the way in which the thresholds may be used to draw inferences concerning the clinical relevance of clinical trial results. A new system of judging the magnitude of clinically significant results is proposed. Finally, an attempt is made to translate these thresholds into scenarios that illustrate what a clinically significant change with treatment may mean to an individual patient.
Recombinant or urinary follicle-stimulating hormone? A cost-effectiveness analysis derived by particularizing the number needed to treat from a published meta-analysis. Ola B, Papaioannou S, Afnan MA, Hammadieh N, Gimba S. Fertil Steril 2001: 75(6); 1106-10. [Medline] OBJECTIVE: To demonstrate that particularizing pooled results of a meta-analysis can derive incremental cost effectiveness of superovulation with recombinant follicle-stimulating hormones (rFSH) vs. the highly purified urinary form (uFSH) for assisted conception. DESIGN: A retrospective study. SETTING: An assisted conception unit in the United Kingdom. PATIENT(S): One hundred forty-five fresh in vitro fertilization (IVF) and 58 fresh intracytoplasmic sperm injection (ICSI) cycles. INTERVENTION(S): rFSH vs. uFSH. MAIN OUTCOME MEASURE(S): Incremental cost-effectiveness (i.e., cost needed to treat, or CNT) and budget-impact analyses of rFSH vs. uFSH. RESULT(S): In women less than 30 years old, the clinical pregnancy rate was 37.7% (95% CI 24.8%-52.1%), the particularized number needed to treat (pNNT) was -19, and the cost needed to treat was 5070.51 pounds sterling (3660.53 pounds sterling to 7619.32 pounds sterling). For the 30- to 35-year-old age group, the clinical pregnancy rate was 29.9% (95% CI 20.0%-41.4%), the particularized number needed to treat was -24, and CNT was 7335.59 pounds sterling (5284.11 pounds sterling to 10,941.22 pounds sterling). For the 36- to 40-year-old age group, the clinical pregnancy rate was 30.6% (95% CI 19.6%-43.7%), the particularized number needed to treat was -23.0, and the CNT was 8569.67 pounds sterling (5998.70 pounds sterling to 13,413.24 pounds sterling). CONCLUSION(S): The CNT and thus the budget impact analyses (the extra number of cycles that can be funded by the CNT) both increase directly with age of the patient, and inversely with the clinical pregnancy rate.
Making consent patient centred. Bridson J, Hammond C, Leach A, Chester MR. BMJ 2003: 327(7424); 1159-1161. [Full text] [PDF]
Changes in clinical trials mandated by the advent of meta-analysis. Chalmers TC, Lau J. Stat Med 1996: 15(12); 1263-8; discussion 1269-72. [Medline] Service on the Data Monitoring Committee of the CPEP (Calcium for Pre-eclampsia Prevention) has led us to four conclusions about clinical trials which we should like to present to this gathering of biostatisticians for their reactions: (i) meta-analyses of the pertinent published trials of the same therapy should always be undertaken before the start of a new trial, and the results examined to help determine the design of a new trial or determine if a trial should be undertaken at all; (ii) assuming that a decision is made to go ahead, the results of the past trials should be used in sizing the new one; (iii) in the course of the new one, regardless of the size estimates, stopping early should be considered if the trends conform to the results of the meta-analysis; and (iv) heterogeneity of patients entering clinical trials is desirable and should be specifically studied, and it should never be concluded that an average outcome is applicable to all future patients.
Applying the results of trials and systematic reviews to individual patients. Glasziou P, Guyatt GH, Dans AL, Dans LF, Straus S, Sackett DL. ACP Journal Club 1998: 129(3); A15-6. [Medline] Your patient is a 60-year-old hypertensive, alcoholic woman whose symptomless atrial fibrillation was first documented 3 months ago. An echocardiogram shows an enlarged left atrium, rendering successful cardioversion unlikely. She tells you that both of her parents had severe strokes that made the last years of their lives horrible, and she is terrified of having a stroke. You know that a meta-analysis of 5 randomized trials of warfarin in nonvalvular atrial fibrillation demonstrated a 68% relative risk reduction (RRR) in stroke (1). You consider prescribing warfarin for this patient but know that she would not have qualified for the study because alcoholism increases her risk for major hemorrhage (2).
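A minimal sketch of how the pooled 68% RRR might be particularized to one patient, assuming the relative risk reduction stays constant across baseline risks. The baseline stroke risks below are hypothetical and chosen only for illustration; they do not come from the article, and the function name `particularized_nnt` is my own.

```python
def particularized_nnt(baseline_risk, rrr):
    """Number needed to treat for a patient with the given untreated risk.

    Assumes the relative risk reduction (rrr) is the same at every
    baseline risk, which is an assumption, not an established fact.
    """
    if not 0 < baseline_risk <= 1:
        raise ValueError("baseline_risk must be in (0, 1]")
    if not 0 < rrr <= 1:
        raise ValueError("rrr must be in (0, 1]")
    absolute_risk_reduction = baseline_risk * rrr
    return 1.0 / absolute_risk_reduction


RRR = 0.68  # relative risk reduction quoted in the excerpt above

# Hypothetical untreated stroke risks, for illustration only.
for risk in (0.02, 0.05, 0.10):
    nnt = particularized_nnt(risk, RRR)
    print(f"baseline risk {risk:.0%}: NNT is about {nnt:.1f}")
```

The same relative benefit yields very different absolute benefits, so the weight given to a harm such as major hemorrhage depends heavily on the patient's own baseline risk.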
Can treatment that is helpful on average be harmful to some patients? A study of the conflicting information needs of clinical inquiry and drug regulation. Horwitz RI, Singer BH, Makuch RW, Viscoli CM. J Clin Epidemiol 1996: 49(4); 395-400. [Medline] Randomized controlled trials are conducted with heterogeneous groups of patients, and the trial results represent an estimate of the average difference in the responses of the treatment groups. Clinicians, however, engage in a process of clinical inquiry, assembling data that will allow an assessment of the appropriate choice of treatment according to more narrowly defined clinical features. We describe a method of clinical inquiry within RCTs that can enhance the applicability of results to clinical decision making. Our methods included the use of data from the Beta-Blocker Heart Attack Trial, which enrolled 3837 subjects in 31 clinical centers. The 31 centers were divided into 21 dominant centers (mortality rates higher for placebo than propranolol) and 10 divergent centers (higher mortality rates for patients randomized to propranolol). Overall, compared to placebo, propranolol reduced the risk of dying for the "average" patient from 9.8 to 7.2%. Results for patients in dominant centers (RR = 0.50) were significantly different from those in divergent centers (RR = 1.33). We identified two cotherapies--aspirin use and coronary artery surgery--that subsequently affected the benefits of propranolol in divergent centers. For patients in divergent centers, propranolol reduced the risk of dying for patients treated with aspirin and/or coronary surgery (RR = 0.39), but not for patients not receiving these therapies (RR = 1.42). We conclude that differences in results across centers of a multicenter RCT may reflect important distinctions in the clinical conditions of enrolled subjects. 
These distinctions help to identify subgroups of patients in which treatment that has an average overall benefit may be harmful for some patients.
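The arithmetic behind the "average" patient result can be sketched as follows. The two risks (9.8% and 7.2%) come from the abstract; the function name and the returned keys are my own, and the number needed to treat is a quantity I derive here, not one the abstract reports.

```python
def effect_measures(control_risk, treated_risk):
    """Return relative and absolute effect measures for two risks."""
    if not (0 < control_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be proportions, with control_risk > 0")
    relative_risk = treated_risk / control_risk
    absolute_risk_reduction = control_risk - treated_risk
    # NNT is only meaningful when the treatment lowers the risk.
    nnt = 1.0 / absolute_risk_reduction if absolute_risk_reduction > 0 else None
    return {
        "RR": relative_risk,
        "RRR": 1.0 - relative_risk,
        "ARR": absolute_risk_reduction,
        "NNT": nnt,
    }


overall = effect_measures(control_risk=0.098, treated_risk=0.072)
for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

The overall relative risk of roughly 0.73 sits between the values reported for dominant centers (0.50) and divergent centers (1.33), which is the sense in which an average can hide patients who are harmed.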
Decision analysis and the implementation of research findings. Lilford RJ, Pauker SG, Braunholtz DA, Chard J. British Medical Journal 1998: 317(7155); 405-9. [Medline] [Full text]
Variation in patient utilities for outcomes of the management of chronic stable angina. Implications for clinical practice guidelines. Ischemic Heart Disease Patient Outcomes Research Team. Nease RF, Kneeland T, O'Connor GT, Sumner W, Lumpkins C, Shaw L, Pryor D, Sox HC. JAMA 1995: 273(15); 1185-90. [Medline] OBJECTIVE--Although practice guidelines sometimes make recommendations based on symptom severity, they rarely account for how patients feel about their symptoms. To investigate the possible importance of patient preferences in treatment of ischemic heart disease, we assessed attitudes toward symptoms in patients with angina pectoris. DESIGN--Case series. SETTING--Ambulatory cardiology clinics at two tertiary care medical centers. PATIENTS--A total of 220 subjects were selected from 589 patients with chronic stable angina referred from cardiologists to achieve patient samples balanced for sex, race, and angina severity. MAIN OUTCOME MEASURES--We measured patients' attitudes toward their angina using the rating scale, time trade-off, and standard gamble utility metrics. Reliability of measurements was evaluated by repeating the assessments 2 weeks later on 50 willing patients. RESULTS--While the mean responses followed the expected patterns (those with more severe Canadian Cardiovascular Society scores chose lower utilities), attitudes toward symptoms varied substantially among patients with similarly severe angina. For example, there was a 33% chance that a patient with class II angina had a time trade-off utility that was lower (ie, more bothered by symptoms) than a patient with more severe angina (class III/IV). This variation in utilities was not due to random error in the assessments. CONCLUSIONS--Angina patients with similar functional limitation vary considerably in their tolerance for their symptoms, as measured by utilities.
Our findings suggest that guidelines for the management of ischemic heart disease should be based on the preferences of the individual patient rather than on symptom severity alone.
Pronouncements about the need for "generalizability" of randomized controlled trial results are humbug. Sackett DL. Controlled Clinical Trials 2000: 21; 82S. Abstract not available.
Conflicting clinical trials and the uncertainty of treating mild hypertension. Toth PJ, Horwitz RI. Am J Med 1983: 75(3); 482-8. [Medline] Recommendations to treat patients with mild hypertension are based principally on six randomized clinical trials conducted in three countries between 1964 and 1979. To determine whether the methods and results of these randomized clinical trials justify the current therapeutic policy, a clinical epidemiologic analysis of the data was performed focusing on (1) clinical versus statistical significance, (2) clinical heterogeneity of patients' baseline state, (3) suitable management of the untreated control patients, and (4) choice of outcome events. This analysis suggested that the results of available studies are better suited to public health decisions (number of cardiovascular deaths prevented nationwide) than personal health decisions (whether treatment does more good than harm for individual patients), and that current evidence does not justify a uniform policy of treating all asymptomatic patients with mild hypertension.
The visual analog scale for pain: clinical significance in postoperative patients. Bodian CA, Freedman G, Hossain S, Eisenkraft JB, Beilin Y. Anesthesiology 2001: 95(6); 1356-61. [Medline] BACKGROUND: The visual analog scale is widely used in research studies, but its connection with clinical experience outside the research setting and the best way to administer the VAS forms are not well established. This study defines changes in dosing of intravenous patient-controlled analgesia as a clinically relevant outcome and compares it with VAS measures of postoperative pain. METHODS: Visual analog scale measurements were obtained from 150 patients on the morning after intraabdominal surgery. On the same afternoon, 50 of the patients provided a VAS score on the same form used in the morning, 50 on a new form, and 50 were not asked for a second VAS measurement. RESULTS: Visual analog scale values and changes in value were similar for patients who were given a new VAS form in the afternoon and those who used the form that showed the morning value. The proportions of patients requesting additional analgesia were 4, 43, and 80%, corresponding to afternoon VAS scores of 30 or less, 31-70, and greater than 70, respectively. Change from morning VAS score had no apparent influence on patient-controlled analgesic dosing for patients with afternoon values of 30 or less or greater than 70, but changes in VAS scores of at least 10 did discriminate among patients whose afternoon values were between 31 and 70. CONCLUSIONS: When pain is an outcome measure in research studies, grouping final VAS scores into a small number of categories provides greater clinical relevance for comparisons than using the full spectrum of measured values or changes in value. Seeing an earlier VAS form has no apparent influence on later values.
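The grouping recommended in the conclusion might look like the sketch below. The three bands and the reported request rates come from the abstract; the function name, the handling of non-integer scores, and the example scores are hypothetical.

```python
from collections import Counter

# Proportion of patients requesting additional analgesia in each band,
# as reported in the abstract above.
REPORTED_REQUEST_RATE = {
    "30 or less": 0.04,
    "31-70": 0.43,
    "greater than 70": 0.80,
}


def vas_band(score_mm):
    """Assign a 0-100 mm VAS score to one of the three bands.

    Scores between 30 and 31 are placed in the middle band; the abstract
    does not say how such values were handled, so this is an assumption.
    """
    if not 0 <= score_mm <= 100:
        raise ValueError("VAS score must be between 0 and 100 mm")
    if score_mm <= 30:
        return "30 or less"
    if score_mm <= 70:
        return "31-70"
    return "greater than 70"


# Hypothetical afternoon VAS scores, for illustration only.
afternoon_scores = [12, 30, 31, 45, 70, 71, 88]
counts = Counter(vas_band(score) for score in afternoon_scores)
for band, rate in REPORTED_REQUEST_RATE.items():
    print(f"{band}: n={counts[band]}, reported request rate {rate:.0%}")
```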
Determining the minimum clinically significant difference in visual analog pain score for children. Powell CV, Kelly AM, Williams A. Ann Emerg Med 2001: 37(1); 28-31. [Medline] STUDY OBJECTIVE: We sought to determine the minimum clinically significant difference in visual analog scale (VAS) pain score for children. METHODS: We performed a prospective, single-group, repeated-measures study of children between 8 and 15 years presenting to an urban pediatric emergency department with acute pain. On presentation to the ED, patients marked the level of their pain on a 100-mm nonhatched VAS scale. At 20-minute intervals thereafter, they were asked to give a verbal categoric rating of their pain as "heaps better," "a bit better," "much the same," "a bit worse," or "heaps worse" and to mark the level of pain on a VAS scale of the same type as used previously. A maximum of 3 comparisons was recorded for each child. The minimum clinically significant difference in VAS pain score was defined as the mean difference between current and preceding scores when the subject reported "a bit worse" or "a bit better" pain. RESULTS: Seventy-three children were enrolled in the study, yielding 103 evaluable comparisons in which pain was rated as "a bit better" or "a bit worse." The minimum clinically significant difference in VAS score was 10 mm (95% confidence interval 7 to 12 mm). CONCLUSION: This study found the minimum clinically significant difference in VAS pain score for children aged 8 to 15 years (on a 100-mm VAS scale) to be 10 mm (95% confidence interval 7 to 12 mm). In studies of populations, differences of less than this amount, even if statistically significant, are unlikely to be of clinical significance.
Clinically significant changes in pain along the visual analog scale. Bird SB, Dickson EW. Ann Emerg Med 2001: 38(6); 639-43. [Medline] STUDY OBJECTIVE: We sought to test the hypothesis that the change in visual analog scale (VAS) associated with a clinically significant change in pain is related to the initial VAS score. METHODS: A convenience sample of adults with isolated extremity trauma was enrolled. A VAS score was obtained on entry into the study. Descriptions of change in pain ("lot less," "little less," "about the same," "little more," or "lot more") and VAS scores were then obtained every 30 minutes until the patient was free of pain or discharged or a total of 2 hours had passed. Patients were divided into 3 cohorts on the basis of the initial VAS score: VAS score of less than 34, VAS score of 34 to 66, and VAS score of 67 or greater. The absolute values of VAS changes associated with pain descriptions of a "little less" or "little more" (defined as clinically significant), "about the same" (defined as clinically insignificant), and "lot less" or "lot more" were calculated. RESULTS: The change in VAS associated with clinically significant changes in pain in the cohort with VAS scores of less than 34 was 13±14 (mean±SD), which was significantly lower than that of the cohort with VAS scores of 67 or greater (28±21). There was no statistically significant difference in clinically significant changes in pain between the middle cohort and either the upper or lower cohorts (P = .07 and P = .29, respectively). There was no significant change in VAS for clinically insignificant changes in pain among the 3 cohorts (3±4, 6±6, and 8±16, respectively). CONCLUSION: Patients with greater pain require a greater change in VAS score to achieve clinically significant pain relief.
Does the clinically significant difference in visual analog scale pain scores vary with gender, age, or cause of pain? Kelly AM. Acad Emerg Med 1998: 5(11); 1086-90. [Medline] OBJECTIVES: To determine the minimum clinically significant difference in visual analog scale (VAS) pain scores for acute pain in the ED setting and to determine whether this difference varies with gender, age, or cause of pain. METHODS: A prospective, descriptive study of 152 adult patients presenting to the ED with acute pain. At presentation and at 20-minute intervals to a maximum of three measurements, patients marked the level of their pain on a 100-mm, nonhatched VAS. At each follow-up they also gave a verbal rating of their pain as "a lot better," "a little better," "much the same," "a little worse," or "much worse." The minimum clinically significant difference in VAS pain scores was defined as the mean difference between current and preceding scores when pain was reported as a little worse or a little better. Data were compared based on gender, age more than or less than 50 years, and traumatic vs nontraumatic causes of pain. RESULTS: The minimum clinically significant difference in VAS pain scores is 9 mm (95% CI, 6 to 13 mm). There is no statistically significant difference between the minimum clinically significant differences in VAS pain scores based on gender (p=0.172), age (p=0.782), or cause of pain (p=0.84). CONCLUSIONS: The minimum clinically significant difference in VAS pain scores was found to be 9 mm. Differences of less than this amount, even if statistically significant, are unlikely to be of clinical significance. No significant difference in minimum significant VAS scores was found between gender, age, and cause-of-pain groups.
A proposal to use confidence intervals for visual analog scale data for pain measurement to determine clinical significance. Mantha S, Thisted R, Foss J, Ellis JE, Roizen MF. Anesth Analg 1993: 77(5); 1041-7. [Medline] Visual analog scales (VAS) ranging from 0 cm (no pain) to 10 cm (worst imaginable pain) are used widely for pain measurement, but various investigators have not treated these data consistently. Conventional statistical tests of such data, although evaluating the "statistical significance" may obscure the clinical value of a treatment. On the other hand, confidence intervals (CIs) can illuminate both statistical and clinical importance. CIs give a range of values based on the observed data which contain, with a specified probability, a true but unknown variable typifying a population. We reviewed 112 articles published recently in anesthesia journals for statistical reporting of VAS data. Of the 112 articles, only two used CIs to report mean pain scores and one used CIs to report differences in median pain scores between the study groups. Only two articles presented 95% CI for the mean pain scores graphically. Analgesic techniques that produce VAS values in the range of 0-3 have been reported to represent adequate analgesia. A graphical method using CIs is proposed that allows ready interpretation of VAS data. With this approach, one evaluates whether the 95% CI for the mean pain score in a group during a particular period lies entirely within the zone defined as "analgesic success" (0-3). Such an analysis allows a visual assessment of whether a particular technique would produce clinically important effects in the population at large. This approach seems to provide more information than the use of conventional hypothesis testing in the interpretation of VAS data for pain measurement.
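The check proposed in this abstract (does the 95% CI for a group's mean pain score lie entirely inside the 0-3 "analgesic success" zone?) can be sketched as below. This is not the authors' graphical method, only the comparison underneath it. The interval uses a normal approximation, which is my simplification; a t-based interval would be somewhat wider in small samples. Both sets of scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(scores, level=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(n)
    return centre - half_width, centre + half_width


def within_success_zone(ci, low=0.0, high=3.0):
    """True when the whole interval lies inside the 0-3 cm zone."""
    lower, upper = ci
    return low <= lower and upper <= high


# Hypothetical VAS scores in cm, for illustration only.
group_a = [1.0, 2.0, 1.5, 2.5, 1.0, 2.0, 1.5, 2.0, 2.5, 1.0]
group_b = [2.5, 4.0, 3.0, 5.5, 2.0, 3.5, 4.5, 3.0, 2.5, 4.0]

for name, scores in (("A", group_a), ("B", group_b)):
    lower, upper = mean_ci(scores)
    verdict = within_success_zone((lower, upper))
    print(f"group {name}: 95% CI {lower:.2f} to {upper:.2f}, success zone: {verdict}")
```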
A randomized controlled trial of fentanyl for abortion pain. Rawling MJ, Wiebe ER. Am J Obstet Gynecol 2001: 185(1); 103-7. [Medline] OBJECTIVE: Our aim was to find out whether intravenous fentanyl was effective in reducing the pain of first-trimester abortion. STUDY DESIGN: This randomized controlled trial included 825 women attending a nonhospital abortion facility. Some women chose standard care. Women who did not choose standard care were randomly assigned to receive either 50 to 100 microg of fentanyl, a placebo, or no intervention. With SAS software and a mixed effects analysis of variance model with covariates, we compared mean pain scores of the fentanyl and placebo groups to detect a difference of at least 1 point on an 11-point pain scale. RESULTS: The mean pain score of the fentanyl group was 1.0 point less than that of the placebo group (95% confidence interval, 3.7-4.3) and 0.9 point less than that of the observational group (95% confidence interval, 4.7-5.1). This pain reduction was statistically significant, but the women who were studied wanted a 2-point reduction from fentanyl. CONCLUSION: Fentanyl, when compared with the placebo, reduced abortion pain by 1.0 point on an 11-point scale. This reduction was of questionable clinical significance and was less than desired by the women included in the study.
Clinical utility and clinical significance in the assessment and management of pain in vulnerable infants. Stevens B, Gibbins S. Clin Perinatol 2002: 29(3); 459-68. [Medline] Pain in vulnerable populations unable to provide verbal report is challenging in terms of measurement and treatment. Clinicians strive to provide the best possible pain management for infants in the NICU, yet they are often hindered due to paucity of measures that are not only reliable and valid but also clinically useful. Clinical utility of measures is difficult to establish due to a lack of consistent definition of the construct, varied methods of determination, and the secondary importance afforded to this issue in relation to the establishment of reliability and utility. Without clinically useful pain measures, however, clinicians are unable and unlikely to assess the infant's pain or the effectiveness of pain-relieving interventions. Furthermore, even when the clinician is able to assess pain using a valid measure with a minimum of time, cost, and instruction, the clinical significance of any reduction in pain scores needs to be interpreted in terms of the infant and his/her care provider. The issue of defining the extent of change in pain scores that is clinically significant or important remains unclear. Clarity will involve assigning meaning to particular changes in pain scores for vulnerable infants across a broad array of situations and severities of pain. Although research on this topic in children and adults provides some guidance to this dilemma, only through innovative and creative methods will we be able to address these issues.
Minimum clinically significant VAS differences for simultaneous (paired) interval serial pain assessments. Yamamoto LG, Nomura JT, Sato RL, Ahern RM, Snow JL, Kuwaye TT. Am J Emerg Med 2003: 21(3); 176-9. [Medline] [Abstract] We conducted two studies to determine whether the minimum clinically significant difference in the visual analog scale (VAS) for nearly simultaneous and brief-interval serial assessments of pain is less than that for pain assessment at 20- to 30-minute intervals, using a 10-cm VAS. The first study was a blinded, randomized, placebo-controlled paired trial comparing the pain of intravenous cannulation in both hands (20-minute application of a eutectic mixture of local anesthetics v placebo) of study subjects. The second study was a non-blinded, randomized, paired trial of different treatments for jellyfish stings. In the first study, 37 of 40 subjects indicated that one hand experienced more pain than the other. Eleven of these 37 subjects (30%) indicated differences in VAS values of 1.0 cm or less, with a minimum value of 0.5 cm. In the second study, for all the VAS-based pain comparisons, VAS differences of ≤0.5 cm (other than zero) occurred 183 times, and in 171 of these instances (93%) subjects were able to recognize that there was a difference. On the basis of these findings, the minimum clinically significant VAS difference for paired comparisons that are simultaneous or occur within 5 minutes of each other is about 0.5 cm or less. This value is less than the 1.3-cm value determined for serial 20- to 30-minute pain comparisons. It is likely that other types of pain comparisons may have different minimum clinically significant VAS differences.
Is it clinically significant? Erill S. Lancet 2002: 359(9318); 1708. [Full text] [PDF] [Excerpt] We worship statistics. No wonder. After all, the aim of clinical research seems increasingly to be centred on the detection of small differences and what seems to count is the statistical significance of what is found. A paper well seasoned with probability values lower than the customary 5% might seem respectable whatever the size of the difference detected or even the relevance of the variable under study. So much for statistical significance. It is well served, but what we need are tests of clinical significance and, believe me, they exist even if they are not usually found in textbooks.
Efficacy, safety, and cost of new anticancer drugs. Garattini S, Bertele V. British Medical Journal 2002: 325(7358); 269-71. [Medline] [Full text] [PDF]
Clinical versus statistical considerations in the design and analysis of clinical research. Horwitz RI, Singer BH, Makuch RW, Viscoli CM. J Clin Epidemiol 1998: 51(4); 305-7. [Medline]
Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). Sackett DL. CMAJ 2001: 165(9); 1226-37. [Medline] [Full text] [PDF]
Negative results of randomized clinical trials published in the surgical literature: equivalency or error? Dimick JB, Diener-West M, Lipsett PA. Arch Surg 2001: 136(7); 796-800. [Medline] HYPOTHESIS: We hypothesized that review of randomized controlled clinical trials (RCTs) with nonstatistically significant or "negative" results published in the surgical literature do not have appropriate statistical power to demonstrate equivalency between treatment arms. DATA SOURCES AND STUDY SELECTION: The MEDLINE database was searched to obtain reports of all RCTs with negative results published in 3 surgical journals from 1988 to 1998. Manual review of one year (1997) of publications for each journal was performed to validate our search strategy. Equivalency was evaluated using the Two One-Sided Tests Procedure and post hoc power calculations. DATA SYNTHESIS: Ninety reports of RCTs with negative results were identified in the surgical literature between 1988 and 1998. The manual review of 1997 showed a 100% retrieval rate for our search strategy. After applying the Two One-Sided Tests Procedure, 35 reports (39%) met the criteria for demonstrating equivalency. The other 55 reports (61%) contained at least a 10% absolute difference in the 90% confidence interval of Delta. Using the power calculation method, only 22 (24%) articles had a power greater than .80 to detect a 50% difference in therapeutic effect. Only 29% of the reports included a formal sample size calculation and these studies were more likely to demonstrate equivalency than those without a sample size estimate (P<.01). CONCLUSIONS: Many reports from negative RCTs published in the surgical literature lack sufficient statistical power to establish that clinically important differences are not present. Surgeons should perform appropriate sample size calculations when designing RCTs and recognize the utility of confidence intervals when reporting negative results.
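The Two One-Sided Tests Procedure named in the abstract above can be illustrated with a short sketch. This is only a minimal illustration, assuming a normal (Wald) approximation for the difference between two proportions and a 10% absolute equivalence margin; the function name and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

def tost_two_proportions(x1, n1, x2, n2, margin=0.10, alpha=0.05):
    """Two one-sided tests for equivalence of two proportions.

    Assumption: Wald (normal approximation) standard error. Equivalence is
    declared when the 100*(1 - 2*alpha)% confidence interval for the
    difference lies entirely inside (-margin, +margin).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value
    lower, upper = diff - z * se, diff + z * se  # 90% CI when alpha = 0.05
    equivalent = (-margin < lower) and (upper < margin)
    return diff, (lower, upper), equivalent

# Hypothetical counts: a small and a large "negative" trial
small = tost_two_proportions(20, 50, 18, 50)
large = tost_two_proportions(400, 1000, 380, 1000)
print(small, large)
```

With these made-up counts the small trial cannot demonstrate equivalence because its 90% interval extends beyond the margin, while the large trial can. That is the distinction the abstract draws between a nonsignificant result and evidence of equivalence.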
Survey of claims of no effect in abstracts of Cochrane reviews. Alderson P, Chalmers I. BMJ 2003: 326(7387); 475. [Medline] [Full text] [PDF]
Absence of evidence is not evidence of absence. Altman DG, Bland JM. British Medical Journal 1995: 311(7003); 485. [Medline] [Full text] PIP: Randomized controlled clinical trials are conducted to determine whether differences of clinical importance exist between selected treatment regimens. When statistical analysis of the study data finds a P value greater than 5%, it is convention to deem the assessed difference nonsignificant. Just because convention dictates that such study findings be termed nonsignificant, or negative, however, it does not necessarily follow that the study found nothing of clinical importance. Subject samples used in controlled trials tend to be too small. The studies therefore lack the necessary power to detect real, and clinically worthwhile, differences in treatment. Freiman et al. found that only 30% of a sample of 71 trials published in the New England Journal of Medicine in 1978-79 with a P value greater than 10% were large enough to have a 90% chance of detecting even a 50% difference in the effectiveness of the treatments being compared, and they found no improvement in a similar sample of trials published in 1988. It is therefore wrong and unwise to interpret so many negative trials as providing evidence of the ineffectiveness of new treatments. One must instead seriously question whether the absence of evidence is a valid justification for inaction. Efforts must be made to look for quantification of an association rather than just a P value, especially when the risks under investigation are small. The authors cite a recent trial comparing octreotide and sclerotherapy in patients with variceal bleeding, as well as the overview of clinical trials evaluating fibrinolytic treatment for preventing reinfarction after acute myocardial infarction as examples.
Underpowered clinical trials of antiretroviral treatment. Arribas JR, Pulido F. JAMA 2002: 288(17); 2120; author reply 2120-1. [Medline]
The prevalence of negative studies with inadequate statistical power: an analysis of the plastic surgery literature. Chung KC, Kalliainen LK, Spilson SV, Walters MR, Kim HM. Plast Reconstr Surg 2002: 109(1); 1-6; discussion 7-8. [Medline] Studies published in the medical literature often neglect to consider the statistical power needed to detect a meaningful difference between study groups. Small sample sizes tend to produce negative results because of low statistical power. Studies that cannot make conclusive statements about their hypotheses can waste resources, deter further research, and impede advances in clinical treatment. The current study reviewed three of the most frequently read plastic surgery journals from 1976 to 1996 to determine the prevalence of inadequately (<80 percent) powered clinical trials and experimental studies that found no difference (negative studies) in the response variable of interest between comparison groups. The statistical power of 54 negative studies using continuous response variables was calculated to detect a difference of 1 SD (+/-1 SD) in means between the comparative groups. The power of another 57 negative studies with dichotomous response (yes/no) variables was calculated to detect a relative change in proportions of 25 percent and 50 percent from the experimental to the control group. It was found that 85 percent of the studies with continuous response variables had inadequate power to detect the desired mean difference of +/-1 SD. In studies with dichotomous response variables, 98 percent had inadequate power to detect a desired 25 percent relative change in proportions, and 74 percent had inadequate power to detect a desired 50 percent relative change in proportions. These results indicate that many of the studies in the plastic surgery literature lack adequate power to detect a moderate-to-large difference between groups. The lack of power makes the interpretation of the studies with negative findings inconclusive. Proper study design dictates that investigators consider a priori the difference between groups that is of clinical interest, and the sample size per group that is needed to provide adequate statistical power to detect the desired difference.
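To make the "power to detect a 1 SD difference" criterion concrete, here is a rough sketch of the calculation for two equal-sized groups. It assumes a two-sided test at the 5% level and uses a normal approximation rather than the t distribution, so it slightly understates the sample size an exact calculation would give; the function names are hypothetical and the method is not necessarily the one the authors used.

```python
from math import sqrt, ceil
from statistics import NormalDist

_norm = NormalDist()

def power_two_means(n_per_group, effect_sd=1.0, alpha=0.05):
    """Approximate power to detect a mean difference of `effect_sd`
    standard deviations with `n_per_group` subjects in each arm."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    # The negligible contribution of the opposite tail is ignored.
    return _norm.cdf(effect_sd * sqrt(n_per_group / 2) - z_crit)

def n_per_group_two_means(effect_sd=1.0, power=0.80, alpha=0.05):
    """Approximate per-group sample size for the requested power."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_pow = _norm.inv_cdf(power)
    return ceil(2 * ((z_crit + z_pow) / effect_sd) ** 2)

print(n_per_group_two_means(1.0))   # per-group n for 80% power at 1 SD
print(power_two_means(10, 1.0))     # power of a study with 10 per group
```

Under this approximation about 16 subjects per group are needed for 80% power at a 1 SD difference, and a study with 10 per group falls short of that threshold.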
Putting trials on trial--the costs and consequences of small trials in depression: a systematic review of methodology. Hotopf M, Lewis G, Normand C. J Epidemiol Community Health 1997: 51(4); 354-8. STUDY OBJECTIVE: To determine why, despite 122 randomised controlled trials, there is no consensus about whether the selective serotonin reuptake inhibitors or tricyclic and related antidepressants should be used as first line treatment of depression. DESIGN: Systematic review of all RCTs comparing selective serotonin reuptake inhibitors and tricyclic or heterocyclic antidepressants. MAIN RESULTS: The shortcomings identified in the 122 trials were as follows: (1) there was inadequate description of randomisation, (2) the outcomes used were mainly observer rated measurements of depression, and studies failed to use quality of life measures or perform economic evaluations, (3) doses of tricyclic antidepressants were inadequate, (4) generalisability of studies was poor (including a reliance on secondary care settings and inadequate follow up), and (5) there were statistical shortcomings such as low statistical power, failure to use intention to treat analyses, and the tendency to make multiple comparisons. CONCLUSIONS: Future RCTs should be designed to inform policy makers and address these methodological shortcomings.
"Evidence of absence" can be important. Joffe M. BMJ 2003: 326(7401); 1267. [Medline] [Full text]
Epidemiological appraisal of studies of residential exposure to power frequency magnetic fields and adult cancers. Li CY, Theriault G, Lin RS. Occup Environ Med 1996: 53(8); 505-10. [Medline] OBJECTIVES: To appraise epidemiological evidence of the purported association between residential exposure to power frequency magnetic fields and adult cancers. METHODS: Literature review and epidemiological evaluation. RESULTS: Seven epidemiological studies have been conducted on the risk of cancer among adults in relation to residential exposure to power frequency magnetic fields. Leukaemia was positively associated with magnetic fields in three case-control studies. The other two case-control studies and two cohort studies did not show such a link. Brain tumours and breast cancer have rarely been examined by these studies. Based on the epidemiological results, the analysis of the role of chance and bias, and the criteria for causal inferences, it seems that the evidence is not strong enough to support the putative causal relation between residential exposure to magnetic fields and adult leukaemia, brain tumours, or breast cancer. Inadequate statistical power is far more a concern than selection bias, information bias, and confounding in interpreting the results from these studies, and in explaining inconsistencies between studies. CONCLUSIONS: Our reviews suggested that the only way to answer whether residential exposure to magnetic fields is capable of increasing the risks of adult cancers is to conduct more studies carefully avoiding methodological flaws, in particular small sample size. We also suggested that the risk of female breast cancer should be the object of additional investigations, and that future studies should attempt to include information on exposure to magnetic fields from workplaces as well as residential exposure to estimate the effects of overall exposure to magnetic fields.
Thirst, interdialytic weight gain, and thirst-interventions in hemodialysis patients: a literature review. Mistiaen P. Nephrol Nurs J 2001: 28(6); 601-4, 610-3; quiz 614-5. [Medline] A literature search completed over the period of 1980-1999 identified studies on the prevalence of thirst in hemodialysis (HD) patients and the relationship between thirst and interdialytic weight gain, as well as intervention studies in which thirst was used as an outcome variable. Twenty-three studies fulfilled the selection criteria and were included in the analysis. The prevalence of thirst varied between 6% and 95% across studies. In most studies more thirst was related to more weight gain. However, the studies were difficult to compare due to methodological differences. Three types of interventions were found: technical interventions in the dialysis mechanisms (increasing the frequency of dialysis sessions and varying the concentration of sodium in the dialysate), pharmaceutical interventions (ACE-inhibitors), and a dietetic intervention. Almost no conclusions could be drawn with regard to the effectiveness of these interventions due to methodological differences and weaknesses and due to the small sample sizes.
MR findings in humeral epicondylitis. A systematic review. Pasternack I, Tuovinen EM, Lohman M, Vehmas T, Malmivaara A. Acta Radiol 2001: 42(5); 434-40. [Medline] PURPOSE: To highlight the importance of meta-analysis in diagnostic imaging by presenting a systematic search of the literature on the accuracy of MR imaging in epicondylitis. MATERIAL AND METHODS: The literature was comprehensively reviewed to identify studies on MR findings in epicondylitis. Reviewers blind to the clinical diagnoses screened the data according to predetermined inclusion criteria. Data were collected and validity and relevance were assessed on structured forms. RESULTS: Seven studies including 148 patients with epicondylitis were accepted for the analysis. Eleven asymptomatic contralateral elbows and 29 elbows of healthy volunteers served as controls. The volunteers were distinctly younger than the patients. The MR technique was divergent, and the observed pathological changes also varied. The most frequent alteration was a change in the common extensor tendon signal (90%, 95% confidence interval 84-94%); 14% of the healthy volunteers and 50% of the contralateral elbows displayed the similar alteration. CONCLUSION: Small sample size and methodological shortcomings in the original studies make the assessment of MR findings in epicondylitis questionable. There is a need for well-designed studies in which clinical features and occupational backgrounds as well as imaging parameters are carefully documented.
The ethics of tiny trials. Phillips B. Arch Dis Child 2002: 87(3); 258. [Medline] Abstract not available yet.
Distinguishing between "no evidence of effect" and "evidence of no effect" in randomised controlled trials and other comparisons. Tarnow-Mordi WO, Healy M. Arch Dis Child 1999: 80(3); 210-213. [Full text] [PDF]
Cost effectiveness calculations and sample size. Torgerson DJ, Campbell MK. BMJ 2000: 321; 697. [Full text] [PDF]
Elevated blood lead levels in children of construction workers. Whelan E, Piacitelli G, Gerwel B, Schnorr T, Mueller C, Gittleman J, Matte T. American Journal of Public Health 1997: 87(8); 1352-55. ABSTRACT: OBJECTIVES: This study examined whether children of lead-exposed construction workers had higher blood lead levels than neighborhood control children. METHODS: Twenty-nine construction workers were identified from the New Jersey Adult Blood Lead Epidemiology and Surveillance (ABLES) registry. Eighteen control families were referred by workers. Venous blood samples were collected from 50 children (31 exposed, 19 control subjects) under age 6. RESULTS: Twenty-six percent of workers' children had blood lead levels at or over the Centers for Disease Control and Prevention action level of 0.48 µmol/L (10 micrograms/dL), compared with 5% of control children (unadjusted odds ratio = 6.1; 95% confidence interval = 0.9, 147.2). CONCLUSIONS: Children of construction workers may be at risk for excessive lead exposure. Health care providers should assess parental occupation as a possible pathway for lead exposure of young children.
What is the chance that this study is clinically significant? A proposal for Q values. Froehlich GW. Eff Clin Pract 1999: 2(5); 234-9. [Medline] [Full text] CONTEXT: Clinicians who use the medical literature to guide their practice need to make judgments about the clinical significance of medical interventions. GENERAL QUESTION: How likely is an intervention to be clinically worthwhile? SPECIFIC RESEARCH CHALLENGE: Given the results of a study, determining the probability that the true effect of an intervention is at least as great as some minimum worthwhile effect. CURRENT APPROACH: P values are widely used to convey the probability of observed effects arising by chance if there truly is no effect. By convention, P values less than 0.05 are interpreted as being "statistically significant." POTENTIAL DIFFICULTIES: Statistical significance is often confused with clinical significance. ALTERNATE APPROACH: A different probability could be reported, a probability I call a Q value. A Q value is the probability that the true effect of an intervention is at least as great as some minimum worthwhile effect. Q values are calculated in a manner analogous to that used for P values, except that the null hypothesis becomes a minimum worthwhile effect instead of no effect. Q values encourage researchers and clinicians to be explicit about what they think a worthwhile effect is and could help shift the focus of study interpretation away from arbitrary statistical conventions.
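One plausible reading of the Q value described above can be sketched in a few lines. This sketch assumes the effect estimate is approximately normally distributed with a known standard error and treats the Q value as the one-sided tail area measured from the minimum worthwhile effect instead of from zero. The paper's exact computation may differ, and the function name and numbers are hypothetical.

```python
from statistics import NormalDist

def q_value(estimate, se, min_worthwhile):
    """Approximate probability that the true effect is at least
    `min_worthwhile`, given an estimate and its standard error.

    Assumption: normal approximation, with larger effects being better.
    """
    z = (estimate - min_worthwhile) / se
    return NormalDist().cdf(z)

# Hypothetical study: estimated benefit of 5 units with standard error 2
for threshold in (0, 3, 5):
    print(threshold, round(q_value(5.0, 2.0, threshold), 3))
```

The same result looks convincing against a threshold of zero, less so against a worthwhile effect of 3 units, and no better than a coin toss against a worthwhile effect equal to the estimate itself, which is why the proposal forces the minimum worthwhile effect to be stated explicitly.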
Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery. Lokken P, Straumsheim PA, Tveiten D, Skjelbred P, Borchgrevink CF. British Medical Journal 1995: 310(6992); 1439-42. [Medline] [Abstract] [Full text] OBJECTIVE--To examine whether homoeopathy has any effect on pain and other inflammatory events after surgery. DESIGN--Randomised double blind, placebo controlled crossover trial with "identical" oral surgical procedures performed on two separate occasions in 24 patients. INTERVENTIONS--Treatment started 3 hours after surgery with either homoeopathy or placebo. MAIN OUTCOME MEASURES--Postoperative pain and preference for postoperative course assessed by patients on visual analogue scales. Measurements of postoperative swelling and reduction in ability to open mouth. Assessment of bleeding after surgery. RESULTS--Pain after surgery was essentially the same whether treated with homoeopathy or placebo. Postoperative swelling was not significantly affected by homoeopathy, but treatment tended to give less reduction in ability to open mouth. No noticeable difference was seen in postoperative bleeding, side effects, or complaints. Thirteen of the 24 patients preferred the postoperative course with placebo. CONCLUSIONS--No positive evidence was found for efficacy of homoeopathic treatment on pain and other inflammatory events after an acute soft tissue and bone injury inflicted by a surgical intervention. Differences in the order of 30% to 40% would have been needed to show significant effects.
Interventions for promoting smoking cessation during pregnancy. Lumley J, Oliver S, Waters E. Cochrane Database Syst Rev 2000: (2); CD001055. [Abstract] BACKGROUND: Smoking remains one of the few potentially preventable factors associated with low birthweight, very preterm birth and perinatal death. OBJECTIVES: The objective of this review was to assess the effects of smoking cessation programs implemented during pregnancy on the health of the fetus and infant, on the mother and on the family. SEARCH STRATEGY: We searched the Cochrane Pregnancy and Childbirth Group trials register and the Cochrane Tobacco Addiction Group trials register. SELECTION CRITERIA: Randomised and quasi-randomised trials of smoking cessation programs implemented during pregnancy. DATA COLLECTION AND ANALYSIS: Trial quality was assessed and data were extracted independently by two reviewers. MAIN RESULTS: Forty-four trials were identified: 37 trials including 16,916 women provided data on smoking cessation and/or perinatal outcomes, as did one cluster-randomised trial including 3000 women. Over 800 women were included in trials of smoking relapse prevention. There was substantial variation in the intensity of the intervention and the extent of reminders and reinforcement through pregnancy. Based on 34 trials there was a significant reduction in smoking in the intervention groups (odds ratio 0.53, 95% confidence interval 0.47 to 0.60), an absolute difference of 6.4% women continuing to smoke. The eight trials with validated smoking cessation, a high intensity intervention and a high quality score had an odds ratio of 0.53, 95% confidence interval 0.44 to 0.63 and an absolute difference in continued smoking of 8.1%. The subset of trials with information on fetal outcome revealed a reduction in low birthweight (odds ratio 0.80, 95% confidence interval 0.67 to 0.95), a reduction in preterm birth (odds ratio 0.83, 95% confidence interval 0.69 to 0.99) and an increase in mean birthweight of 28g (95% confidence interval 9 to 49). There were no differences in very low birthweight or perinatal mortality. Five trials of smoking relapse prevention showed no significant difference. The single large cluster-randomised trial showed no evidence of a decrease in continued smoking or adjusted mean birthweight. REVIEWER'S CONCLUSIONS: Smoking cessation programs in pregnancy appear to reduce smoking, low birthweight and preterm birth, but no effect was detected for very low birthweight or perinatal mortality.
The association of nonsteroidal anti-inflammatory drugs with upper gastrointestinal tract bleeding. Carson JL, Strom BL, Soper KA, West SL, Morse ML. Arch Intern Med 1987: 147(1); 85-8. [Medline] To evaluate the risk of developing upper gastrointestinal (UGI) bleeding from nonsteroidal anti-inflammatory drugs (NSAIDs), a retrospective (historical) cohort study was performed, using a computerized data base including 1980 billing data from all Medicaid patients in the states of Michigan and Minnesota. Comparing 47,136 exposed patients to 44,634 unexposed patients, the unadjusted relative risk for developing UGI bleeding 30 days after exposure to a NSAID was 1.5 (95% confidence interval 1.2 to 2.0). Univariate analyses demonstrated associations between UGI bleeding and age, sex, state, alcohol-related diagnoses, preexisting abdominal conditions, and use of anticoagulants. This association between NSAIDs and UGI bleeding was unchanged after adjusting for these potential confounding variables using logistic regression. A linear dose-response relationship and a quadratic duration-response relationship were demonstrated. Non-steroidal anti-inflammatory drugs are associated with UGI bleeding, although the magnitude of the increased risk is reassuringly small.
Grapefruits and drugs: when is statistically significant clinically significant? Abernethy DR. J Clin Invest 1997: 99(10); 2297-8. [Medline] [Full text] [PDF]
Size and quality of randomised controlled trials in head injury: review of published studies. Dickinson K, Bunn F, Wentz R, Edwards P, Roberts I. British Medical Journal 2000: 320; 1308-1311. [Medline] [Abstract] [Full text] [PDF] Objective: To assess whether trials in head injury are large enough to avoid moderate random errors and designed to avoid moderate biases. Design: All randomised controlled trials on the treatment and rehabilitation of patients with head injury published before December 1998 were surveyed. Trials were identified from electronic databases, by hand searching journals and conference proceedings, and by contacting researchers. Data were extracted on the number of participants, quality of concealment of allocation, use of blinding, loss to follow up, and types of participants, interventions, and outcome measures. Results: 279 reports were identified, containing information on 208 separate trials. The average number of participants per trial was 82, with no evidence of increasing size over time. The total number of randomised participants in the 203 trials in which size was reported was 16 613. No trials were large enough to detect reliably a 5% absolute reduction in the risk of death or disability, and only 4% were large enough to detect an absolute reduction of 10%. Concealment of allocation was adequate in 22 and inadequate or unclear in 25 of the 47 (23%) in which it was reported. Of 126 trials assessing disability, 111 reported the number of patients followed up, and average loss to follow up was 19%. Of trials measuring disability, 26 (21%) reported that outcome assessors were blinded. Conclusions: Randomised trials in head injury are too small and poorly designed to detect or refute reliably moderate but clinically important benefits or hazards of treatment. Limited funding for injury research and unfamiliarity with issues of consent may have been important obstacles.
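A back-of-the-envelope calculation shows why trials averaging 82 participants cannot detect absolute risk reductions of 5% or 10%. The sketch below uses the standard normal-approximation formula for comparing two proportions. The abstract does not give the baseline risk or the power the authors required, so the 50% baseline risk and 80% power used here are assumptions chosen only for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_proportions(p_control, p_treated, power=0.80, alpha=0.05):
    """Approximate per-group sample size to detect p_control vs p_treated
    with a two-sided test (normal approximation, unpooled variance)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    z_pow = norm.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_crit + z_pow) ** 2 * variance / (p_control - p_treated) ** 2)

# Assumed baseline risk of death or disability: 50% (not from the abstract)
print(n_per_group_two_proportions(0.50, 0.45))  # 5% absolute reduction
print(n_per_group_two_proportions(0.50, 0.40))  # 10% absolute reduction
```

Under these assumptions the required numbers per group run into the hundreds for a 10% absolute reduction and well over a thousand for a 5% reduction, far beyond the average trial size reported in the review.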
Quality of randomised controlled trials in head injury. Trials in head injury are more complex than review suggests. Murray GD, Teasdale GM. British Medical Journal 2000: 321(7270); 1223. [Medline] [Full text]
Assessing clinically significant change: application to the SCL-90-R. Schmitz N, Hartkamp N, Franke GH. Psychol Rep 2000: 86(1); 263-74. [Medline] A Symptom Checklist (SCL-90-R) is a potentially useful measure of psychological distress; it is frequently used in psychotherapy research and clinical practice. The purpose of this study was to illustrate the use of the SCL-90-R for determining statistically reliable change and clinical significance outlined by Jacobson and Truax in 1991. This paper describes the concepts of statistical and clinical significance of change. A proposal for obtaining and characterizing samples is made. Then a clinician's perspective is taken. Reliable change estimates and cut-off scores are chosen based on outcome data. Selected data from a single psychotherapeutic process and outcome study then were used to test the estimates of change and cut-off scores.
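The two quantities from Jacobson and Truax that the abstract refers to, the reliable change index and a clinical cut-off score, can be sketched as follows. The formulas are the commonly cited ones (the cut-off shown is their criterion "c", the weighted midpoint between the functional and dysfunctional populations). All numeric inputs are hypothetical and are not SCL-90-R norms, and the function names are illustrative only.

```python
from math import sqrt

def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson-Truax reliable change index.

    RC = (post - pre) / S_diff, where S_diff = sqrt(2) * SE and
    SE = sd_pre * sqrt(1 - reliability). |RC| > 1.96 is conventionally
    read as change beyond measurement error.
    """
    se = sd_pre * sqrt(1 - reliability)
    s_diff = sqrt(2) * se
    return (post - pre) / s_diff

def cutoff_c(mean_functional, sd_functional, mean_dysfunctional, sd_dysfunctional):
    """Cut-off 'c': weighted midpoint between the two population means."""
    return ((sd_functional * mean_dysfunctional + sd_dysfunctional * mean_functional)
            / (sd_functional + sd_dysfunctional))

# Hypothetical patient and hypothetical norms (lower score = less distress)
rc = reliable_change_index(pre=1.5, post=0.8, sd_pre=0.6, reliability=0.9)
c = cutoff_c(0.3, 0.3, 1.3, 0.6)
print(round(rc, 2), round(c, 2))
```

In this made-up case the change exceeds the 1.96 threshold in magnitude, so it would count as reliable, but the post-treatment score of 0.8 is still above the cut-off, so it would not yet count as clinically significant under this scheme.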
The Crack Baby Epidemic That Wasn't. What Statistics Mean, and Don't Mean. Schwartzberg NS. Accessed on 2005-03-11 (link broken). Statistics form the basis of scientific findings. While researchers are responsible for experimental design and quantification, journalists must understand the limitations of statistical methods. Reporters need to provide real world context to the results and differentiate between significant and meaningful differences. www.biomednet.com/hmsbeagle/50/people/op_ed.htm
Multiple doses of secretin in the treatment of autism: a controlled study. Sponheim E, Oftedal G, Helverschou SB. Acta Paediatr 2002: 91(5); 540-5. [Medline] Dramatic effects on autistic behaviour after repeated injections of the gastrointestinal hormone secretin have been referred in a number of case reports. In the absence of curative and effective treatments for this disabling condition, this information has created new hope among parents. Although controlled studies on the effect of mainly one single dose have not documented any effect, many children still continue to receive secretin. Six children enrolled in a double-blind, placebo-controlled crossover study in which each child was its own control. Human synthetic secretin, mean dose 3.4 clinical units, and placebo were administered intravenously in randomized order every 4th wk, on three occasions each. The measurement instruments were the visual analogue scale (VAS) and the aberrant behaviour checklist (ABC). Statistically significant differences were found for placebo in 3 out of 6 children and for secretin in one child, using parental ratings only (VAS scores). Differences were small and lacked clinical significance, which was in accordance with the overall impression of the parents and teachers and visual inspection of graphs. Conclusion: In this placebo-controlled study, multiple doses of secretin did not produce any symptomatic improvement.
What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Walters SJ, Brazier JE. Health Qual Life Outcomes 2003: 1(1); 4. [Medline] BACKGROUND: The SF-6D is a new single summary preference-based measure of health derived from the SF-36. Empirical work is required to determine what is the smallest change in SF-6D scores that can be regarded as important and meaningful for health professionals, patients and other stakeholders. OBJECTIVES: To use anchor-based methods to determine the minimally important difference (MID) for the SF-6D for various datasets. METHODS: All responders to the original SF-36 questionnaire can be assigned an SF-6D score provided the 11 items used in the SF-6D have been completed. The SF-6D can be regarded as a continuous outcome scored on a 0.29 to 1.00 scale, with 1.00 indicating "full health". Anchor-based methods examine the relationship between a health-related quality of life (HRQoL) measure and an independent measure (or anchor) to elucidate the meaning of a particular degree of change. One anchor-based approach uses an estimate of the MID, the difference in the QoL scale corresponding to a self-reported small but important change on a global scale. Patients were followed for a period of time, then asked, using question 2 of the SF-36 as our global rating scale (which is not part of the SF-6D), if their general health is much better (5), somewhat better (4), stayed the same (3), somewhat worse (2) or much worse (1) compared to the last time they were assessed. We considered patients whose global rating score was 4 or 2 as having experienced some change equivalent to the MID. In patients who reported a worsening of health (global change of 1 or 2) the sign of the change in the SF-6D score was reversed (i.e. multiplied by minus one). The MID was then taken as the mean change on the SF-6D scale of the patients who scored (2 or 4). RESULTS: This paper describes the MID for the SF-6D from seven longitudinal studies that had previously used the SF-36. CONCLUSIONS: From the seven reviewed studies (with nine patient groups) the MID for the SF-6D ranged from 0.010 to 0.048, with a weighted mean estimate of 0.033 (95% CI: 0.029 to 0.037). The corresponding Standardised Response Means (SRMs) ranged from 0.11 to 0.48, with a mean of 0.30 and were mainly in the "small to moderate" range using Cohen's criteria, supporting the MID results. Using the half-standard deviation (of change) approach the mean effect size was 0.051 (range 0.033 to 0.066). Further empirical work is required to see whether or not this holds true for other patient groups and populations.
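The anchor-based procedure described in the METHODS section above can be written out in a few lines. This is a sketch of the steps as the abstract states them, applied to made-up data; the function name and the patient records are hypothetical.

```python
from statistics import mean

def anchor_based_mid(records):
    """Estimate the MID from (global_rating, sf6d_change) pairs.

    Only patients rating their health "somewhat better" (4) or
    "somewhat worse" (2) contribute. For those who worsened, the sign
    of the SF-6D change is reversed before averaging.
    Raises statistics.StatisticsError if no patient rated 2 or 4.
    """
    changes = []
    for rating, change in records:
        if rating == 4:
            changes.append(change)
        elif rating == 2:
            changes.append(-change)
    return mean(changes)

# Hypothetical follow-up data: (global rating 1-5, change in SF-6D score)
patients = [(4, 0.05), (2, -0.03), (3, 0.01), (5, 0.12),
            (4, 0.02), (1, -0.10), (2, -0.02)]
print(round(anchor_based_mid(patients), 3))
```

Patients who report no change (3) or a large change (1 or 5) are excluded by design, because the MID is meant to capture the smallest change that patients themselves notice.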
The meaning of 6.8: numeracy and normality in health information talks. Adelsward V, Sachs L. Soc Sci Med 1996: 43(8); 1179-87. [Medline] The ambiguities of risk which stem from its translation from epidemiological findings into clinical knowledge and practice and thus to lay experiences of health and illness is a clear dilemma. How are risks expressed statistically, or otherwise mathematically, to be interpreted and communicated within the discourse of medico-science, and how within the discourse of an individual's everyday life? An important tool in all risk discourses and in preventive practices such as health information is testing and test results. Test results--presented in mathematical terms as points on a scale, or as a number--are in fact fundamental to preventive practice. But what do we know about how people involved in these tests understand them and how the results are used in the construction of ideas about risk and normalcy? This article attempts to answer part of that question by drawing on an empirical study of the use of numbers as metaphors in talks between a nurse and her potential patients in a directed health survey.
Completeness of reporting trial results: effect on physicians' willingness to prescribe. Bobbio M, Demichelis B, Giustetto G. Lancet 1994: 343(8907); 1209-11. [Medline] Clinical trials may lead to conflicting results. We studied how different ways of reporting results affected physicians' recommendations. A questionnaire distributed to 148 general practitioners presented results of a clinical trial where a reduction of cardiac events and an increase of mortality was reported. Results were shown in four different ways--relative risk reduction, absolute risk reduction, percentages of event-free patients, number needing to be treated to prevent an event--as if they derived from different trials. A fifth presentation was the reduced rate of cardiac events along with the increased rate of mortality. Physicians were asked to estimate how much they would be willing to prescribe each drug. The mean agreement of physicians' decisions was 77 (28)% for relative risk reduction, 24 (28)% for absolute risk reduction, 37 (37)% for percentages of event-free patients, 34 (34)% for number needed to treat, and 23 (28)% for reduction in events with increase in mortality (p < 0.001 relative risk vs others). The method of reporting trial results and the completeness of information in the case of controversial results affect physicians' willingness to prescribe.
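The four presentations compared in the study above are arithmetic transformations of the same two event rates, which a small sketch makes plain. The event rates used here are hypothetical and are not the ones shown to the physicians; the function name is illustrative only.

```python
def summarize_effect(control_rate, treated_rate):
    """Express one trial result in the four formats named in the abstract."""
    arr = control_rate - treated_rate          # absolute risk reduction
    return {
        "relative_risk_reduction": arr / control_rate,
        "absolute_risk_reduction": arr,
        "event_free": (1 - control_rate, 1 - treated_rate),
        "number_needed_to_treat": 1 / arr,
    }

# Hypothetical trial: events in 4% of control patients and 3% of treated patients
for name, value in summarize_effect(0.04, 0.03).items():
    print(name, value)
```

With these made-up rates the same result can be reported as a 25% relative reduction, a 1 percentage point absolute reduction, 96% versus 97% event-free, or roughly 100 patients treated to prevent one event, which helps explain why the relative format drew the strongest willingness to prescribe.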
General practice registrar responses to the use of different risk communication tools in simulated consultations: a focus group study. Edwards A. British Medical Journal 1999: 319(7212); 749-752. ABSTRACT: OBJECTIVES: To pilot the use of a range of complementary risk communication tools in simulated general practice consultations; to gauge the responses of general practitioners in training to these new consultation aids. DESIGN: Qualitative study based on focus group discussions. SETTING: General practice vocational training schemes in South Wales. PARTICIPANTS: 39 general practice registrars and eight course organisers attended four sessions; three simulated patients attended each time. METHOD: Registrars consulting with simulated patients used verbal or "qualitative" descriptions of risks, then numerical data, and finally graphical presentations of the same data. Responses of doctors and patients were explored by semistructured discussions that had been audiotaped for transcription and analysis. RESULTS: The process of using risk communication tools in simulated consultations was acceptable to general practitioner registrars. Providing doctors with information about risks and benefits of treatment options was generally well received. Both doctors and patients found it helped communication. There were concerns about the lack of available, unbiased, and applicable evidence and a shortage of time in the consultation to discuss treatment options adequately. Graphical presentation of information was often favoured, an approach that also has the potential to save consultation time. CONCLUSIONS: A range of risk communication "tools" with which to discuss treatment options is likely to be more applicable than a single new strategy. These tools should include both absolute and relative risk information formats, presented in an unbiased way. Using risk communication tools in simulated consultations provides a model for training in risk communication for professional groups.
Explaining risks: turning numerical data into meaningful pictures. Edwards A, Elwyn G, Mulley A. BMJ 2002: 324(7341); 827-30. [Medline] [Full text] [PDF]
Evidence based purchasing: understanding results of clinical trials and systematic reviews. Fahey T, Griffiths S, Peters TJ. British Medical Journal 1995: 311(7012); 1056-9; discussion 1059-60. [Medline] [Abstract] [Full text] OBJECTIVE--To assess whether the way in which the results of a randomised controlled trial and a systematic review are presented influences health policy decisions. DESIGN--A postal questionnaire to all members of a health authority within one regional health authority. SETTING--Anglia and Oxford regional health authorities. SUBJECTS--182 executive and non-executive members of 13 health authorities, family health services authorities, or health commissions. MAIN OUTCOME MEASURES--The average score from all health authority members in terms of their willingness to fund a mammography programme or cardiac rehabilitation programme according to four different ways of presenting the same results of research evidence--namely, as a relative risk reduction, absolute risk reduction, proportion of event free patients, or as the number of patients needed to be treated to prevent an adverse event. RESULTS--The willingness to fund either programme was significantly influenced by the way in which data were presented. Results of both programmes when expressed as relative risk reductions produced significantly higher scores when compared with other methods (P < 0.05). The difference was more extreme for mammography, for which the outcome condition is rarer. CONCLUSIONS--The method of reporting trial results has a considerable influence on the health policy decisions made by health authority members.
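The four presentation formats compared in this study are all derived from the same two event rates. The following is a minimal Python sketch of that arithmetic. The function name and the event rates are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def effect_formats(control_rate, treated_rate):
    """Express one trial result in the four formats compared in the abstract."""
    arr = control_rate - treated_rate          # absolute risk reduction
    rrr = arr / control_rate                   # relative risk reduction
    nnt = 1.0 / arr                            # number needed to treat
    return {
        "relative_risk_reduction": rrr,
        "absolute_risk_reduction": arr,
        "event_free_control": 1.0 - control_rate,
        "event_free_treated": 1.0 - treated_rate,
        "number_needed_to_treat": nnt,
    }

# Hypothetical rates: a rare outcome and a common one, same relative effect.
rare = effect_formats(0.004, 0.003)
common = effect_formats(0.40, 0.30)
for label, res in (("rare", rare), ("common", common)):
    print(label, {k: round(v, 4) for k, v in res.items()})
```

With the same relative risk reduction, the rarer outcome yields a much smaller absolute reduction and a much larger number needed to treat, which may help explain why the abstract reports a more extreme difference for mammography.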
Absolutely relative: how research results are summarized can affect treatment decisions. Forrow L, Taylor W, Arnold R. The American Journal of Medicine 1992: 92(2); 121-24. ABSTRACT: PURPOSE: To determine whether alternative methods of presenting a contrast between the same two quantities in descriptions of research results could lead to different treatment decisions by physicians. SUBJECTS AND METHODS: We conducted a survey of practicing physicians and of faculty and fellows in training programs in clinical epidemiology and social science research methods. Each questionnaire presented results from a published study of either hypertension or hypercholesterolemia in two different ways: once as the relative change in the outcome rate and once as the absolute change in the outcome rate. We asked respondents to read each summary and indicate how the information contained in the summary would influence decisions about treatment. RESULTS: Of the 235 physicians who completed the questionnaire, 108 (46%) gave different responses to the same results presented in different ways. Of these, 97 (89.8%) indicated a stronger inclination to treat patients after reading of the relative change in the outcome rate (p < 0.0001). CONCLUSION: The manner of presentation of results can influence physicians' judgments about the treatment of patients.
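The percentages and the small p-value in this abstract can be checked from the reported counts. The sketch below uses an exact two-sided sign test against an even split. The abstract does not say which test the authors used, so the choice of test here is an assumption.

```python
from math import comb

def two_sided_sign_test(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    upper = sum(comb(n, i) for i in range(max(k, n - k), n + 1))
    p = 2 * upper / 2 ** n
    return min(1.0, p)

discordant = 108          # physicians who answered differently across formats
favoured_relative = 97    # of these, more inclined to treat after the relative format
share_discordant = discordant / 235
share_relative = favoured_relative / discordant
p_value = two_sided_sign_test(favoured_relative, discordant)
print(round(share_discordant, 3), round(share_relative, 3), p_value)
```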
Communicating the benefits of chronic preventive therapy: does the format of efficacy data determine patients' acceptance of treatment? Hux J, Naylor C. Medical Decision Making 1995: 15(2); 152-7. ABSTRACT: Patients' informed acceptance of chronic medical therapy hinges on communicating the potential benefits of drugs in quantitative terms. In a hypothetical scenario of treatment initiation, the authors assessed how three different formats of the same data affected the willingness of 100 outpatients to take what were implied to be three different lipid-lowering drugs. Side-effects were declared negligible and costs insured. Subjects made a "yes-no" decision about taking such a medication, and graded the decision on a certainty scale. Advised of a relative risk reduction--"34% reduction in heart attacks"--88% of the patients assented to therapy. All other formats elicited significantly more refusals (p < 0.0001): for absolute risk difference--"1.4% fewer patients had heart attacks"--42% assented; for inverted absolute risk--"treat 71 persons for 5 years to prevent one heart attack"--only 31% accepted treatment. When the data were extrapolated to disease-free survival--"average gain of 15 weeks"--40% consented. Similar responses were obtained for descriptions of an antihypertensive drug: 89% assented to therapy when given relative risk reduction but only 46% when given absolute risk reduction. The subjects were confident in both acceptance and refusal: 93% of the decisions were rated "somewhat certain" to "completely certain." The authors conclude that patients' views of medical therapy are shaped by the formats in which potential benefits are presented. Multiple complementary formats may be most appropriate. The results imply that many patients may decline treatment if briefed on the likelihood or extent of benefit.
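The three numerical formats quoted in this abstract describe the same result, and the link between them can be sketched in a few lines of Python. The implied event rates below are back-calculated from the quoted 34% and 1.4% figures; they are not reported in the abstract and should be read as an illustration only.

```python
rrr = 0.34      # "34% reduction in heart attacks"
arr = 0.014     # "1.4% fewer patients had heart attacks"

nnt = 1.0 / arr                    # number needed to treat
control_rate = arr / rrr           # implied untreated event rate
treated_rate = control_rate - arr  # implied treated event rate

print(f"NNT = {nnt:.1f}")
print(f"implied control event rate = {control_rate:.3f}")
print(f"implied treated event rate = {treated_rate:.3f}")
```

The reciprocal of the absolute risk reduction rounds to the "treat 71 persons" figure quoted in the abstract.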
Absolute and relative truth in clinical trials. Julian D. Lancet 2002(June): 359(9321); 1945-1946. Abstract not available.
Quality of life questionnaires: does statistically significant = clinically important? Juniper EF. J Allergy Clin Immunol 1998: 102(1); 16-7. [Medline] [Full text] [PDF]
An assessment of clinically useful measures of the consequences of treatment. Laupacis A, Sackett D, Roberts R. New England Journal of Medicine 1988: 318(26); 1728-1733. [Medline]
Consider absolute risks in SIDS prevention. Logan S. Arch Dis Child 2000: 83(5); 457. Abstract not available yet.
Who benefits from medical interventions? Smith GD, Egger M. BMJ 1994: 308(6921); 72-4. [Medline] [Full text] Abstract not available.
"Absolute" is inappropriate for quantitative risk estimation. Tunstall-Pedoe H. BMJ 2000: 320(7236); 723-. [Full text]
Interpreting treatment effects in randomised trials. Guyatt GH, Juniper E, Walter S, Griffith L, Goldstein R. British Medical Journal 1998: 316(7132); 690-693. [Medline] [Full text] [PDF] [Excerpt] The need to measure the impact of treatments on health related quality of life has led to a rapid increase in the variety of instruments available and in their use as measures of outcome in clinical trials. One limitation of instruments that purport to measure health related quality of life is difficulty interpreting their results. In the past decade, investigators have progressed in making these questionnaire results interpretable. For example, we have shown that when questionnaires present response options in the form of seven point scales with verbal descriptions for each option (see box), the smallest difference that patients consider important is often approximately 0.5 per question. A moderate difference corresponds to a change of approximately 1.0 per question, and changes of greater than 1.5 can be considered large. Thus, for example, in a domain with four items, patients will consider a 1 point change in two or more items as important. This finding applies across different areas of function, including dyspnoea, fatigue, and emotional function in patients with chronic airflow limitation; and symptoms, emotional function, and activity limitations in adults and children with asthma, parents of children with asthma, and adults with rhinoconjunctivitis. Initially, we used comparisons in the same patient to establish this difference, but more recently we have replicated this finding using differences between patients.
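The thresholds in this excerpt lend themselves to a small worked example. The sketch below is a hypothetical helper: the excerpt gives only approximate values (0.5, 1.0, 1.5 per question), so treating them as hard cut points, and the labels used, are assumptions made for illustration.

```python
def classify_change(mean_change_per_item):
    """Label a mean per-item change on a seven point scale.

    Cut points follow the approximate values in the excerpt (0.5, 1.0, 1.5);
    treating them as hard boundaries is an assumption made for illustration.
    """
    size = abs(mean_change_per_item)
    if size < 0.5:
        return "below smallest important difference"
    if size < 1.0:
        return "small but important"
    if size <= 1.5:
        return "moderate"
    return "large"

# Four-item domain in which two items change by 1 point and two do not.
item_changes = [1, 1, 0, 0]
mean_change = sum(item_changes) / len(item_changes)
print(mean_change, classify_change(mean_change))
```

The example mirrors the excerpt's own case: a 1 point change in two of four items averages 0.5 per question, which reaches the smallest important difference.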
Can there be a more patient-centred approach to determining clinically important effect sizes for randomized treatment trials? Naylor CD. J Clin Epidemiol 1994: 47(7); 787-95. [Medline] Sample sizes for treatment trials with categorical outcomes are conventionally derived by balancing three elements: a difference between alternative treatments in the event rates for the outcomes of interest (commonly termed the clinically important difference), the alpha error tolerance (false positive risk) and the beta error tolerance (false negative risk). Clinically important differences used to plan trials are chosen in part based on earlier experience with similar interventions (i.e. biological or clinical plausibility). Methodological conventions and clinicians' perceptions will also affect choices. Lastly, practical concerns about the feasibility of accruing large numbers of subjects may drive trialists to specify bigger differences as clinically important, with a view to containing sample size requirements. We suggest that patients or other members of the public be given an active role in determining the magnitude of the clinically important treatment effect for trial planning. Probability trade-offs could be constructed to enable patients and/or healthy volunteers to indicate the degree of benefit they would want from a "new" treatment, given the potential side-effects of the same treatment. This method has the advantage of respecting patient autonomy and principles of informed consent. It provides an additional consideration when plausible effect sizes and error tolerances on hypothesis tests are balanced against feasibility of accruing various sample sizes. Its primary disadvantage is inconvenience, as it adds another step to trial design. 
On the other hand, if patient-based clinically important differences are generated for a variety of disease states and types of treatments, specific trade-off exercises may be needed only for unusual trials. (ABSTRACT TRUNCATED AT 250 WORDS)
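The balance described in this abstract between the clinically important difference, the alpha error, and the beta error can be sketched with the usual normal-approximation formula for comparing two proportions. This is a minimal Python illustration, not the authors' method; the function name, the 20% control event rate, and the candidate treated rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, beta=0.20):
    """Approximate subjects per arm for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical control event rate of 20% and three candidate differences.
for p_treated in (0.15, 0.12, 0.10):
    print(p_treated, n_per_group(0.20, p_treated))
```

Because the required sample size falls roughly with the square of the difference, declaring a larger difference to be clinically important sharply reduces the number of subjects needed, which is the feasibility pressure the abstract describes.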
Measurement of fatigue: determining minimally important clinical differences. Schwartz AL, Meek PM, Nail LM, Fargo J, Lundquist M, Donofrio M, Grainger M, Throckmorton T, Mateo M. J Clin Epidemiol 2002: 55(3); 239-44. [Medline] The purpose was to determine the minimally important clinical difference (MICD) in fatigue as measured by the Profile of Mood States, Schwartz Cancer Fatigue Scale (SCFS), General Fatigue Scale, and a 10-point single-item fatigue measure. The MICD is the smallest amount of change in a symptom (e.g., fatigue) measure that signifies an important change in that symptom. Subjects rated the degree of change in their fatigue over 2 days on a Global Rating Scale. 103 patients were enrolled on this multisite prospective repeated measures design. MICD was determined following established procedures at two time points. Statistically significant changes were observed for moderate and large changes in fatigue, but not for small changes. The scales were sensitive to increases in fatigue over time. The MICD, presented as mean change, for each scale and per item on each scale is: POMS = 5.6, per item = 1.1, SCFS = 5.0, per item = 0.8, GFS = 9.7, per item = 1.0, and the single item measure of fatigue was 2.4 points. This information may be useful in interpreting scale scores and planning studies using these measures.
Here are some results that may or may not be important.
Traumatic Brain Injury: Patterns of Failure of Nonoperative Management. Patel NY. The Journal of Trauma 2000: 48(3); 367-373. [Medline] ABSTRACT: OBJECTIVE: The circumstances of failure for nonoperative management of blunt traumatic brain injury have been poorly defined. In this study, all trauma patients identified over a 12-year period with progression of neurologic injury requiring craniotomy were retrospectively reviewed. METHODS: Data collected included demographic information, mechanism of injury, field and admission vital signs, and Glasgow Coma Scale score, medications, associated injuries, and coagulopathy. Head computed tomographic scans were reviewed, and anatomic findings were correlated with clinical changes (change in mental status or elevation of intracranial pressure) that led to subsequent CT scan and craniotomy. RESULTS: Of 20,100 patients, there were 852 who had computed tomographic scans with acute intracranial injuries on admission; 462 patients were managed nonoperatively. Fifty-seven patients had progression of neurologic injury that required surgery (34 within 24 hours, classed as early; 23 after 24 hours, classed as late). CONCLUSION: Of the variables investigated, only anatomic location of injury was found to be predictive of early failure of nonoperative management. Frontal intraparenchymal hematomas are particularly prone to early failure. Clinical examination and intracranial pressure monitoring are equally important in detecting failure and should be an integral part of nonoperative management.
Overview of health-related quality-of-life measures for pediatric patients: application in the assessment of pharmacotherapeutic and pharmacoeconomic outcomes. Marra CA, Levine M, McKerrow R, Carleton BC. Pharmacotherapy 1996: 16(5); 879-88. Health-related quality of life (HRQOL) is an important dimension in assessing health care. Several methodologic considerations are related to the manner in which these data are obtained in children. Few multidimensional generic measures of quality of life (QOL) have been developed for children and adolescents. Most published research concerns the development of tools to be used in a disease-specific manner for clinical trials. Although several authors point out numerous advantages in assessing HRQOL in clinical practice, several barriers must be overcome for this to occur. In the current era of economic restraint, HRQOL measures must be integrated into pharmaco-economic analyses to assess fully the impact of a drug on health care resources and outcomes.
Views of practicing physicians and the public on medical errors. Blendon RJ, DesRoches CM, Brodie M, Benson JM, Rosen AB, Schneider E, Altman DE, Zapert K, Herrmann MJ, Steffenson AE. N Engl J Med 2002: 347(24); 1933-40. [Medline] BACKGROUND: In response to the report by the Institute of Medicine on medical errors, national groups have recommended actions to reduce the occurrence of preventable medical errors. What is not known is the level of support for these proposed changes among practicing physicians and the public. METHODS: We conducted parallel national surveys of 831 practicing physicians, who responded to mailed questionnaires, and 1207 members of the public, who were interviewed by telephone after selection with the use of random-digit dialing. Respondents were asked about the causes of and solutions to the problem of preventable medical errors and, on the basis of a clinical vignette, were asked what the consequences of an error should be. RESULTS: Many physicians (35 percent) and members of the public (42 percent) reported errors in their own or a family member's care, but neither group viewed medical errors as one of the most important problems in health care today. A majority of both groups believed that the number of in-hospital deaths due to preventable errors is lower than that reported by the Institute of Medicine. Physicians and the public disagreed on many of the underlying causes of errors and on effective strategies for reducing errors. Neither group believed that moving patients to high-volume centers would be a very effective strategy. The public and many physicians supported the use of sanctions against individual health professionals perceived as responsible for serious errors. CONCLUSIONS: Though substantial proportions of the public and practicing physicians report that they have had personal experience with medical errors, neither group has the sense of urgency expressed by many national organizations. 
To advance their agenda, national groups need to convince physicians, in particular, that the current proposals for reducing errors will be very effective.
The measurement and monitoring of surgical adverse events. Bruce J, Russell EM, Mollison J, Krukowski ZH. Accessed on 2003-08-15. BACKGROUND: Surgical adverse events contribute significantly to postoperative morbidity, yet the measurement and monitoring of events is often imprecise and of uncertain validity. Given the trend of decreasing length of hospital stay and the increase in use of innovative surgical techniques--particularly minimally invasive and endoscopic procedures--accurate measurement and monitoring of adverse events is crucial. OBJECTIVES: The aim of this methodological review was to identify a selection of common and potentially avoidable surgical adverse events and to assess whether they could be reliably and validly measured, to review methods for monitoring their occurrence and to identify examples of effective monitoring systems for selected events. This review is a comprehensive attempt to examine the quality of the definition, measurement, reporting and monitoring of selected events that are known to cause significant postoperative morbidity and mortality. METHODS - SELECTION OF SURGICAL ADVERSE EVENTS: Four adverse events were selected on the basis of their frequency of occurrence and likelihood of evidence of measurement and monitoring: (1) surgical wound infection; (2) anastomotic leak; (3) deep vein thrombosis (DVT); (4) surgical mortality. Surgical wound infection and DVT are common events that cause significant postoperative morbidity. Anastomotic leak is a less common event, but risk of fatality is associated with delay in recognition, detection and investigation. Surgical mortality was selected because of the effort known to have been invested in developing systems for monitoring surgical death, both in the UK and internationally. Systems for monitoring surgical wound infection were also included in the review. 
METHODS - LITERATURE SEARCH: Thirty separate, systematic literature searches of core health and biomedical bibliographic databases (MEDLINE, EMBASE, CINAHL, HealthSTAR and the Cochrane Library) were conducted. The reference lists of retrieved articles were reviewed to locate additional articles. A matrix was developed whereby different literature and study designs were reviewed for each of the surgical adverse events. Each article eligible for inclusion was independently reviewed by two assessors. METHODS - CRITICAL APPRAISAL: Studies were appraised according to predetermined assessment criteria. Definitions and grading scales were assessed for: content, criterion and construct validity; repeatability; reproducibility; and practicality (surgical wound infection and anastomotic leak). Monitoring systems for surgical wound infection and surgical mortality were assessed on the following criteria: (1) coverage of the system; (2) whether or not denominator data were collected; (3) whether standard and agreed definitions were used; (4) inclusion of risk adjustment; (5) issues related to data collection; (6) postdischarge surveillance; (7) output in terms of feedback and wider dissemination. RESULTS - SURGICAL WOUND INFECTION: A total of 41 different definitions and 13 grading scales of surgical wound infection were identified from 82 studies. Definitions of surgical wound infection varied from presence of pus to complex definitions such as those proposed by the Centres for Disease Control in the USA. A small body of literature has been published on the content, criterion and construct validity of different definitions, and comparisons have been made against wound assessment scales and multidimensional indices. There are examples of comprehensive hospital-based monitoring systems of surgical wound infection, mainly under the auspices of nosocomial surveillance. 
To date, however, there is little evidence of systematic measurement and monitoring of surgical wound infection after hospital discharge. RESULTS - ANASTOMOTIC LEAK: Over 40 definitions of anastomotic leak were extracted from 107 studies of upper gastrointestinal, hepatopancreaticobiliary and lower gastrointestinal surgery. No formal evaluations were found that assessed the validity or reliability of definitions or severity scales of anastomotic leak. One definition was proposed during a national consensus workshop, but no evidence of its use was found in the surgical literature. The lack of a single definition or gold standard hampers comparison of postoperative anastomotic leak rates between studies and institutions. RESULTS - DEEP VEIN THROMBOSIS: Although a critical review of the DVT literature could not be completed within the realms of this review, it was evident that a number of new techniques for the detection and diagnosis of DVT have emerged in the last 20 years. The group recommends a separate review be undertaken of the different diagnostic tests to detect DVT. RESULTS - SURGICAL MORTALITY MONITORING SYSTEMS: The definition of surgical mortality is relatively consistent between monitoring systems, but duration of follow-up of death postdischarge varies considerably. The majority of systems report in-hospital mortality rates; only some have the potential to link deaths to national death registers. Risk assessment is an important factor and there should be a distinction between recording pre-intervention factors and postoperative complications. A variety of risk scoring systems was identified in the review. Factors associated with accurate and complete data collection include the employment of local, dedicated personnel, simple and structured prompts to ensure that clinical input is complete, and accurate and automated data capture and transfer. 
CONCLUSIONS: The use of standardised, valid and reliable definitions is fundamental to the accurate measurement and monitoring of surgical adverse events. This review found inconsistency in the quality of reporting of postoperative adverse events, limiting accurate comparison of rates over time and between institutions. The duration of follow-up for individual events will vary according to their natural history and epidemiology. Although risk-adjusted aggregated rates can act as screening or warning systems for adverse events, attribution of whether events are avoidable or preventable will invariably require further investigation at the level of the individual, unit or department. CONCLUSIONS - RECOMMENDATIONS FOR RESEARCH: (1) A single, standard definition of surgical wound infection is needed so that comparisons over time and between departments and institutions are valid, accurate and useful. Surgeons and other healthcare professionals should consider adopting the 1992 Centers for Disease Control (CDC) definition for superficial incisional, deep incisional and organ/space surgical site infection for hospital monitoring programmes and surgical audits. There is a need for further methodological research into the performance of the CDC definition in the UK setting. (2) There is a need to formally assess the reliability of self-diagnosis of surgical wound infection by patients. (3) There is a need to assess formally the reliability of case ascertainment by infection control staff. (4) Work is needed to create and agree a standard, valid and reliable definition of anastomotic leak which is acceptable to surgeons. (5) A systematic review is needed of the different diagnostic tests for the diagnosis of DVT. 
(6) The following variables should be considered in any future DVT review: anatomical region (lower limb, upper limb, pelvis); patient presentation (symptomatic, asymptomatic); outcome of diagnostic test (successfully completed, inconclusive, technically inadequate, negative); length of follow-up; cost of test; whether or not serial screening was conducted; and recording of laboratory cut-off values for fibrinogen equivalent units. (7) A critical review is needed of the surgical risk scoring used in monitoring systems. (8) In the absence of automated linkage there is a need to explore the benefits and costs of monitoring in primary care. (9) The growing potential for automated linkage of data from different sources (including primary care, the private sector and death registers) needs to be explored as a means of improving the ascertainment of surgical complications, including death. This linkage needs to be within the terms of data protection, privacy and human rights legislation. (10) A review is needed of the extent of the use and efficiency of routine hospital data versus special collections or voluntary reporting. www.ncchta.org/fullmono/mon522.pdf
Problems for clinical judgement: 4. Surviving in the report card era. Tu JV, Schull MJ, Ferris LE, Hux JE, Redelmeier DA. CMAJ 2001: 164(12); 1709-12. [Medline] [Abstract] [Full text] [PDF] Health care report cards involve comparisons of health care systems, hospitals or clinicians on performance measures. They are going to be an important feature of medical care in Canada in the new millennium as patients demand more information about their medical care. Although many clinicians are aware of this growing trend, they may not be prepared for all of its implications. In this article, we provide some historical background on health care report cards and describe a number of strategies to help clinicians survive and thrive in the report card era. We offer a number of tips ranging from knowing your outcomes first to proactively getting involved in developing report cards.
The caffeine metabolic ratio as an index of xanthine oxidase activity in clinically active and silent celiac patients. Boda M, Nemeth I, Boda D. Journal of Pediatric Gastroenterology and Nutrition 1999: 29(5); 546-50. [Medline] BACKGROUND: The xanthine oxidoreductase system has been identified as one of the main sources of free radicals responsible for various forms of tissue injury. Because the intestinal villi are an important location of this enzyme, it was of interest to study the role of xanthine oxidase in gluten-sensitive celiac enteropathy, associated with characteristic villous atrophy. Measured by a noninvasive method, the ratio of caffeine metabolites excreted in the urine after a caffeine challenge had previously been shown to be indicative of the total xanthine oxidase activity of the patient. METHODS: The study involved 22 children with gluten-challenged celiac disease, exhibiting subtotal villous atrophy in specimens from the third intestinal biopsy in accordance with ESPGHAN criteria. Ten of the patients displayed overt clinical symptoms (active form), whereas 12 had no symptoms (silent form). Urinary caffeine metabolites were determined by high-pressure liquid chromatography. The total in vivo xanthine oxidase activity was expressed as the caffeine metabolite index. RESULTS: In patients with active celiac disease the xanthine oxidase activity index was considerably higher, whereas in those with silent disease it was significantly lower than the control value. A significant negative correlation was shown between the index indicative of xanthine oxidase activity and the serum iron level of the patients. CONCLUSIONS: Activation of xanthine oxidase may play a role in the pathogenesis of active celiac disease with definite malabsorption, gastrointestinal symptoms, and anemia. The caffeine test reflects the difference in the pathogenetic mechanism leading to the mucosal lesion and clinical symptoms of active and silent forms of celiac disease.
Drug interactions with newer antidepressants: role of human cytochromes P450. Greenblatt DJ, von Moltke LL, Harmatz JS, Shader RI. J Clin Psychiatry 1998: 59(Suppl 15); 19-27. Selective serotonin reuptake inhibitors and related antidepressant compounds have the secondary pharmacologic property of inhibiting the activity of human cytochrome P450 enzymes responsible for the oxidative metabolism of many drugs. A number of clinically important pharmacokinetic drug interactions are a consequence of these cytochrome inhibiting effects. This review evaluates the clinical implications of the metabolic profiles of the newer antidepressants, the relative activities of various new antidepressants as inhibitors of human cytochrome P450, and the various in vivo and in vitro methodologies that can be used for identification and quantification of drug interactions.
Cytochrome P450 Involvement in the biotransformation of cisapride and racemic norcisapride in vitro: differential activity of individual human CYP3A isoforms. Pearce R, RR G, GL K, JS. L. Drug Metab Dispos 2001: 29(12); 1548-1554. Identification of the human cytochrome P450 (P450) enzymes involved in the metabolism of cisapride and racemic norcisapride [(+/-)-norcisapride] was investigated at 0.1 and 1 microM, concentrations that span the mean plasma C(max) for cisapride. Formation of norcisapride (Nor), 3-fluoro-4-hydroxycisapride (3F), and 4-fluoro-2-hydroxycisapride (4F) from cisapride and an uncharacterized metabolite (UNK) from (+/-)-norcisapride in human liver microsomes (HLMs) were consistent with Michaelis-Menten kinetics for a single enzyme (K(m), 6.0, 14.3, 13.9, and 107 microM; V(max), 1350, 696, 568, and 25 pmol/mg of protein, respectively). HLMs converted cisapride to Nor at rates that were at least 3 orders of magnitude greater than those observed for (+/-)-norcisapride conversion to UNK. The sample-to-sample variation in the rates of Nor, 3F, 4F, and UNK formation correlated strongly (r(2) > 0.796) with CYP3A4/5 activity in a panel of HLMs (n = 7) and was markedly reduced by ketoconazole, a potent CYP3A inhibitor. Ketoconazole virtually eliminated (+/-)-norcisapride conversion to UNK (94 +/- 0.5%). Studies with 10 cDNA-expressed enzymes revealed that CYP3A4 catalyzed the formation of Nor and 4F at rates >100 times those of non-CYP3A enzymes and >100- and 50-fold higher than CYP3A5 and CYP3A7, respectively. CYP3A4 was the only P450 capable of UNK formation. Therefore, CYP3A4 is the principal P450 enzyme responsible for the conversion of cisapride to Nor, 3F, and 4F and of (+/-)-norcisapride to UNK. 
Compared with cisapride, factors related to CYP3A4-mediated (+/-)-norcisapride metabolism (e.g., ontogeny of drug-metabolizing enzymes, inhibition, and induction) should be clinically unimportant due to the apparent lack of dependence on cytochromes P450 for elimination.
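The Michaelis-Menten constants reported in this abstract can be turned into predicted formation rates at the two concentrations studied. The sketch below is a plain restatement of the standard rate equation with the abstract's K(m) and V(max) values; the function and dictionary names are hypothetical, and the rate units are simply those given in the abstract.

```python
def mm_rate(substrate_uM, km_uM, vmax):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_uM / (km_uM + substrate_uM)

# (Km in microM, Vmax in the abstract's units) as reported for each metabolite.
KINETICS = {
    "Nor": (6.0, 1350.0),
    "3F": (14.3, 696.0),
    "4F": (13.9, 568.0),
    "UNK": (107.0, 25.0),
}

for conc in (0.1, 1.0):   # the two concentrations studied, in microM
    for name, (km, vmax) in KINETICS.items():
        print(f"[S]={conc} microM {name}: v = {mm_rate(conc, km, vmax):.3f}")
```

At substrate concentrations well below K(m), the rate is close to (V(max)/K(m)) times the concentration, so the ratio V(max)/K(m) is a convenient way to compare pathways.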
Cytochrome P450 2D6 variants in a Caucasian population: allele frequencies and phenotypic consequences. Sachse C, Brockmoller J, Bauer S, Roots I. American Journal of Human Genetics 1997: 60(2); 284-95. [Medline] Cytochrome P450 2D6 (CYP2D6) metabolizes many important drugs. CYP2D6 activity ranges from complete deficiency to ultrafast metabolism, depending on at least 16 different known alleles. Their frequencies were determined in 589 unrelated German volunteers and correlated with enzyme activity measured by phenotyping with dextromethorphan or debrisoquine. For genotyping, nested PCR-RFLP tests from a PCR amplificate of the entire CYP2D6 gene were developed. The frequency of the CYP2D6*1 allele coding for extensive metabolizer (EM) phenotype was .364. The alleles coding for slightly (CYP2D6*2) or moderately (*9 and *10) reduced activity (intermediate metabolizer phenotype [IM]) showed frequencies of .324, .018, and .015, respectively. By use of novel PCR tests for discrimination, CYP2D6 gene duplication alleles were found with frequencies of .005 (*1x2), .013 (*2x2), and .001 (*4x2). Frequencies of alleles with complete deficiency (poor metabolizer phenotype [PM]) were .207 (*4), .020 (*3 and *5), .009 (*6), and .001 (*7, *15, and *16). The defective CYP2D6 alleles *8, *11, *12, *13, and *14 were not found. All 41 PMs (7.0%) in this sample were explained by five mutations detected by four PCR-RFLP tests, which may suffice, together with the gene duplication test, for clinical prediction of CYP2D6 capacity. Three novel variants of known CYP2D6 alleles were discovered: *1C (T1957C), *2B (additional C2558T), and *4E (additional C2938T). Analysis of variance showed significant differences in enzymatic activity measured by the dextromethorphan metabolic ratio (MR) between carriers of EM/PM (mean MR = .006) and IM/PM (mean MR = .014) alleles and between carriers of one (mean MR = .009) and two (mean MR = .003) functional alleles. 
The results of this study provide a solid basis for prediction of CYP2D6 capacity, as required in drug research and routine drug treatment.
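As a rough check on the figures in this abstract, the poor metabolizer frequency expected under Hardy-Weinberg equilibrium can be computed from the deficient allele frequencies. The sketch below rests on two assumptions that the abstract does not state: that a frequency quoted for a group of alleles (for example, .020 for *3 and *5) applies to each allele in the group, and that genotypes occur in Hardy-Weinberg proportions. The variable and function names are illustrative only.

```python
# Deficient (poor metabolizer) allele frequencies quoted in the abstract.
# Assumption: a frequency quoted for several alleles applies to each of them.
deficient_alleles = {
    "*4": 0.207,
    "*3": 0.020,
    "*5": 0.020,
    "*6": 0.009,
    "*7": 0.001,
    "*15": 0.001,
    "*16": 0.001,
}


def expected_pm_frequency(allele_freqs):
    """Expected fraction of people carrying two deficient alleles,
    assuming Hardy-Weinberg proportions."""
    q = sum(allele_freqs.values())  # total deficient allele frequency
    return q * q


expected = expected_pm_frequency(deficient_alleles)
observed = 41 / 589  # poor metabolizers reported in the sample

print(f"expected PM frequency: {expected:.3f}")
print(f"observed PM frequency: {observed:.3f}")
```

Under these assumptions the expected frequency is about 6.7%, close to the 7.0% (41 of 589) reported in the abstract.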
Developmental expression of CYP2C and CYP2C-dependent activities in the human liver: in-vivo/in-vitro correlation and inducibility. Treluyer JM, Gueret G, Cheron G, Sonnier M, Cresteil T. Pharmacogenetics 1997: 7(6); 441-52. Experiments were performed in vivo and in vitro to date the onset of hepatic CYP2C isoforms and CYP2C-dependent activities during the perinatal period in humans. Proteins were not detected by immunoblotting in fetal livers and developed in the first few weeks after birth, irrespective of the gestational age at birth. Similarly, the hydroxylation of tolbutamide, a marker for CYP2C9, was undetected in fetal liver microsomes and rose in the first month after birth. In adult liver preparations, the hydroxylation of diazepam correlated well with the CYP3A content of microsomes (r = 0.858, p < 0.01) and with the 6 beta hydroxylation of testosterone (r = 0.830, p < 0.005), whereas demethylation was related to the bulk of CYP2C proteins (r = 0.865, p < 0.005). In fetal liver microsomes, hydroxylation and demethylation activities accounted for less than 5% of the adult activities and both increased immediately after birth to reach adult activities at 1 year of age. When diazepam was given for sedative purpose in neonates and infants, the in-vivo urinary excretion of desmethyl diazepam, temazepam and oxazepam was extremely low in 1-2 day newborns (less than 5 nmol metabolites excreted in 24 h per kg body weight) and developed in the first week after birth. In newborns, barbiturates and, to a lesser extent, steroids acted as inducers of CYP2C isoforms and increased tolbutamide hydroxylation, diazepam demethylation and diazepam hydroxylation by 2 to 10-fold. The surge of CYP2C proteins was caused by an accumulation of RNAs occurring in the first week after birth. The hepatic content in CYP2C8, 2C9 and 2C18 RNA displayed the same profile of evolution, which suggested a coregulation of their synthesis during the neonatal period. 
Taken together, these biochemical and clinical data enable dating of the onset of CYP2C proteins to the first weeks after birth, which is of considerable clinical importance in pediatric pharmacology.
Medical Genetics: 2. The Diagnostic Approach to the Child with Dysmorphic Signs. Hunter AGW. Canadian Medical Association 2002: 166(4); 367-372. [Medline] [Abstract] [Full text] [PDF] Dysmorphology is the branch of clinical genetics in which clinicians and researchers study and attempt to interpret the patterns of human growth and structural defects. Reaching an accurate diagnosis for children with dysmorphic signs is important to their families, because it makes available all the accumulated knowledge about the relevant condition and may provide the family with the opportunity for interaction with patient or parent support groups. I show in this review that reaching a diagnosis in dysmorphology involves an approach that is not fundamentally different from that of other medical disciplines. Cytogenetic and molecular techniques continue to improve our ability to make precise syndrome diagnoses; however, these tests are expensive and should be used selectively.
Evidence-based disease management. Ellrodt G, Cook DJ, Lee J, Cho M, Hunt D, Weingarten S. JAMA 1997: 278(20); 1687-92. [Medline] Disease management is an approach to patient care that emphasizes coordinated, comprehensive care along the continuum of disease and across health care delivery systems. Evidence-based medicine is an approach to practice and teaching that integrates pathophysiological rationale, caregiver experience, and patient preferences with valid and current clinical research evidence. Using diabetes mellitus as an example, we describe the importance of evidence-based medicine to the development of disease management programs. We present a method for developing and implementing evidence-based clinical guidelines, clinical pathways, and algorithms and describe the creation of systems to measure and report processes and outcomes that could drive quality improvement in diabetes care. Multidisciplinary teams are ideally suited to develop, lead, and implement evidence-based disease management programs, since they play an essential role in the preventive, diagnostic, and therapeutic decisions for patients with diabetes throughout the course of their disease.
Reducing medication errors: potential benefits of bolus thrombolytic agents. Richards CF, Cannon CP. Acad Emerg Med 2000: 7(11); 1285-9. [Medline] A recent Institute of Medicine report highlighted the high incidence of medical errors in clinical practice, and the important fact that errors are associated with increased mortality. The administration of thrombolytic therapy for acute myocardial infarction is a particularly high-risk situation for emergency physicians. The combination of extreme time pressure with a narrow "therapeutic window" increases the potential for adverse outcomes due to dosing errors. Numerous trials have found that the dose of thrombolytic therapy is closely related to outcomes, with too low a dose associated with lower rates of infarct-related artery patency and higher doses associated with increased bleeding and intracranial hemorrhage. In the GUSTO-I trial, 13.5% of patients treated with streptokinase and 11.5% of patients treated with tissue plasminogen activator (t-PA) had a medication error (i.e., incorrect dose or infusion length). Most importantly, 30-day mortality was significantly higher in patients with medication errors: for t-PA dosing errors mortality was 7.7% vs 5.5% for patients who received the correct t-PA dose (p < 0.001), with similar findings for streptokinase. More recent data from the InTIME2 trial and other studies showed that use of a bolus thrombolytic agent reduced the rate of medication errors. Thus, use of the simpler bolus thrombolytic agents may reduce emergency department medication errors, and thus improve overall clinical outcome.
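The GUSTO-I mortality figures quoted in this abstract can be restated as an absolute difference and a relative risk. The short sketch below uses only the two percentages given for t-PA; the function names are illustrative, and the results are unadjusted comparisons, not estimates reported by the authors.

```python
# 30-day mortality for t-PA patients in GUSTO-I, as quoted in the abstract.
mortality_with_error = 0.077    # patients with a t-PA dosing error
mortality_correct_dose = 0.055  # patients who received the correct dose


def risk_difference(p1, p0):
    """Absolute difference in risk between two groups."""
    return p1 - p0


def relative_risk(p1, p0):
    """Ratio of the risk in group 1 to the risk in group 0."""
    return p1 / p0


rd = risk_difference(mortality_with_error, mortality_correct_dose)
rr = relative_risk(mortality_with_error, mortality_correct_dose)

print(f"absolute difference: {rd * 100:.1f} percentage points")
print(f"relative risk: {rr:.2f}")
```

The unadjusted difference is 2.2 percentage points, a relative risk of 1.4. Because patients were not randomized to receive a dosing error, these numbers describe an association only.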
Chronic asthma and chiropractic spinal manipulation: a randomized clinical trial. Nielsen N, Bronfort G, Bendix T, Madsen F, Weeke B. Clin Exp Allergy 1995: 25(1); 80-8. [Medline] The purpose of this randomized patient- and observer-blinded cross-over trial was to evaluate the efficacy of chiropractic treatment in the management of chronic asthma when combined with pharmaceutical maintenance therapy. The trial was conducted at the National University Hospital's Out-patient Clinic in Copenhagen, Denmark. Thirty-one patients aged 18-44 years participated, all suffering from chronic asthma controlled by bronchodilators and/or inhaled steroids. Patients who had received chiropractic treatment for asthma within the last 5 years, or who received oral steroids and immunotherapy, were not eligible. Patients were randomized to receive either active chiropractic spinal manipulative treatment or sham chiropractic spinal manipulative treatment twice weekly for 4 weeks, and then crossed over to the alternative treatment for another 4 weeks. Both phases were preceded and followed by a 2-week period without chiropractic treatment. The main outcome measurements were forced expiratory volume in the first second (FEV1), forced vital capacity (FVC), daily use of inhaled bronchodilators, patient-rated asthma severity and non-specific bronchial reactivity (n-BR). Using the cross-over analysis, no clinically important or statistically significant differences were found between the active and sham chiropractic interventions on any of the main or secondary outcome measures. Objective lung function did not change during the study, but over the course of the study, non-specific bronchial hyperreactivity (n-BR) improved by 36% (P = 0.01) and patient-rated asthma severity decreased by 34% (P = 0.0002) compared with the baseline values. (ABSTRACT TRUNCATED AT 250 WORDS)
Recent Advances: Complementary medicine. Vickers A. BMJ 2000: 321; 683-686. [Medline] [Full text] [PDF]
Results of the national cooperative inner-city asthma study (NCICAS) environmental intervention to reduce cockroach allergen exposure in inner-city homes. Gergen P, Mortimer K, Eggleston P, Rosenstreich D, Mitchell H, Ownby D, Kattan M, Baker D. J Allergy Clin Immunol 1999: 103(3); 501-6. ABSTRACT: BACKGROUND: Cockroach allergen is important in asthma. Practical methods to reduce exposure are needed. OBJECTIVE: We sought to evaluate the effectiveness of house cleaning and professional extermination on lowering cockroach antigen levels in inner-city dwellings. METHODS: As part of the National Cooperative Inner-City Asthma Study intervention, 265 of 331 families with asthmatic children who had positive skin test responses to cockroach allergen consented to a professional home extermination with 2 applications of a cockroach insecticide (Abamectin, Avert) combined with directed education on cockroach allergen removal. On a random subset of 48 homes undergoing cockroach extermination in the intervention group, Bla g 1 was measured in settled dust from the kitchen, bedroom, and TV/living room. The first sample was collected 1 week before extermination, with additional samples after the exterminations at approximately 2, 6, and 12 months after the first sample. Self-reported problems with cockroaches were collected at baseline and after 12 months of follow-up in both the intervention and control group. RESULTS: The geometric mean kitchen level of Bla g 1 decreased at 2 months (33.6 U/g) relative to preextermination levels (68.7 U/g, P < .05). The percent of kitchens with over 8 U/g of Bla g 1 followed a similar pattern, but only the decrease from preextermination to 6-month levels was significant (86.8% vs 64.3%, P < .05). By the 12-month visit, the allergen burden had returned to or exceeded baseline levels. Except for an increase in the bedroom at 2 months (8.9 U/g vs 11.1 U/g, P < .05), no other significant change was seen. 
Only about 50% of the families followed the cleaning instructions; no greater effect was found in these homes. Self-reported problems with cockroaches showed no difference between the intervention and control group after 1 year of follow-up. CONCLUSIONS: Despite a significant, but short-lived, decrease, the cockroach allergen burden remained well above levels previously found to be clinically significant.
Assessment of independent effect of olanzapine and risperidone on risk of diabetes among patients with schizophrenia: population based nested case-control study. Koro CE, Fedder DO, L'Italien GJ, Weiss SS, Magder LS, Kreyenbuhl J, Revicki DA, Buchanan RW. BMJ 2002: 325(7358); 243. [Medline] OBJECTIVE: To quantify the association between olanzapine and diabetes. DESIGN: Population based nested case-control study. SETTING: United Kingdom based General Practice Research Database comprising 3.5 million patients followed between 1987 and 2000. PARTICIPANTS: 19 637 patients who had been diagnosed as having and treated for schizophrenia. 451 incident cases of diabetes were matched with 2696 controls. MAIN OUTCOME MEASURES: Diagnosis and treatment of diabetes. RESULTS: Patients taking olanzapine had a significantly increased risk of developing diabetes compared with non-users of antipsychotics (odds ratio 5.8, 95% confidence interval 2.0 to 16.7) and with those taking conventional antipsychotics (4.2, 1.5 to 12.2). Patients taking risperidone had a non-significantly increased risk of developing diabetes compared with non-users of antipsychotics (2.2, 0.9 to 5.2) and with those taking conventional antipsychotics (1.6, 0.7 to 3.8). CONCLUSION: Olanzapine is associated with a clinically important and significant increased risk of diabetes.
Sample size determination under an exponential model in the presence of a confounder and type I censoring. Lui KJ. Control Clin Trials 1992: 13(6); 446-58. In controlled clinical trials, random assignment of treatments to individuals is usually used to eliminate the effects of confounding variables. When there is censorship in data, however, confounding effects may not be automatically removed solely by random assignment of treatments to individuals under the exponential model. Therefore, it is important to incorporate the confounding effect into the sample size calculation even after randomization of treatments to individuals. In this paper, the discussion is restricted only to the situation where there are two comparison groups and one single Bernoulli confounding variable. Based on an exponential covariate model, an explicit sample size formula considering the confounding effect has been derived for the design of trials with type I censoring, in which an end time is fixed in advance and all responses occurring after that time are censored. The resulting sample size formula can also be applied to nonrandomized clinical trials. Finally, to provide insight into the influence of different factors on sample size calculation, a discussion on the effects of treatments, the confounder, the length of follow-up times for studied individuals, and the joint distribution of the treatment and the confounder has been included.
Criteria for evaluating evidence on public health interventions. Rychetnik L, Frommer M, Hawe P, Sheill A. J. Epidemiol. Community Health 2002: 56(2); 119-127. Public health interventions tend to be complex, programmatic, and context dependent. The evidence for their effectiveness must be sufficiently comprehensive to encompass that complexity. This paper asks whether and to what extent evaluative research on public health interventions can be adequately appraised by applying well established criteria for judging the quality of evidence in clinical practice. It is adduced that these criteria are useful in evaluating some aspects of evidence. However, there are other important aspects of evidence on public health interventions that are not covered by the established criteria. The evaluation of evidence must distinguish between the fidelity of the evaluation process in detecting the success or failure of an intervention, and the success or failure of the intervention itself. Moreover, if an intervention is unsuccessful, the evidence should help to determine whether the intervention was inherently faulty (that is, failure of intervention concept or theory), or just badly delivered (failure of implementation). Furthermore, proper interpretation of the evidence depends upon the availability of descriptive information on the intervention and its context, so that the transferability of the evidence can be determined. Study design alone is an inadequate marker of evidence quality in public health intervention evaluation.
Does Elimination of Placebo Responders in a Placebo Run-In Increase the Treatment Effect in Randomized Clinical Trials? A Meta-Analytic Evaluation. Lee S, Walker JR, Jakul L, Sexton K. Depress Anxiety 2004: 19(1); 10-9. [Abstract] The use of a placebo run-in phase, in which placebo responders are withdrawn from a study before random assignment to treatment condition, has been criticized as favoring the active treatment in clinical trials. We compared the effect size of randomized, placebo-controlled clinical trials (in the treatment of depression with selective serotonin reuptake inhibitors [SSRIs]) that include a placebo run-in phase with those that do not, using a meta-analytic approach. This study differed from earlier meta-analytic studies in that it considered only SSRIs and included only studies using continuous measures of depression, allowing for a more refined assessment of effect size. An extensive literature search identified 43 datasets published between 1980 and 2000 comparing placebo with SSRI and using a continuous measure of depression (usually the Hamilton Depression Rating Scale). We included only studies of at least 6 weeks' duration focusing on treatment for primary acute major depression in adults 18-65 years of age. Studies focusing on depression in specific medical illnesses were not included. Analysis of efficacy was based on 3,047 subjects treated with an SSRI antidepressant and 3,740 subjects treated with a placebo. There was no statistically significant difference in effect size between the clinical trials that had a placebo run-in phase followed by withdrawal of placebo responders and those trials that did not. Despite the lack of a statistically significant difference between studies of withdrawing early placebo responders and those not using this procedure, this approach is likely to continue to be used widely because it produces large absolute effect sizes. 
It is recommended that future studies clearly describe these procedures and report the number of subjects dropped from the study for early placebo response and other reasons. Depression and Anxiety 19:10-19, 2004. 2004 Wiley-Liss, Inc.
Is HIV Infection a Co-factor for Cervical Squamous Cell Neoplasia? Mandelblatt JS, Kanetsky P, Eggert L, Gold K. Cancer Epidemiology, Biomarkers & Prevention 1999: 8(1); 97-106. [Abstract] [Full text] [PDF] The objective of this study was to test the hypothesis that HIV interacts with human papilloma virus (HPV) to increase the odds of cervical neoplasia. The study design was a meta-analysis using data pooled from published sources. Studies published between January 1986 and March 1998 were eligible for inclusion if they included data on neoplasia (cytology-based), HIV (defined by laboratory and/or standard clinical criteria), and HPV (assessed by PCR, Southern blot, dot-blot hybridization, or cytology of an otherwise well designed study) among nonpregnant women. Blinded data abstraction was performed independently by the investigators. There were 15 studies that were eligible and presented data in a format that could be abstracted for analysis. Data were pooled using a Mantel-Haenszel summary odds ratio (OR); generalized estimation regression equations were used to examine independent effects of HIV and HPV. Overall, based on the Mantel-Haenszel ORs, there was a strong overall association between HPV and neoplasia [OR, 8.1; 95% confidence interval (CI), 6.5-10.1]. Stratifying by HIV status, HIV-positive women had higher odds of disease (OR, 8.8; 95% CI, 6.3-12.5) than HIV-negative women (OR, 5.0; 95% CI, 3.7-6.8). In the regression model, there was an interaction between HPV and HIV (P = 0.01); immunosuppression also tended to predict neoplasia (P = 0.058). HIV seems to be a cofactor in the association between HPV and cervical neoplasia; this effect may vary by level of immune function. These speculations are biologically plausible. Additional data from large, well designed studies are needed to confirm these hypotheses.
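The Mantel-Haenszel summary odds ratio named in this abstract pools one 2x2 table per study. The sketch below shows the standard estimator; the counts are hypothetical and are not taken from the paper, and the function and variable names are illustrative.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across several 2x2 tables.

    Each table is a tuple (a, b, c, d):
      a = exposed cases,   b = exposed non-cases,
      c = unexposed cases, d = unexposed non-cases.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for three studies (NOT data from the paper).
hypothetical_studies = [
    (20, 10, 15, 55),
    (30, 20, 10, 40),
    (12, 8, 6, 24),
]

print(f"summary OR: {mantel_haenszel_or(hypothetical_studies):.2f}")
```

With a single table the estimator reduces to the ordinary odds ratio ad/bc. With several tables, each study contributes in proportion to its size, so small studies do not dominate the pooled estimate.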
Analysis of nonsteroidal antiinflammatory drugs in meconium and its relation to persistent pulmonary hypertension of the newborn. Alano MA, Ngougmna E, Ostrea EM, Jr., Konduri GG. Pediatrics 2001: 107(3); 519-23. OBJECTIVE: The objective of this study was to detect fetal exposure to nonsteroidal antiinflammatory drugs (NSAIDs) by meconium analysis and to determine the relationship between fetal exposure to NSAIDs and the development of persistent pulmonary hypertension of the newborn (PPHN). METHODS: In a case-control study of the inborn and outborn nurseries of a large urban medical center, meconium was collected from 101 newborn infants (40 with the diagnosis of PPHN based on clinical or echocardiographic criteria and 61 randomly selected, healthy, term infants [control]) and analyzed for NSAIDs (ibuprofen, naproxen, indomethacin, and aspirin) by gas chromatography/mass spectrometry. The risk of developing PPHN was determined in infants who were exposed antenatally to NSAID. RESULTS: Infants with PPHN (n = 40) had a mean gestation of 38.9 weeks and birth weight of 3524 g, which were similar to those of the control group (n = 61). However, the incidence of low Apgar scores (<=6) at 1 minute and 5 minutes was significantly higher in the PPHN group than in the control group. The diagnoses associated with PPHN were primary PPHN (25%), meconium aspiration syndrome (35%), respiratory distress syndrome (20%), low Apgar score/asphyxia (12.5%), and pneumonia/sepsis (8%). Mean duration of ventilator support for the PPHN group was 11 days. Nitric oxide (NO) was given to 19 infants (47.5%) for a mean duration of 25.4 hours. Fourteen of the 19 infants who were treated with NO (74%) required extracorporeal membrane oxygenation, and 2 died. The overall incidence of positive NSAID in meconium in the study population (n = 101) was 49.5%: 22.8% were positive for ibuprofen, 18.8% for naproxen, 7.9% for indomethacin, and 43.6% for aspirin. 
There was poor agreement (Cohen's kappa = 0.09) between maternal history of NSAID use and NSAID detection in meconium. PPHN was significantly associated with 1) the presence of at least 1 NSAID in meconium (odds ratio [OR] = 21.47; 95% confidence interval [CI] = 7.12-64.71) or 2) the presence in meconium of aspirin (OR = 8.09; 95% CI = 3.27-20.10), ibuprofen (OR = 12.89; 95% CI 3.93-42.32), or naproxen (OR = 3.31; 95% CI = 1.17-9.33). By logistic regression analysis, low Apgar scores at 1 and 5 minutes and the antenatal exposure to aspirin, naproxen, and ibuprofen were significantly associated with PPHN and treatment with inhaled NO or extracorporeal membrane oxygenation. CONCLUSION: We confirm by meconium analysis the results of previous studies that demonstrated that the use of NSAIDs during pregnancy, particularly aspirin, ibuprofen, and naproxen, is high; is grossly underestimated by maternal history; and is significantly associated with PPHN. Thus, the easy access to over-the-counter NSAIDs of pregnant women should be reevaluated, and the potential dangers of these drugs to the newborn infant should be more effectively promoted.
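Cohen's kappa, used in this abstract to compare maternal history with meconium results, measures agreement beyond what chance alone would produce. The sketch below computes kappa for a 2x2 agreement table; the counts are hypothetical (the abstract does not report the cross-tabulation), and the function and argument names are illustrative.

```python
def cohens_kappa(both_yes, yes_no, no_yes, both_no):
    """Cohen's kappa for two binary ratings of the same subjects.

    both_yes: both sources positive
    yes_no:   first source positive, second negative
    no_yes:   first source negative, second positive
    both_no:  both sources negative
    """
    n = both_yes + yes_no + no_yes + both_no
    observed = (both_yes + both_no) / n
    # Agreement expected by chance, from the marginal proportions.
    chance_yes = ((both_yes + yes_no) / n) * ((both_yes + no_yes) / n)
    chance_no = ((no_yes + both_no) / n) * ((yes_no + both_no) / n)
    expected = chance_yes + chance_no
    return (observed - expected) / (1 - expected)


# Hypothetical counts (NOT from the paper): maternal history vs meconium test.
kappa = cohens_kappa(both_yes=10, yes_no=5, no_yes=40, both_no=46)
print(f"kappa: {kappa:.2f}")
```

Kappa is 1 when the two sources always agree and 0 when they agree no more often than chance would predict, so a value such as the 0.09 reported here indicates that maternal history carries little information about the meconium result.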
Does how you do depend on how you think you'll do? A systematic review of the evidence for a relation between patients' recovery expectations and health outcomes. Mondloch MV, Cole DC, Frank JW. CMAJ 2001: 165(2); 174-9. BACKGROUND: Most clinicians would probably agree that what patients think will happen can influence what does happen over the clinical course. Yet despite useful narrative reviews on expectancy of therapeutic gain and the mechanisms by which expectancy can affect health outcomes, we were unable to locate a systematic review of the predictive relation between patients' recovery expectations and their health outcomes. METHODS: We searched MEDLINE for English-language articles published from 1966 to June 1998 with a title or abstract containing at least 1 of the medical subject headings (MeSH) "self-assessment," "self-concept" or "attitude to health," or the MeSH subheading "psychology," and at least 1 word from each of 3 sets: "patient" and similar words; a form of "expectation," "belief" or "prediction"; and a form of "recover," "outcome," "survival" or "improve." Relevant articles contained original research data, measured patients' recovery expectations, independently measured a subsequent health outcome and analyzed the relation between expectations and outcomes. We assessed internal validity using quality criteria for prognostic studies based on 6 categories (case definition; patient selection; extent of follow-up; objective outcome criteria; measurement and reporting of recovery expectations; and analysis). RESULTS: A total of 1243 titles or abstracts were identified through the computer search, and 93 full-text articles were retrieved. Forty-one of these articles met the relevance criteria, along with 4 additional articles identified through other means. Agreement beyond chance on quality assessments of 18 randomly selected articles was high (kappa = 0.87, p = 0.001). 
Sixteen of the 45 articles provided moderate-quality evidence and included a range of clinical conditions and study designs; 15 of the 16 showed that positive expectations were associated with better health outcomes. The strength of the relation depended on the clinical conditions and the measures used. INTERPRETATION: Consistency across the studies reviewed and the evidence they provided support the need for clinicians to clarify patients' expectations and to assist them in having appropriate expectations of recovery. The understanding of the nature, extent and clinical implications of the relation between expectations and outcomes could be enhanced by more conceptually driven and methodologically sound research, including evaluations of intervention effectiveness.
Synaptopodin expression in idiopathic nephrotic syndrome of childhood. Srivastava T, Garola RE, Whiting JM, Alon US. Kidney Int 2001: 59(1); 118-25. [Medline] BACKGROUND: Synaptopodin is a proline-rich protein intimately associated with actin microfilaments present in the podocytes' foot processes. We investigated for synaptopodin expression in children with idiopathic nephrotic syndrome (INS), including minimal change disease (MCD), diffuse mesangial hypercellularity (DMH), and focal segmental glomerulosclerosis (FSGS); in children with congenital nephrotic syndrome of the Finnish type (CNF); and in normal kidney tissue. In particular, we examined whether an association exists between synaptopodin expression in podocyte cells and the response to steroids in INS, and whether synaptopodin expression can predict FSGS upon the initial kidney biopsy in children who progress from MCD or DMH to FSGS. METHODS: Immunohistochemistry was performed for synaptopodin expression on renal tissues from MCD (N = 18), DMH (N = 7), FSGS (N = 13), CNF (N = 9), and normal children (N = 7). Synaptopodin expression in nonsclerosed glomeruli was quantitated by computerized image analysis on the Optimas software for both luminance (L) and percentage of glomerular area (A). RESULTS: Synaptopodin expression was absent in areas of sclerosis. In nonsclerosed glomeruli, synaptopodin was significantly less expressed in all groups of INS and in CNF compared with normal (P < 0.0001 for both L and A, in each MCD, DMH, FSGS, and CNF). In INS, synaptopodin expression decreased in order from MCD to DMH to FSGS, reaching statistical significance between MCD and FSGS (P = 0.001 for L and P = 0.05 for A). Greater synaptopodin expression in podocytes was associated with a significantly better response to steroid therapy (P < 0.05 for both L and A). On the other hand, the expression of synaptopodin did not predict progression of MCD or DMH to FSGS. 
CONCLUSION: We conclude that measurement of synaptopodin has the potential to be used as a marker to study the alteration in podocyte cell and response to therapy in INS.
Optimization of cytochrome p450 2D6 (CYP2D6) phenotype assignment using a genotyping algorithm based on allele frequency data. Gaedigk R, Gotschall R, Forbes N, Simon S, Kearns G, Leeder J. Pharmacogenetics 1999: 9(6); 669-682. [Medline] ABSTRACT: Cytochrome P4502D6 (CYP2D6) is a highly polymorphic gene locus with > 50 variant alleles which lead to a wide range in enzymatic activity. So called poor metabolizers are carriers of any two non-functional alleles of the CYP2D6 gene. CYP2D6 genotyping is cumbersome and the question of how much genotyping is necessary for an accurate phenotype prediction is still debated. The goal of this study was to determine the optimum amount of genotyping required to accurately predict the phenotype at a reasonable cost in a white North American population. To address this issue, we designed a polymerase chain reaction (PCR)/restriction fragment length polymorphism-based genotyping strategy to detect 'key' mutations linked to extensive metabolizer or poor metabolizer associated alleles in combination with extra-long PCR (XL-PCR). All mutations with the exception of gene deletions and duplications are detectable by simple restriction digestion analysis and agarose gel electrophoresis. In addition, we utilized a genotyping algorithm based on our own and published allele frequency data and phenotype analysis to calculate the probability of a correct genotype (and thus, phenotype) assignment. As little as one XL-PCR reaction followed by a maximum of six reamplification reactions allows an accurate prediction of an individual's genotype to 99.15%. As few as four reamplification reactions identify 97.9% of poor metabolizer individuals. We evaluated our model in 208 white North Americans by testing for the presence of 'key' mutations linked to CYP2D6*2, *3, *4, *6, *7, *8, *9, *10, *11, *12, *15, *17 and *18 alleles and the *5, *13 and *16 gene deletions. For all individuals, the correct phenotype has been predicted. 
Discordant phenotype assignment occurred in only two individuals which subsequently was attributed to CYP2D6 inhibition by concomitant drug therapy.
Creatinine excretion rates for renal clearance studies. Hellerstein S, Simon SD, Berenbom M, Erwin P, Nickell E. Pediatr Nephrol 2001: 16(8); 637-43. [Medline] [Abstract] A total of 637 timed-urine collections for creatinine excretion rates obtained from 295 children over 14 years have been analyzed. The children ranged in age from 2.8 to 21.7 years at the time of the clearance study. The data analyzed included only one study from a child during any 6-month period. The objective is to provide data defining the expected range of creatinine excretion for renal clearance studies. One hundred forty-two studies were conducted on children not pretreated with cimetidine and 495 on those pretreated with cimetidine. Analysis showed that pretreatment with cimetidine for creatinine clearance studies does not alter creatinine excretion rates (P=0.080; 95% CI -0.03 to 1.61). Creatinine excretion rates in urine collections obtained at home (roughly 24-h collections) were compared with 2-h supervised collections in the Children's Kidney Center. The supervised urine collections resulted in creatinine excretion rates 1.38 mg/kg/24 h greater than home collections (P=0.001; 95% CI 0.76-2.00). Using regression equations for creatinine excretion rate with age, tables have been prepared showing the expected rate of creatinine excretion for renal clearance studies in children 3-21 years of age.
Pulmonary reactivity to vanadium pentoxide following subchronic inhalation exposure in a non-human primate animal model. Knecht EA, Moorman WJ, Clark JC, Hull RD, Biagini RE, Lynch DW, Boyle TJ, Simon SD. J Appl Toxicol 1992: 12(6); 427-34. [Medline] An experimental study was conducted to evaluate changes in pulmonary reactivity resulting from repeated vanadium pentoxide (V2O5) dust inhalation. The study assessed pulmonary reactivity to V2O5 through the use of provocation challenges, and compared V2O5 reactivity before and after subchronic V2O5 exposure. A total of 24 adult, male cynomolgus monkeys (Macaca fascicularis) were exposed by inhalation for 6 h per day, 5 days per week, for 26 weeks. Two V2O5-exposed groups (n = 8 each) received equal weekly V2O5 exposures (concentration x time) with different exposure profiles. One V2O5-exposed group received 0.1 mg V2O5 m-3 on Mondays, Wednesdays and Fridays, with a twice-weekly peak exposure of 1.1 mg V2O5 m-3 on Tuesdays and Thursdays, and was included to investigate the influence of an exposure regimen with peaks on the development of pulmonary hyper-reactivity. The other V2O5-exposed group received a constant daily concentration of 0.5 mg V2O5 m-3. A control group (n = 8) received filtered, conditioned air. Pre-exposure challenges with V2O5 produced a concentration-dependent impairment in pulmonary function, characterized by airway obstructive changes (increased resistance and decreased flow). Analysis of respiratory cells recovered from the lung by bronchoalveolar lavage demonstrated that airway obstruction was accompanied by a significant influx of inflammatory cells into the lung. Subchronic V2O5 inhalation did not produce an increase in V2O5 reactivity in comparison to the control group, and cytological, immunological and skin test results indicate the absence of allergic sensitization. 
Instead, a trend toward decreased pulmonary reactivity was found following subchronic V2O5 inhalation. (Abstract truncated at 250 words.)
Differences in diagnostic criteria for gastric carcinoma between Japanese and western pathologists. Schlemper RJ, Itabashi M, Kato Y, Lewin KJ, Riddell RH, Shimoda T, Sipponen P, Stolte M, Watanabe H, Takahashi H, Fujita R. Lancet 1997: 349(9067); 1725-9. [Medline] [Abstract] BACKGROUND: There have been many studies on gastric carcinoma in populations with contrasting cancer risks. We aimed to find out whether the criteria for the histological diagnosis of early gastric carcinoma were comparable in Western countries and Japan. METHODS: Eight pathologists from Japan, North America, and Europe individually reviewed 35 microscope slides: 17 gastric biopsy samples and 18 endoscopic mucosal resections taken from 17 Japanese patients with lesions ranging from early gastric cancer to adenoma, dysplasia, and reactive atypia. The pathologists were given a list of pathological criteria and a form on which they were asked to indicate the criteria on which they based each diagnosis. FINDINGS: For seven slides most Western pathologists diagnosed low-grade adenoma/dysplasia, whereas the Japanese diagnosed definite carcinoma in four slides, suspected carcinoma in one, and adenoma in only two. Of 12 slides with high-grade adenoma/dysplasia according to most Western pathologists the Japanese gave the diagnosis of definite carcinoma in 11 and suspected in one. Of six slides showing high-grade adenoma/dysplasia with suspected carcinoma according to most Western pathologists the Japanese diagnosed definite carcinoma in all. There were no major differences in the diagnoses of three slides showing reactive epithelium and seven slides with clearly invasive carcinoma. When the opinion of the majority of the pathologists was taken as the final diagnosis there was agreement between Western and Japanese in 11 of the 35 slides (kappa coefficient 0.15 [95% CI 0.01-0.29]).
Presence of invasion was the most important diagnostic criterion for most Western pathologists whereas for the Japanese nuclear features and glandular structures were more important. INTERPRETATION: In Japan, gastric carcinoma is diagnosed on nuclear and structural criteria even when invasion is absent according to the Western viewpoint. This diagnostic practice results in almost no discrepancy between the diagnosis of a superficial biopsy sample and that of the final resection specimen. This may also contribute to the relatively high incidence and good prognosis of gastric carcinoma in Japan when compared with Western countries.
Measurement of markers of tobacco smoking in patients with coronary heart disease. Archbold GP, Cupples ME, McKnight A, Linton T. Ann Clin Biochem 1995: 32 (Pt 2); 201-7. [Medline] 591 patients with a history of coronary heart disease had one or more biochemical markers of tobacco smoking measured. 26% were self reported smokers and a further 4% were apparent 'smoking deceivers'. The urinary nicotine metabolite concentration is an excellent marker for tobacco smoking; breath CO would be a suitable alternative for busy clinics. Half the patients were subjected to regular advice on risk factor management but there was no evidence that this contributed effectively to smoking cessation. Overall smoking cessation rate was poor.
Can the clinical examination diagnose left-sided heart failure in adults? Badgett RG, Lucey CR, Mulrow CD. JAMA 1997: 277(21); 1712-9. [Medline] We systematically reviewed the literature to ascertain how well clinicians determine the probability and type of left-sided heart failure in their patients. Left-sided heart failure is characterized by decreased left ventricular ejection fraction or increased filling pressure. The type of heart failure determines optimal treatment. Systolic dysfunction exists when ejection fraction is reduced. Diastolic dysfunction is presumed to be present when filling pressure is increased with a normal ejection fraction and without another explanatory diagnosis. Many findings are associated with heart failure, and wide variation exists in clinicians' ability to detect these findings. The best findings for detecting increased filling pressure are jugular venous distention and radiographic redistribution. The best findings for detecting systolic dysfunction are abnormal apical impulse, radiographic cardiomegaly, and Q waves or left bundle branch block on an electrocardiogram. Diastolic dysfunction is especially difficult to diagnose, but is associated with an elevated blood pressure during heart failure.
The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting preterm delivery: an overview. Chien PF, Khan KS, Ogston S, Owen P. British Journal of Obstetrics and Gynaecology 1997: 104(4); 436-44. [Medline] OBJECTIVE: To determine the accuracy with which cervico-vaginal fetal fibronectin predicts preterm delivery using systematic quantitative overview of the available literature. DESIGN: Online searching of MEDLINE database (1966 to April 1996), scanning of bibliography of known primary and review articles and review of recent journal issues. Study selection, assessment of study quality and data extraction were performed in duplicate under masked conditions. Likelihood ratios were generated in subgroups of symptomatic and asymptomatic pregnant women by pooling data from different studies. An LR of > 10 or < 0.1 indicated conclusive changes in the pretest probability of preterm delivery while an LR of 5-10 or 0.2-0.1 indicated only moderate changes. PARTICIPANTS: Seven hundred and twenty-three symptomatic women with threatened preterm labour included in nine studies and 847 asymptomatic women (635 low risk and 212 high risk) included in six studies selected for meta-analyses. MAIN OUTCOME MEASURES: Likelihood ratios for positive and negative test results using delivery at < 37 and < 34 weeks of gestation, and within one week of testing as outcome measures. RESULTS: In symptomatic women a positive test predicted delivery < 37 weeks of gestation with a pooled likelihood ratio (LR) of 4.6 (95% CI 3.5-6.1) while a negative test had a pooled LR of 0.5 (95% CI 0.4-0.6). For delivery < 34 weeks of gestation, the pooled LR was 2.6 (95% CI 1.8-3.7) for a positive test and 0.2 (95% CI 0.1-0.5) for a negative test. For delivery within one week of testing, the pooled LR was 5.0 (95% CI 3.8-6.4) for a positive test and 0.2 (95% CI 0.1-0.4) for a negative test. 
In asymptomatic women at low risk of delivery < 37 weeks of gestation the pooled LR was 3.2 (95% CI 2.2-4.8) for a positive test and 0.8 (95% CI 0.7-0.9) for a negative test. In high risk asymptomatic women using delivery < 37 weeks of gestation as an outcome measure the pooled LR was 2.0 (95% CI 1.5-2.6) for a positive test and 0.4 (95% CI 0.2-0.8) for a negative test. For delivery < 34 weeks of gestation in high risk, asymptomatic women the pooled LR was 2.4 (95% CI 1.8-3.2) for a positive test and 0.6 (95% CI 0.4-0.9) for a negative test. CONCLUSION: The presence of fetal fibronectin in cervico-vaginal mucus has limited accuracy in predicting preterm delivery as the likelihood ratios for positive and negative test results generated only minimal to moderate changes in the pretest probability of preterm birth.
Use of Metabolic Markers To Identify Overweight Individuals Who Are Insulin Resistant. McLaughlin T, Abbasi F, Cheal K, Chu J, Lamendola C, Reaven G. Ann Intern Med 2003: 139(10); 802 -809. [Abstract] [Full text] [PDF] Background: Insulin resistance is more common in overweight individuals and is associated with increased risk for type 2 diabetes mellitus and cardiovascular disease. Given the current epidemic of obesity and the fact that lifestyle interventions, such as weight loss and exercise, decrease insulin resistance, a relatively simple means to identify overweight individuals who are insulin resistant would be clinically useful. Objective: To evaluate the ability of metabolic markers associated with insulin resistance and increased risk for cardiovascular disease to identify the subset of overweight individuals who are insulin resistant. Design: Cross-sectional study. Setting: General clinical research center. Patients: 258 nondiabetic, overweight volunteers. Measurements: Body mass index; fasting glucose, insulin, lipid and lipoprotein concentrations; and insulin-mediated glucose disposal as quantified by the steady-state plasma glucose concentration during the insulin suppression test. Overweight was defined as body mass index of 25 kg/m2 or greater, and insulin resistance was defined as being in the top tertile of steady-state plasma glucose concentrations. Receiver-operating characteristic curve analysis was used to identify the best markers of insulin resistance; optimal cut-points were identified and analyzed for predictive power. Results: Plasma triglyceride concentration, ratio of triglyceride to high-density lipoprotein cholesterol concentrations, and insulin concentration were the most useful metabolic markers in identifying insulin-resistant individuals. 
The optimal cut-points were 1.47 mmol/L (130 mg/dL) for triglyceride, 1.8 in SI units (3.0 in traditional units) for the triglyceride-high-density lipoprotein cholesterol ratio, and 109 pmol/L for insulin. Sensitivities for these cut-points were 67%, 64%, and 57%, and specificities were 71%, 68%, and 85%, respectively. Their ability to identify insulin-resistant individuals was similar to the ability of the criteria proposed by the Adult Treatment Panel III to diagnose the metabolic syndrome (sensitivity, 52%, and specificity, 85%). Conclusions: Three relatively simple metabolic markers can help identify overweight individuals who are sufficiently insulin resistant to be at increased risk for various adverse outcomes. In the absence of a standardized insulin assay, we suggest that the most practical approach to identify overweight individuals who are insulin resistant is to use the cut-points for either triglyceride concentration or the triglyceride-high-density lipoprotein cholesterol concentration ratio.
Simpson's paradox and calculation of number needed to treat from meta-analysis. Cates CJ. BMC Med Res Methodol 2002: 2(1); 1. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Calculation of numbers needed to treat (NNT) is more complex from meta-analysis than from single trials. Treating the data as if it all came from one trial may lead to misleading results when the trial arms are imbalanced. DISCUSSION: An example is shown from a published Cochrane review in which the benefit of nursing intervention for smoking cessation is shown by formal meta-analysis of the individual trial results. However, if these patients were added together as if they all came from one trial, the direction of the effect appears to be reversed (due to Simpson's paradox). Whilst NNT from meta-analysis can be calculated from pooled Risk Differences, this is unlikely to be a stable method unless the event rates in the control groups are very similar. Since in practice event rates vary considerably, the use of a relative measure, such as Odds Ratio or Relative Risk, is advocated. These can be applied to different levels of baseline risk to generate a risk specific NNT for the treatment. SUMMARY: The method used to calculate NNT from meta-analysis should be clearly stated, and adding the patients from separate trials as if they all came from one trial should be avoided.
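The step of applying a relative measure to different baseline risks can be sketched in Python. This is a minimal illustration, not the method of the paper: the function names are hypothetical, and the relative risk, odds ratio, and baseline rates in the usage lines are made-up values rather than figures from the Cochrane review.

```python
def nnt_from_relative_risk(baseline_risk, relative_risk):
    """Risk-specific NNT from a relative risk applied to a baseline event rate."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    treated_risk = baseline_risk * relative_risk
    risk_difference = treated_risk - baseline_risk
    if risk_difference == 0:
        raise ValueError("no effect: NNT is undefined")
    return 1.0 / abs(risk_difference)


def nnt_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk-specific NNT from an odds ratio applied to a baseline event rate."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    treated_risk = treated_odds / (1.0 + treated_odds)
    risk_difference = treated_risk - baseline_risk
    if risk_difference == 0:
        raise ValueError("no effect: NNT is undefined")
    return 1.0 / abs(risk_difference)


# Hypothetical inputs: the same relative effect at two baseline rates.
for baseline in (0.05, 0.25):
    print(baseline, nnt_from_relative_risk(baseline, relative_risk=1.5))
```

With the same relative risk, the NNT falls as the baseline rate rises, which is why a single pooled NNT can mislead when control event rates differ between trials.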
Reporting of measures of accuracy in systematic reviews of diagnostic literature. Honest H, Khan KS. BMC Health Serv Res 2002: 2(1); 4. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: There are a variety of ways in which accuracy of clinical tests can be summarised in systematic reviews. Variation in reporting of summary measures has only been assessed in a small survey restricted to meta-analyses of screening studies found in a single database. Therefore, we performed this study to assess the measures of accuracy used for reporting results of primary studies as well as their meta-analysis in systematic reviews of test accuracy studies. METHODS: Relevant reviews on test accuracy were selected from the Database of Abstracts of Reviews of Effectiveness (1994-2000), which electronically searches seven bibliographic databases and manually searches key resources. The structured abstracts of these reviews were screened and information on accuracy measures was extracted from the full texts of 90 relevant reviews, 60 of which used meta-analysis. RESULTS: Sensitivity or specificity was used for reporting the results of primary studies in 65/90 (72%) reviews, predictive values in 26/90 (28%), and likelihood ratios in 20/90 (22%). For meta-analysis, pooled sensitivity or specificity was used in 35/60 (58%) reviews, pooled predictive values in 11/60 (18%), pooled likelihood ratios in 13/60 (22%), and pooled diagnostic odds ratio in 5/60 (8%). Summary ROC was used in 44/60 (73%) of the meta-analyses. There were no significant differences in measures of test accuracy among reviews published earlier (1994-97) and those published later (1998-2000). CONCLUSIONS: There is considerable variation in ways of reporting and summarising results of test accuracy studies in systematic reviews. There is a need for consensus about the best ways of reporting results of test accuracy studies in reviews.
Pooling data for number needed to treat: no problems for apples. Moore RA, Gavaghan DJ, Edwards JE, Wiffen P, McQuay HJ. BMC Med Res Methodol 2002: 2(1); 2. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To consider the problem that numbers needed to treat (NNT) derived from risk difference, odds ratio, and raw pooled events give different results, using data from a review of nursing interventions for smoking cessation. DISCUSSION: A review of nursing interventions for smoking cessation from the Cochrane Library provided different values for NNT depending on how NNTs were calculated. The Cochrane review was evaluated for clinical heterogeneity using L'Abbe plot and subsequent analysis by secondary and primary care settings. Three studies in primary care had low (4%) baseline quit rates, and nursing interventions were without effect. Seven trials in hospital settings with patients after cardiac surgery, or heart attack, or even with cancer, had high baseline quit rates (25%). Nursing intervention to stop smoking in the hospital setting was effective, with an NNT of 14 (95% confidence interval 9 to 26). The assumptions involved in using risk difference and odds ratio scales for calculating NNTs are discussed. SUMMARY: Clinical common sense and concentration on raw data help to detect clinical heterogeneity. Once robust statistical tests have told us that an intervention works, we then need to know how well it works. The number needed to treat or harm is just one way of showing that, and when used sensibly can be a useful tool.
Transthoracic needle aspiration biopsy for the diagnosis of localised pulmonary lesions: a meta-analysis. Lacasse Y, Wong E, Guyatt GH, Cook DJ. Thorax 1999: 54(10); 884-93. [Medline] [Abstract] [Full text] BACKGROUND: Persisting controversy surrounds the use of transthoracic needle aspiration biopsy (TNAB) stemming from its uncertain diagnostic accuracy. A systematic review and meta-analysis was therefore conducted to evaluate the accuracy of TNAB for the diagnosis of solitary or multiple localised pulmonary lesions. METHODS: Searches for English literature papers in Index Medicus (1963-1965) and Medline (1966-1996) were performed and the bibliographies of the retrieved articles were systematically reviewed. Articles evaluating the accuracy of TNAB in series of consecutive patients presenting with solitary or multiple pulmonary lesions were considered. Only papers in which ≥90% of patients were given a final diagnosis according to an appropriate reference standard were included in the meta-analysis. RESULTS: A total of 48 studies were included and five meta-analyses were conducted according to four diagnostic thresholds. From the pooled sensitivity and specificity corresponding to each diagnostic threshold, associated likelihood ratios (LRs) were derived for malignant disease as follows: (1) malignant versus all other categories, LR = 72; (2) malignant or suspicious versus all others, LR = 49; (3) suspicious versus all categories but malignant, LR = 15; (4) benign versus all others, LR = 0.07; and (5) specific benign diagnosis versus all others, LR = 0.005. Differences in methodological quality of the studies, needle types, or whether a cytopathologist participated in the procedure failed to explain the heterogeneity of the results found in almost every meta-analysis.
Given a 50% probability of malignancy prior to the TNAB, post-test probabilities of malignancy upon receiving the results would be malignant, 99%; suspicious, 94%; non-specific benign, 7%; and benign with a specific diagnosis, 0.6%. CONCLUSIONS: Given the intermediate pre-test probabilities that would probably lead to performing TNAB, findings of "malignant" or of a specific diagnosis of a benign condition provide definitive results. Findings of "suspicious" markedly increase the probability of malignancy, and "benign" markedly decreases it but may not be considered definitive.
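The conversion from a pre-test probability and a likelihood ratio to a post-test probability works on the odds scale: post-test odds equal pre-test odds times the LR. The Python sketch below illustrates this with the 50% pre-test probability and the rounded LRs quoted in the abstract. The function name is hypothetical, and because the published LRs are rounded, the result for the specific benign category may differ slightly from the 0.6% reported.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded likelihood ratios as quoted in the abstract.
tnab_likelihood_ratios = {
    "malignant": 72,
    "suspicious": 15,
    "non-specific benign": 0.07,
    "specific benign diagnosis": 0.005,
}

for finding, lr in tnab_likelihood_ratios.items():
    probability = post_test_probability(0.50, lr)
    print(f"{finding}: {100 * probability:.1f}%")
```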
Tests for Helicobacter pylori infection: a critical appraisal from primary care. Roberts AP, Childs SM, Rubin G, de Wit NJ. Fam Pract 2000: 17 Suppl 2; S12-20. [Medline] [Abstract] [PDF] BACKGROUND: Testing of patients for Helicobacter pylori infection is common in primary care settings. The accuracy of such tests has been assessed and critical appraisal of this literature can inform the clinical management of patients suspected of being infected with H. pylori. METHODS: Literature evaluating the accuracy of diagnostic tests for H. pylori infection was sought as part of a systematic review of literature concerning the management of patients infected with H. pylori. Studies were appraised and estimates of sensitivity and specificity were extracted. Positive and negative likelihood ratios (LRs) were calculated and the implications for post-test probabilities are reported. RESULTS: The sensitivity, specificity, LR+ and LR- for H. pylori infection tests are: [(13)C]urea breath test (UBT), 96.5, 96, 24 and 0.04; [(14)C]UBT, 97.5, 95.5, 21 and 0.03; serology, 91, 89.5, 8 and 0.11; near patient tests, 77, 74, 3 and 0.31; and meta-analysis of serology, 85, 79, 4 and 0.19. The ranges of pre-test probabilities of H. pylori infection in which the diagnostic tests were useful, estimated from primary studies, were: [(13)C]UBT, 20-90%; [(14)C]UBT, 20-99%; serology, 30-80%; and near patient tests, 50-60%. CONCLUSIONS: Tests for H. pylori infection are useful in primary care when the pre-test probability of infection is neither too high nor too low. This indicates that the tests may not be useful for screening purposes but may help with differential diagnosis. Outside moderate pre-test probability ranges, the chance of a false result is high, and such patients should either receive eradication without prior testing (if the probability of infection is sufficiently high) or the test result should be reconfirmed.
When the pre-test probability falls below approximately 20%, a positive test result is unreliable. If the pre-test probability is above approximately 80%, a negative test result is unreliable. Clinical selection of patients needing testing should be used to limit testing to individuals with pre-test probabilities within these ranges. The choice of diagnostic H. pylori test should be influenced by the H. pylori infection rate in the population being tested and the test characteristics. Recommendations for the use of tests, especially near patient tests, should be reconsidered. This critical appraisal supports the recommendations of the European Society for Primary Care Gastroenterology guidelines, arrived at by consensus, for testing for H. pylori infection in primary care.
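The likelihood ratios in this abstract follow from sensitivity and specificity by the standard formulas LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below applies them to two of the rows quoted above. The function name is hypothetical, and because the published percentages are already rounded, recomputed values for some other rows may differ slightly from the figures in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test, given sensitivity and specificity as proportions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity (as proportions) taken from the abstract.
tests = {
    "[(13)C]UBT": (0.965, 0.96),
    "near patient tests": (0.77, 0.74),
}

for name, (sensitivity, specificity) in tests.items():
    lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ = {lr_positive:.1f}, LR- = {lr_negative:.2f}")
```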
Conducting systematic reviews of diagnostic studies: didactic guidelines. Deville WL, Buntinx F, Bouter LM, Montori VM, De Vet HC, Van Der Windt DA, Bezemer P. BMC Med Res Methodol 2002: 2(1); 9. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Although guidelines for critical appraisal of diagnostic research and meta-analyses have already been published, these may be difficult to understand for clinical researchers or do not provide enough detailed information. METHODS: Development of guidelines based on a systematic review of the evidence in reports of systematic searches of the literature for diagnostic research, of methodological criteria to evaluate diagnostic research, of methods for statistical pooling of data on diagnostic accuracy, and of methods for exploring heterogeneity. RESULTS: Guidelines for conducting diagnostic systematic reviews are presented in a stepwise fashion and are followed by comments providing further information. Examples are given using the results of two systematic reviews on the accuracy of the urine dipstick in the diagnosis of urinary tract infections, and on the accuracy of the straight-leg-raising test in the diagnosis of intervertebral disc hernia.
Receiver Operating Characteristic (ROC) Literature Research. Zou KH, Harvard Medical School. Accessed on 2003-09-08. Background (early papers and textbooks); Overviews / reviews; Design of ROC studies / bias; Curve-fitting; Indices of diagnostic accuracy; Statistical inference; Imperfect gold standard; Meta-analysis; Continuous measurements; Generalizations (multiple diseases, localization); Medical decision making (optimal cutoff, utility); Logistic regression and classification trees; Mammography; Sample size considerations; Measurement error models; Unclassified. splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html
Percentage of free prostate-specific antigen in sera predicts aggressiveness of prostate cancer a decade before diagnosis. Carter HB, Partin AW, Luderer AA, Metter EJ, Landis P, Chan DW, Fozard JL, Pearson JD. Urology 1997: 49(3); 379-84. [Medline] OBJECTIVES: To evaluate serial measurements of free and total prostate-specific antigen (PSA) as a predictor of prostate cancer aggressiveness. METHODS: Twenty men diagnosed with adenocarcinoma of the prostate in the pre-PSA era had serum PSA measurements made on multiple stored frozen sera samples available for up to 18 years prior to diagnosis. Subjects were categorized as having aggressive cancer (n = 12) based on the presence of clinical Stage T3, or nodal or bone metastases (N+, M+), or pathologic positive-margin disease, or a Gleason score of 7 or greater; nonaggressive cancer (n = 8) was identified by the absence of these criteria. RESULTS: There was no statistically significant difference in free PSA levels among men with aggressive and nonaggressive prostate cancers from 0 to 15 years before diagnosis. Total PSA levels were significantly different between the groups by 5 years before diagnosis (P = 0.04). At a time when total PSA levels were not different between groups (10 years before diagnosis), there was a statistically significant difference in the percentage of free PSA between aggressive and nonaggressive cancers (P = 0.008). Among 14 men who had sera available for analysis at 10 years before diagnosis, all 8 men with aggressive cancers had a percent free PSA of 0.14 or less; this compares with only 2 of 6 men (33%) with nonaggressive cancer. CONCLUSIONS: These data suggest that the percentage of free PSA in sera is predictive of tumor behavior at a time when total PSA levels provide no information on tumor aggressiveness. 
Evaluation of the percentage of free serum PSA may be helpful in making the decision between expectant management and treatment for those men who are diagnosed with early prostate cancers by PSA testing.
Is screening for breast cancer with mammography justifiable? [Commentary: a compilation of comments RE:Assessment of nationwide cancer-screening programmes]. Gotzsche P, Olsen O. The Lancet 2000: 355(9198); 129-34. [Medline] BACKGROUND: A 1999 study found no decrease in breast-cancer mortality in Sweden, where screening has been recommended since 1985. We therefore reviewed the methodological quality of the mammography trials and an influential Swedish meta-analysis, and did a meta-analysis ourselves. METHODS: We searched the Cochrane Library for trials and asked the investigators for further details. Meta-analyses were done with Review Manager (version 4.0). FINDINGS: Baseline imbalances were shown for six of the eight identified trials, and inconsistencies in the number of women randomised were found in four. The two adequately randomised trials found no effect of screening on breast-cancer mortality (pooled relative risk 1.04 [95% CI 0.84-1.27]) or on total mortality (0.99 [0.94-1.05]). The pooled relative risk for breast-cancer mortality for the other trials was 0.75 (0.67-0.83), which was significantly different (p=0.005) from that for the unbiased trials. The Swedish meta-analysis showed a decrease in breast-cancer mortality but also an increase in total mortality (1.06 [1.04-1.08]); this increase disappeared after adjustment for an imbalance in age. INTERPRETATION: Screening for breast cancer with mammography is unjustified. If the Swedish trials are judged to be unbiased, the data show that for every 1000 women screened biennially throughout 12 years, one breast-cancer death is avoided whereas the total number of deaths is increased by six. If the Swedish trials (apart from the Malmo trial) are judged to be biased, there is no reliable evidence that screening decreases breast-cancer mortality.
A systematic review of the effects of screening for colorectal cancer using the faecal occult blood test, hemoccult. Towler B, Irwig L, Glasziou P, Kewenter J, Weller D, Silagy C. BMJ 1998: 317(7158); 559-65. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To review effectiveness of screening for colorectal cancer with faecal occult blood test, Hemoccult, and to consider benefits and harms of screening. DESIGN: Systematic review of trials of Hemoccult screening, with meta-analysis of results from the randomised controlled trials. SUBJECTS: Four randomised controlled trials and two non-randomised trials of about 330 000 and 113 000 people respectively aged >=40 years in five countries. MAIN OUTCOME MEASURES: Meta-analysis of effects of screening on mortality from colorectal cancer. RESULTS: Quality of trial design was generally high, and screening resulted in a favourable shift in the stage distribution of colorectal cancers in the screening groups. Meta-analysis of mortality results from the four randomised controlled trials showed that those allocated to screening had a reduction in mortality from colorectal cancer of 16% (relative risk 0.84 (95% confidence interval 0.77 to 0.93)). When adjusted for attendance for screening, this reduction was 23% (relative risk 0.77 (0.57 to 0.89)) for people actually screened. If a biennial Hemoccult screening programme were offered to 10 000 people and about two thirds attended for at least one Hemoccult test, 8.5 (3.6 to 13.5) deaths from colorectal cancer would be prevented over a period of 10 years. CONCLUSION: Although benefits of screening are likely to outweigh harms for populations at high risk of colorectal cancer, more information is needed about the harmful effects of screening, the community's responses to screening, and costs of screening for different healthcare systems before widespread screening can be recommended.
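Two pieces of arithmetic in this abstract can be checked directly: the relative risk reduction implied by a relative risk, and the number of people who would need to be offered screening to prevent one death. The Python sketch below uses only figures quoted in the abstract. The function names are hypothetical, and the "number needed to offer screening" is a derived quantity that the abstract itself does not report.

```python
def relative_risk_reduction(relative_risk):
    """Relative risk reduction implied by a relative risk below 1."""
    return 1.0 - relative_risk


def number_needed_to_offer(people_offered, deaths_prevented):
    """People who must be offered screening to prevent one death."""
    if deaths_prevented <= 0:
        raise ValueError("deaths_prevented must be positive")
    return people_offered / deaths_prevented


# Relative risk 0.84 corresponds to the 16% reduction quoted in the abstract.
print(f"{100 * relative_risk_reduction(0.84):.0f}% reduction")

# 8.5 (3.6 to 13.5) deaths prevented per 10 000 people offered screening over 10 years.
# A larger number of deaths prevented gives a smaller number needed to offer,
# so the interval limits swap when inverted.
for deaths in (8.5, 13.5, 3.6):
    print(deaths, round(number_needed_to_offer(10_000, deaths)))
```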
Screening for breast cancer with mammography. Olsen O, Gotzsche PC. Cochrane Database Syst Rev 2001: (4); CD001877. [Medline] BACKGROUND: Mammographic screening for breast cancer is controversial, as reflected in greatly varying national policies. OBJECTIVES: To assess the effect of screening for breast cancer with mammography on mortality and morbidity. SEARCH STRATEGY: MEDLINE (16 May 2000), The Cochrane Breast Cancer Group's trial register (24 Jan 2000) and reference lists. Letters, abstracts and unpublished trials. Authors were contacted. SELECTION CRITERIA: Randomised trials comparing mammographic screening with no mammographic screening. DATA COLLECTION AND ANALYSIS: Data were extracted by both authors independently. MAIN RESULTS: Seven completed and eligible trials involving half a million women were identified. The two best trials provided medium-quality data and, when combined, yield a relative risk for overall mortality of 1.00 (95% CI 0.96-1.05) after 13 years. However, the trials are underpowered for all-cause mortality, and confidence intervals include a possible worthwhile effect as well as a possible detrimental effect. If data from all eligible trials (excluding flawed studies) are considered then the relative risk for overall mortality after 13 years is 1.01 (95% CI 0.99-1.03). The best trials failed to show a significant reduction in breast cancer mortality with a relative risk of 0.97 (95% CI 0.82-1.14). If data from all eligible trials (excluding flawed studies) are considered then the relative risk for breast cancer mortality after 13 years is 0.80 (95% CI 0.71-0.89). However, breast cancer mortality is considered to be an unreliable outcome and biased in favour of screening. Flaws are due to differential exclusion of women with breast cancer from analysis and differential misclassification of cause of death. 
REVIEWER'S CONCLUSIONS: The currently available reliable evidence does not show a survival benefit of mass screening for breast cancer (and the evidence is inconclusive for breast cancer mortality). Women, clinicians and policy makers should consider these findings carefully when they decide whether or not to attend or support screening programs.
Specificity of MR angiography as a confirmatory test of carotid artery stenosis. Kallmes DF, Omary RA, Dix JE, Evans AJ, Hillman BJ. AJNR Am J Neuroradiol 1996: 17(8); 1501-6. [Medline] [Abstract] [PDF] PURPOSE: To estimate from available literature the specificity (true-negative rate) of MR angiography for detecting severe carotid artery stenoses when applied as a confirmatory test after screening with duplex Doppler sonography. METHODS: We reviewed the pertinent MR angiographic literature published between 1990 and 1994 and recalculated the specificity of MR angiography after deleting from the database results for normal vessels and for vessels with mild and moderate stenoses, since the study of these vessels is not germane to an exploration of the utility of MR angiography as a confirmatory test. RESULTS: Seventeen articles provided data for our analysis. We divided vessels into four categories on the basis of data supplied within each article. Seven of the articles provided data that could be configured to match the categories used in the North American Symptomatic Carotid Endarterectomy Trial (NASCET). In one study, the criterion of severe stenosis was more than 70% constriction, but the moderate category was limited to stenoses of 50% to 69%. The remaining series defined severe stenoses as more than 80% (four series), more than 75% (two series), or more than 60% (three series) constriction. The stated specificity of MR angiography ranged from 64% to 100%. Before revision, 15 of 17 articles had stated specificity values above 75%. Our recalculated values ranged from 18% to 100%. Only seven of 17 studies would have had MR angiographic specificity of greater than 75%. Nine of 17 articles would have had specificities of less than 60%. For all articles specifically identifying vessels with false-positive findings at sonography, the specificity of MR angiography was 16%.
CONCLUSION: To base specificity values for MR angiography as a confirmatory test of carotid artery stenosis on studies that include nondiseased vessels incurs spectrum bias. The actual specificity for MR angiography as a confirmatory test remains unknown, but it is lower than that reported in the literature.
Practice parameter: the management of acute gastroenteritis in young children. American Academy of Pediatrics, Provisional Committee on Quality Improvement, Subcommittee on Acute Gastroenteritis. Pediatrics 1996: 97(3); 424-35. This practice parameter formulates recommendations for health care providers about the management of acute diarrhea in children ages 1 month to 5 years. It was developed through a comprehensive search and analysis of the medical literature. Expert consensus opinion was used to enhance or formulate recommendations where data were insufficient. The Provisional Committee on Quality Improvement of the American Academy of Pediatrics (AAP) selected a subcommittee composed of pediatricians with expertise in the fields of gastroenterology, infectious diseases, pediatric practice, and epidemiology to develop the parameter. The subcommittee, the Provisional Committee on Quality Improvement, a review panel of practitioners, and other groups of experts within and outside the AAP reviewed and revised the parameter. Three specific management issues were considered: (1) methods of rehydration, (2) refeeding after rehydration, and (3) the use of antidiarrheal agents. Main outcomes considered were success or failure of rehydration, resolution of diarrhea, and adverse effects from various treatment options. A comprehensive bibliography of literature on gastroenteritis and diarrhea was compiled and reduced to articles amenable to analysis. Oral rehydration therapy was studied in depth; inconsistency in the outcomes measured in the studies interfered with meta-analysis but allowed for formulation of strong conclusions. Oral rehydration was found to be as effective as intravenous therapy in rehydrating children with mild to moderate dehydration and is the therapy of first choice in these patients. Refeeding was supported by enough comparable studies to permit a valid meta-analysis.
Early refeeding with milk or food after rehydration does not prolong diarrhea; there is evidence that it may reduce the duration of diarrhea by approximately half a day and is recommended to restore nutritional balance as soon as possible. Data on antidiarrheal agents were not sufficient to demonstrate efficacy; therefore, the routine use of antidiarrheal agents is not recommended, because many of these agents have potentially serious adverse effects in infants and young children. This practice parameter is not intended as a sole source of guidance in the treatment of acute gastroenteritis in children. It is designed to assist pediatricians by providing an analytic framework for the evaluation and treatment of this condition. It is not intended to replace clinical judgment or to establish a protocol for all patients with this condition. It rarely will provide the only appropriate approach to the problem. A technical report describing the analyses used to prepare this parameter and a patient education brochure are available through the Publications Department of the AAP.
META-ANALYSIS Dose-specific Meta-Analysis and Sensitivity Analysis of the Relation between Alcohol Consumption and Lung Cancer Risk. Korte JE, Brennan P, Henley SJ, Boffetta P. Am. J of Epidemiology 2002: 155(6); 496-506. Alcohol drinking increases the risk of several types of cancer, but studies of the relation between alcohol and lung cancer risk are complicated by smoking. The authors carried out meta-analyses for four study designs and conducted sensitivity analyses to assess the results. Pooled smoking-unadjusted relative risks (RRs) for brewery workers and alcoholics were 1.17 (95% confidence interval (CI): 0.99, 1.39) and 1.99 (95% CI: 1.66, 2.39), respectively, relative to population rates. For cohort and case-control studies, the authors conducted dose-specific meta-analyses for ethanol consumption of 1–499, 500–999, 1,000–1,999, and ≥2,000 g/month, relative to nondrinking. Smoking-adjusted RRs for ascending dose groups in cohort studies were 0.98 (95% CI: 0.79, 1.21), 0.92 (95% CI: 0.81, 1.04), 1.04 (95% CI: 0.88, 1.22), and 1.53 (95% CI: 1.04, 2.25), respectively. Smoking-adjusted odds ratios for ascending groups in case-control studies were 0.63 (95% CI: 0.51, 0.78), 1.30 (95% CI: 0.98, 1.70), 1.13 (95% CI: 0.46, 2.75), and 1.86 (95% CI: 1.39, 2.49), respectively. Elevated odds ratios were seen for hospital-based case-control studies but not for population-based case-control studies. Sensitivity analyses indicated that smoking explained the elevated RRs in studies of alcoholics and that strong misclassification of smoking status could produce an elevated smoking-adjusted RR in cohort and case-control studies. Overall, evidence for a smoking-adjusted association between alcohol and lung cancer risk is limited to very high consumption groups in cohort and hospital-based case-control studies. At lower levels, any associations observed appear to be explained by confounding.
Case-control study, meta-analysis, and bouillabaisse: putting the calcium antagonist scare into context. Messerli FH. Ann Intern Med 1995: 123(11); 888-9. [Medline] [Full text]
Comparison of evidence of treatment effects in randomized and nonrandomized studies. Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, Contopoulos-Ioannidis DG, Lau J. Jama 2001: 286(7); 821-30. CONTEXT: There is substantial debate about whether the results of nonrandomized studies are consistent with the results of randomized controlled trials on the same topic. OBJECTIVES: To compare results of randomized and nonrandomized studies that evaluated medical interventions and to examine characteristics that may explain discrepancies between randomized and nonrandomized studies. DATA SOURCES: MEDLINE (1966-March 2000), the Cochrane Library (Issue 3, 2000), and major journals were searched. STUDY SELECTION: Forty-five diverse topics were identified for which both randomized trials (n = 240) and nonrandomized studies (n = 168) had been performed and had been considered in meta-analyses of binary outcomes. DATA EXTRACTION: Data on events per patient in each study arm and design and characteristics of each study considered in each meta-analysis were extracted and synthesized separately for randomized and nonrandomized studies. DATA SYNTHESIS: Very good correlation was observed between the summary odds ratios of randomized and nonrandomized studies (r = 0.75; P<.001); however, nonrandomized studies tended to show larger treatment effects (28 vs 11; P =.009). Between-study heterogeneity was frequent among randomized trials alone (23%) and very frequent among nonrandomized studies alone (41%). The summary results of the 2 types of designs differed beyond chance in 7 cases (16%). Discrepancies beyond chance were less common when only prospective studies were considered (8%). Occasional differences in sample size and timing of publication were also noted between discrepant randomized and nonrandomized studies. 
In 28 cases (62%), the natural logarithm of the odds ratio differed by at least 50%, and in 15 cases (33%), the odds ratio varied at least 2-fold between nonrandomized studies and randomized trials. CONCLUSIONS: Despite good correlation between randomized trials and nonrandomized studies (in particular, prospective studies), discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common.
Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Schulz K, Chalmers I, Hayes R, Altman D. JAMA 1995: 273(5); 408-12. [Medline] ABSTRACT: OBJECTIVE--To determine if inadequate approaches to randomized controlled trial design and execution are associated with evidence of bias in estimating treatment effects. DESIGN--An observational study in which we assessed the methodological quality of 250 controlled trials from 33 meta-analyses and then analyzed, using multiple logistic regression models, the associations between those assessments and estimated treatment effects. DATA SOURCES--Meta-analyses from the Cochrane Pregnancy and Childbirth Database. MAIN OUTCOME MEASURES--The associations between estimates of treatment effects and inadequate allocation concealment, exclusions after randomization, and lack of double-blinding. RESULTS--Compared with trials in which authors reported adequately concealed treatment allocation, trials in which concealment was either inadequate or unclear (did not report or incompletely reported a concealment approach) yielded larger estimates of treatment effects (P <.001). Odds ratios were exaggerated by 41% for inadequately concealed trials and by 30% for unclearly concealed trials (adjusted for other aspects of quality). Trials in which participants had been excluded after randomization did not yield larger estimates of effects, but that lack of association may be due to incomplete reporting. Trials that were not double-blind also yielded larger estimates of effects (P =.01), with odds ratios being exaggerated by 17%. CONCLUSIONS--This study provides empirical evidence that inadequate methodological approaches in controlled trials, particularly those representing poor allocation concealment, are associated with bias. Readers of trial reports should be wary of these pitfalls, and investigators must improve their design, execution, and reporting of trials.
Influence of maternal age at delivery and birth order on risk of type 1 diabetes in childhood: prospective population based family study. Bart's-Oxford Family Study Group. Bingley PJ, Douek IF, Rogers CA, Gale EA. British Medical Journal 2000: 321(7258); 420-4. [Medline] [Abstract] [Full text] [PDF] OBJECTIVES: To examine the influence of parental age at delivery and birth order on subsequent risk of childhood diabetes. DESIGN: Prospective population based family study. SETTING: Area formerly administered by the Oxford Regional Health Authority. Participants: 1375 families in which one child or more had diabetes. Of 3221 offspring, 1431 had diabetes (median age at diagnosis 10.5 years, range 0.4-28.5) and 1790 remained non-diabetic at a median age of 16.1 years. MAIN OUTCOME MEASURES: Disease free survival and hazard ratios for the development of type 1 diabetes in all offspring, assessed by Cox proportional hazard regression. Results: Maternal age at delivery was strongly related to risk of type 1 diabetes in the offspring; risk increased by 25% (95% confidence interval 17% to 34%) for each five year band of maternal age, so that maternal age at delivery of 45 years or more was associated with a relative risk of 3.11 (2.07 to 4.66) compared with a maternal age of less than 20 years. Paternal age was also associated with a 9% (3% to 16%) increase for each five year increase in paternal age. The relative risk of diabetes, adjusted for parental age at delivery and sex of offspring, decreased with increasing birth order; the overall effect was a 15% risk reduction (10% to 21%) per child born. CONCLUSIONS: A strong association was found between increasing maternal age at delivery and risk of diabetes in the child. Risk was highest in firstborn children and decreased progressively with higher birth order. The fetal environment seems to have a strong influence on risk of type 1 diabetes in the child.
The increase in maternal age at delivery in the United Kingdom over the past two decades could partly account for the increase in incidence of childhood diabetes over this period.
Evaluation of the performance and clinical impact of a rapid intraoperative parathyroid hormone assay in conjunction with preoperative imaging and concise parathyroidectomy. Johnson LR, Doherty G, Lairmore T, Moley JF, Brunt LM, Koenig J, Scott MG. Clin Chem 2001: 47(5); 919-25. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: (99m)Tc-sestamibi scans and rapid, intraoperative intact parathyroid hormone (PTH) assays allow preoperative identification of diseased glands and intraoperative confirmation of diseased gland removal, respectively. Use of these two new technologies may facilitate simpler, more concise surgery, shorter hospital stays, and decreased costs for frozen-section analysis. One major drawback to this new strategy has been the high cost of rapid point-of-care PTH assays. METHODS: We performed rapid PTH assays with the DPC Turbo PTH assay on the DPC IMMULITE automated analyzer. The number of intraoperative frozen sections, type of anesthesia, surgical approach, length of hospital stay, and pre- and postoperative calcium values were compared between a group of 49 patients undergoing parathyroidectomy where the intraoperative PTH assay was used in conjunction with preoperative imaging, and a historical control group of 55 patients before the use of these two technologies in our institution. RESULTS: Comparison of the Turbo PTH assay to the standard IMMULITE PTH assay gave the following: y = 1.08 x - 4.36 (r = 0.97; n = 48). For the 49 patients, the median turnaround time for each intraoperative PTH determination was 19 min (range, 14-40 min). The median decrease in PTH values from baseline was 88% (range, 33-99%). Thirty-seven patients required two PTH determinations, 7 required three, 4 had four, and 1 required five determinations. The average laboratory cost for the rapid intraoperative PTH assays was < $100 per patient (range, $55 to $113). 
Compared with the control group, the experimental group had significantly fewer frozen sections (1.4 vs 2.5; P < 0.0001), shorter hospital stays (17 discharged on the day of surgery vs none discharged on the day of surgery; P < 0.0001), greater use of local anesthesia (33% vs 0%; P < 0.001), and more unilateral, rather than bilateral neck explorations (65% vs 0%; P < 0.001). CONCLUSIONS: The combination of intraoperative Turbo PTH assay and preoperative (99m)Tc-sestamibi scans can lead to significant decreases in laboratory and surgical pathology costs, hospital stays, and exposure to general anesthesia by facilitating concise parathyroidectomy surgery.
Comparison of stratification and adaptive methods for treatment allocation in an acute stroke clinical trial. Weir CJ, Lees KR. Stat Med 2003: 22(5); 705-26. [Medline] Achieving balance on prognostic factors between treatment groups in a clinical trial is important to ensure that any observed treatment effect may be attributed to the treatment itself. Improving the balance on prognostic factors also potentially increases the statistical power attained in a trial. Substantial imbalances may occur by chance if simple randomization is used. Allocation of the treatment according to stratified random blocks based on clinical features is the conventional approach to obtain treatment groups that are as similar as possible. An alternative approach, known as minimization (or more generally as adaptive stratification), has also been proposed. We assessed the feasibility of adaptive stratification in the context of a clinical trial of insulin to control plasma glucose level following acute stroke. We determined suitable settings for the parameters in the adaptive stratification procedure by simulation studies. Specifically, we assessed: the optimal probability for allocating a patient to the preferred (leading to least imbalance on prognostic factors) treatment group; the number of variables that could be incorporated in the adaptive stratification algorithm; the weighting that should be given to each variable; and whether interactions between variables should be included. We then compared the statistical power, across a range of simulated treatment effects, between trials where treatments were allocated by stratified random blocks and by adaptive stratification. Finally, we considered the importance of the method of analysis in realizing the gain in power which may potentially be achieved by allocating treatments using stratified random blocks or adaptive stratification.
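The allocation rule that the Weir and Lees abstract describes (a preferred arm chosen to minimize imbalance, a probability of following that preference, and a weight per prognostic variable) can be sketched roughly as follows. This is a generic Pocock-Simon-style minimization written for illustration, not the authors' procedure; the imbalance measure, the function names, the factor names, and the default `p_preferred` of 0.8 are all assumptions.

```python
import random

ARMS = ("A", "B")

def imbalance_if_assigned(arm, patient, allocated, weights):
    """Weighted sum, over prognostic factors, of |n_A - n_B| among patients
    sharing the new patient's level, counting the new patient in `arm`."""
    total = 0.0
    for factor, weight in weights.items():
        counts = {a: 0 for a in ARMS}
        for prev_patient, prev_arm in allocated:
            if prev_patient[factor] == patient[factor]:
                counts[prev_arm] += 1
        counts[arm] += 1  # hypothetical placement of the new patient
        total += weight * abs(counts["A"] - counts["B"])
    return total

def minimization_assign(patient, allocated, weights, p_preferred=0.8, rng=random):
    """Pick the arm giving the least weighted imbalance with probability
    p_preferred; break exact ties with a fair coin."""
    scores = {arm: imbalance_if_assigned(arm, patient, allocated, weights)
              for arm in ARMS}
    if scores["A"] == scores["B"]:
        return rng.choice(ARMS)
    preferred = min(ARMS, key=lambda a: scores[a])
    other = "B" if preferred == "A" else "A"
    return preferred if rng.random() < p_preferred else other

# Hypothetical example: two earlier patients, both allocated to arm A.
history = [({"sex": "F", "age": "old"}, "A"),
           ({"sex": "F", "age": "young"}, "A")]
weights = {"sex": 1.0, "age": 1.0}
newcomer = {"sex": "F", "age": "old"}
choice = minimization_assign(newcomer, history, weights, p_preferred=1.0)
```

Setting `p_preferred` to 1.0 makes the rule deterministic whenever one arm is strictly preferred; values below 1 keep some randomness in each allocation, and the abstract lists the choice of this probability among the settings the authors tuned by simulation.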
Randomized, Controlled Trials, Observational Studies, and the Hierarchy of Research Designs. Concato J, Shah N, Horwitz RI. The New England Journal of Medicine 2000: 342(25); 1887-1892. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: In the hierarchy of research designs, the results of randomized, controlled trials are considered to be evidence of the highest grade, whereas observational studies are viewed as having less validity because they reportedly overestimate treatment effects. We used published meta-analyses to identify randomized clinical trials and observational studies that examined the same clinical topics. We then compared the results of the original reports according to the type of research design. METHODS: A search of the Medline data base for articles published in five major medical journals from 1991 to 1995 identified meta-analyses of randomized, controlled trials and meta-analyses of either cohort or case-control studies that assessed the same intervention. For each of five topics, summary estimates and 95 percent confidence intervals were calculated on the basis of data from the individual randomized, controlled trials and the individual observational studies. RESULTS: For the five clinical topics and 99 reports evaluated, the average results of the observational studies were remarkably similar to those of the randomized, controlled trials. For example, analysis of 13 randomized, controlled trials of the effectiveness of bacille Calmette-Guerin vaccine in preventing active tuberculosis yielded a relative risk of 0.49 (95 percent confidence interval, 0.34 to 0.70) among vaccinated patients, as compared with an odds ratio of 0.50 (95 percent confidence interval, 0.39 to 0.65) from 10 case-control studies. In addition, the range of the point estimates for the effect of vaccination was wider for the randomized, controlled trials (0.20 to 1.56) than for the observational studies (0.17 to 0.84). 
CONCLUSIONS: The results of well-designed observational studies (with either a cohort or a case-control design) do not systematically overestimate the magnitude of the effects of treatment as compared with those in randomized, controlled trials on the same topic.
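A rough check of how close the two bacille Calmette-Guerin estimates quoted in the Concato abstract are, using only the published point estimates and 95 percent confidence intervals. This sketch is not the authors' analysis. It assumes the intervals are symmetric on the log scale (so the standard error can be recovered from the interval width), that the two estimates are independent, and that the relative risk and the odds ratio can be compared directly, which is only reasonable when the outcome is rare. The function names are made up for illustration.

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of the log estimate, recovered from a 95% CI that is
    assumed to be symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def z_for_difference(est1, ci1, est2, ci2):
    """z statistic for the difference between two independent log estimates."""
    diff = math.log(est1) - math.log(est2)
    se = math.hypot(log_se_from_ci(*ci1), log_se_from_ci(*ci2))
    return diff / se

# Values quoted in the abstract: 13 randomized trials (relative risk)
# versus 10 case-control studies (odds ratio).
z_bcg = z_for_difference(0.49, (0.34, 0.70), 0.50, (0.39, 0.65))
print(f"z = {z_bcg:.2f}")
```

A z statistic this close to zero is consistent with the abstract's description of the two sets of results as remarkably similar.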
Using routine data to complement and enhance the results of randomised controlled trials. Lewsey JD, Leyland AH, Murray GD, Boddy FA. Health Technol Assess 2000: 4(22); 1-55. [Medline] BACKGROUND: Randomised controlled trials (RCTs) are widely accepted as the best way to assess the outcomes and safety of medical interventions, but are sometimes not ethical, not feasible, or limited in the generalisability of their results. In such circumstances, routinely available data could help in several ways. Routine data could be used, for example, to conduct 'pseudo-trials', to estimate likely outcomes and required sample size to help design and conduct trials, or to examine whether the expected outcomes observed in an RCT will be realised in the general population. OBJECTIVES: The project was undertaken to explore how routinely assembled hospital data might complement or supplement RCTs to evaluate medical interventions: (1) in contexts where RCTs are not feasible; (2) for defining the context and design of an RCT; and (3) for assessing whether the benefits indicated by RCTs are achieved in wider clinical practice. METHODS: The project was based on the system of linked Scottish morbidity records, which cover 100% of acute hospital care episodes and statutory death records from 1981 to 1995. Three case studies were undertaken as a way of investigating the utility of these records in different applications. First, an attempt was made to analyse the link between the timing of surgery for subarachnoid haemorrhage (SAH) and subsequent outcomes (a question not easily susceptible to RCT design). A subsample was derived by excluding patients for which a diagnosis of SAH may not have been established or that may not have been admitted to a neurosurgical unit, and the data were assessed to attempt to inform the design of a trial of early versus late surgery.
Transurethral prostatectomy (TURP), the second case study, has become the surgery of choice for benign prostatic hyperplasia without systematic assessment of its effectiveness and safety, and an RCT would now be considered unethical. However, there is a need to investigate long-term effects and the influence of co-morbidities on outcomes. A retrospective comparison of mortality and re-operation following either open prostatectomy (OPEN) or TURP was, therefore, undertaken. Patients for whom it was not possible to establish the initial procedure were excluded. The third case study compared coronary artery bypass grafting (CABG) with percutaneous transluminal angioplasty (PTCA) for coronary revascularisation. RCTs have been conducted in limited patient subgroups with short follow-up periods. A meta-analysis of RCTs could be augmented by routine data, which are available for large populations. This would allow assessment of subgroup effects, and outcomes over a long period. A subgroup of patients was therefore constructed for whom relevant routine data were available and who reflected the entry criteria for major RCTs, thus enabling a comparison between the results expected from this subgroup and those of the general population. RESULTS AND CONCLUSIONS: The uses of routine data in these contexts had strengths and weaknesses. The SAH study suggested a means of assessing outcomes and survival rates following haemorrhage, which could have value in informing the design of more precise trials and in evaluating changes in outcome following the introduction of new treatments such as embolisation. However, the potential of the data was not realised because their scope and content were insufficient. 
For example, lack of data on the time of onset of symptoms and patients' conditions at hospital admission made it difficult to establish the link between timing of surgery and the outcome, and there was insufficient information on patients' conditions at discharge to enable a comparison of outcomes. The prostatectomy study was able to address questions not answered by RCT literature because the large number of cases it included allowed exploration of subgroup effects. (ABSTRACT TRUNCATED)
Cohort studies of fat intake and the risk of breast cancer--a pooled analysis. Hunter DJ, Spiegelman D, Adami HO, Beeson L, van den Brandt PA, Folsom AR, Fraser GE, Goldbohm RA, Graham S, Howe GR, et al. N Engl J Med 1996: 334(6); 356-61. BACKGROUND. Experiments in animals, international correlation comparisons, and case-control studies support an association between dietary fat intake and the incidence of breast cancer. Most cohort studies do not corroborate the association, but they have been criticized for involving small numbers of cases, homogeneous fat intake, and measurement errors in estimates of fat intake. METHODS. We identified seven prospective studies in four countries that met specific criteria and analyzed the primary data in a standardized manner. Pooled estimates of the relation of fat intake to the risk of breast cancer were calculated, and data from study-specific validation studies were used to adjust the results for measurement error. RESULTS. Information about 4980 cases from studies including 337,819 women was available. When women in the highest quintile of energy-adjusted total fat intake were compared with women in the lowest quintile, the multivariate pooled relative risk of breast cancer was 1.05 (95 percent confidence interval, 0.94 to 1.16). Relative risks for saturated, monounsaturated, and polyunsaturated fat and for cholesterol, considered individually, were also close to unity. There was little overall association between the percentage of energy intake from fat and the risk of breast cancer, even among women whose energy intake from fat was less than 20 percent. Correcting for error in the measurement of nutrient intake did not materially alter these findings. CONCLUSIONS. We found no evidence of a positive association between total dietary fat intake and the risk of breast cancer. There was no reduction in risk even among women whose energy intake from fat was less than 20 percent of total energy intake. 
In the context of the Western lifestyle, lowering the total intake of fat in midlife is unlikely to reduce the risk of breast cancer substantially.
Reinventing the wheel will not make it rounder: controlled trials of homeopathy reconsidered. Walach H. J Altern Complement Med 2003: 9(1); 7-13. [Medline]
The role of prospective randomized clinical trials in pediatric surgery: state of the art? Moss RL, Henry MC, Dimmitt RA, Rangel S, Geraghty N, Skarsgard ED. J Pediatr Surg 2001: 36(8); 1182-6. [Medline] PURPOSE: This study sought to determine the role of randomized controlled trials (RCT) in the evolution of pediatric surgical practice. METHODS: The authors used a computer-assisted literature search to identify all clinical trials related to pediatric surgery published in the English-language literature from 1966 through 1999. Each article was reviewed in detail for purpose, content, conduct, and quality of the trial. The authors assessed quality with a previously validated instrument (Chalmers Qualitative Assessment). RESULTS: The authors identified 134 RCTs related to pediatric surgery over the past 33 years. This accounts for 0.17% of 80,377 articles published in the field. The areas of surgery studied were analgesia 65 (49%), antibiotics 17 (13%), extracorporeal membrane oxygenation (ECMO) 9 (7%), gastrointestinal, burns, oncology, minimally invasive surgery, vascular access, congenital anomalies, and trauma (each <5%). Only 16 (12%) trials compared 2 surgical therapies, 9 (7%) compared a medical versus a surgical therapy, and 109 (81%) compared 2 medical therapies in surgical patients. Fourteen (10%) RCTs were funded by peer-reviewed agencies. Only 17 (13%) RCTs included a biostatistician as an author or a consultant. Trial design included calculation of sample size and statistical power in 21 (16%) RCTs. Method of randomization was reported in only 51 (38%). The test statistic and observed probability value was reported in 15 (11%). CONCLUSIONS: Clinical trials are used infrequently to answer questions related to pediatric surgery. When RCTs are utilized, they often suffer from poor trial design, inadequate statistical analysis, and incomplete reporting. Pediatric surgery could benefit from increased expertise, funding, and participation in clinical trials.
Benefit of heparin in peripheral venous and arterial catheters: systematic review and meta-analysis of randomised controlled trials. Randolph AG, Cook DJ, Gonzales CA, Andrew M. British Medical Journal 1998: 316(7136); 969-75. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To evaluate the effect of heparin on duration of catheter patency and on prevention of complications associated with use of peripheral venous and arterial catheters. DESIGN: Critical appraisal and meta-analysis of 26 randomised controlled trials that evaluated infusion of heparin intermittently or continuously. Thirteen trials of peripheral venous catheters and two of peripheral arterial catheters met criteria for inclusion. MAIN OUTCOME MEASURES: Data on the populations, interventions, outcomes, and methodological quality. RESULTS: For peripheral venous catheters locked between use flushing with 10 U/ml of heparin instead of normal saline did not reduce the incidence of catheter clotting and phlebitis or improve catheter patency. When heparin was given as a continuous infusion at 1 U/ml the risk of phlebitis decreased (relative risk 0.55; 95% confidence interval 0.39 to 0.77), the duration of patency increased, and infusion failure was reduced (0.88; 0.72 to 1.07). Heparin significantly prolonged duration of patency of radial artery catheters and decreased the risk of clot formation (0.51; 0.42 to 0.61). CONCLUSIONS: Use of intermittent heparin flushes at doses of 10 U/ml in peripheral venous catheters locked between use had no benefit over normal saline flush. Infusion of low dose heparin through a peripheral arterial catheter prolonged the duration of patency but further study is needed to establish its benefit for peripheral venous catheters.
Supplemental ascorbate in the supportive treatment of cancer: Prolongation of survival times in terminal human cancer. Cameron E, Pauling L. Proc Natl Acad Sci U S A 1976: 73(10); 3685-9. [Medline] [Full text] Ascorbic acid metabolism is associated with a number of mechanisms known to be involved in host resistance to malignant disease. Cancer patients are significantly depleted of ascorbic acid, and in our opinion this demonstrable biochemical characteristic indicates a substantially increased requirement and utilization of this substance to potentiate these various host resistance factors. The results of a clinical trial are presented in which 100 terminal cancer patients were given supplemental ascorbate as part of their routine management. Their progress is compared to that of 1000 similar patients treated identically, but who received no supplemental ascorbate. The mean survival time is more than 4.2 times as great for the ascorbate subjects (more than 210 days) as for the controls (50 days). Analysis of the survival-time curves indicates that deaths occur for about 90% of the ascorbate-treated patients at one-third the rate for the controls and that the other 10% have a much greater survival time, averaging more than 20 times that for the controls. The results clearly indicate that this simple and safe form of medication is of definite value in the treatment of patients with advanced cancer.
Dietary factors and risk of breast cancer: combined analysis of 12 case-control studies. Howe GR, Hirohata T, Hislop TG, Iscovich JM, Yuan JM, Katsouyanni K, Lubin F, Marubini E, Modan B, Rohan T, et al. J Natl Cancer Inst 1990: 82(7); 561-9. We conducted a combined analysis of the original data to evaluate the consistency of 12 case-control studies of diet and breast cancer. Our analysis shows a consistent, statistically significant, positive association between breast cancer risk and saturated fat intake in postmenopausal women (relative risk for highest vs. lowest quintile, 1.46; P < .0001). A consistent protective effect for a number of markers of fruit and vegetable intake was demonstrated; vitamin C intake had the most consistent and statistically significant inverse association with breast cancer risk (relative risk for highest vs. lowest quintile, 0.69; P < .0001). If these dietary associations represent causality, the attributable risk (i.e., the percentage of breast cancers that might be prevented by dietary modification) in the North American population is estimated to be 24% for postmenopausal women and 16% for premenopausal women.
Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Jama 2003: 290(7); 921-8. [Medline] CONTEXT: Previous studies indicate that industry-sponsored trials tend to draw proindustry conclusions. OBJECTIVE: To explore whether the association between funding and conclusions in randomized drug trials reflects treatment effects or adverse events. DESIGN: Observational study of 370 randomized drug trials included in meta-analyses from Cochrane reviews selected from the Cochrane Library, May 2001. From a random sample of 167 Cochrane reviews, 25 contained eligible meta-analyses (assessed a binary outcome; pooled at least 5 full-paper trials of which at least 1 reported adequate and 1 reported inadequate allocation concealment). The primary binary outcome from each meta-analysis was considered the primary outcome for all trials included in each meta-analysis. The association between funding and conclusions was analyzed by logistic regression with adjustment for treatment effect, adverse events, and additional confounding factors (methodological quality, control intervention, sample size, publication year, and place of publication). MAIN OUTCOME MEASURE: Conclusions in trials, classified into whether the experimental drug was recommended as the treatment of choice or not. RESULTS: The experimental drug was recommended as treatment of choice in 16% of trials funded by nonprofit organizations, 30% of trials not reporting funding, 35% of trials funded by both nonprofit and for-profit organizations, and 51% of trials funded by for-profit organizations (P<.001; chi2 test). Logistic regression analyses indicated that funding, treatment effect, and double blinding were the only significant predictors of conclusions. 
Adjusted analyses showed that trials funded by for-profit organizations were significantly more likely to recommend the experimental drug as treatment of choice (odds ratio, 5.3; 95% confidence interval, 2.0-14.4) compared with trials funded by nonprofit organizations. This association did not appear to reflect treatment effect or adverse events. CONCLUSIONS: Conclusions in trials funded by for-profit organizations may be more positive due to biased interpretation of trial results. Readers should carefully evaluate whether conclusions in randomized trials are supported by data.
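A note on the numbers above: the unadjusted percentages (51% for for-profit funding, 16% for nonprofit funding) already imply an odds ratio near the adjusted value of 5.3. The following is a minimal sketch that uses only those two reported proportions; the function name is illustrative, and this is a crude calculation, not the authors' logistic regression.

```python
def odds_ratio_from_proportions(p_exposed: float, p_reference: float) -> float:
    """Crude (unadjusted) odds ratio comparing two proportions."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_exposed / odds_reference


# Proportions of trials recommending the experimental drug, from the abstract.
crude_or = odds_ratio_from_proportions(0.51, 0.16)
print(round(crude_or, 2))  # roughly 5.5
```

The crude and adjusted odds ratios being so close is consistent with the abstract's statement that the association did not appear to reflect treatment effect or adverse events.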
Deficits in psychologic and classroom performance of children with elevated dentine lead levels. Needleman HL, Gunnoe C, Leviton A, Reed R, Peresie H, Maher C, Barrett P. N Engl J Med 1979: 300(13); 689-95. [Medline] To measure the neuropsychologic effects of unidentified childhood exposure to lead, the performance of 58 children with high and 100 with low dentine lead levels was compared. Children with high lead levels scored significantly less well on the Wechsler Intelligence Scale for Children (Revised) than those with low lead levels. This difference was also apparent on verbal subtests, on three other measures of auditory or speech processing and on a measure of attention. Analysis of variance showed that none of these differences could be explained by any of the 39 other variables studied. Also evaluated by a teachers' questionnaire was the classroom behavior of all children (2146 in number) whose teeth were analyzed. The frequency of non-adaptive classroom behavior increased in a dose-related fashion to dentine lead level. Lead exposure, at doses below those producing symptoms severe enough to be diagnosed clinically, appears to be associated with neuropsychologic deficits that may interfere with classroom performance.
The Quality of Drug Studies Published in Symposium Proceedings. Cho M, Bero L. Ann Intern Med 1996: 124; 485 - 489. [Medline] OBJECTIVE: To compare the quality, relevance, and structure of drug studies published in symposium proceedings that are sponsored by drug companies with 1) articles from symposia with other sponsors and 2) articles in the peer-reviewed parent journals of symposium proceedings; and to study the relation between drug company sponsorship and study outcome. DESIGN: Cross-sectional studies of clinical drug studies published in symposium proceedings or their parent medical journals. MEASUREMENTS: The proportion of articles with no methods sections (which are necessary to assess quality); methodologic quality and clinical relevance scores; and the proportion of articles with outcomes favoring the drug of interest. RESULTS: Symposia sponsored by single drug companies had more articles without methods sections (10%; 108 of 1064) than did symposia that had other sponsors (3%; 58 of 2314) or symposia that had no mentioned sponsor (2%; 29 of 1663) (P < 0.001). The mean methodologic quality and relevance scores of articles were similar both by type of sponsorship and between articles published in symposia sponsored by single drug companies and articles from the parent journals. Significantly more articles with drug company support (98%; 39 of 40) than without drug company support (79%; 89 of 112) had outcomes favoring the drug of interest (P = 0.01). CONCLUSIONS: Articles in symposia sponsored by single drug companies were similar in quality and clinical relevance to articles with other sponsors and to articles published in the parent journals. Articles with drug company support are more likely than articles without drug company support to have outcomes favoring the drug of interest.
Systematic reviews and meta-analyses on treatment of asthma: critical evaluation. Jadad AR, Moher M, Browman GP, Booker L, Sigouin C, Fuentes M, Stevens R. Bmj 2000: 320(7234); 537-40. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To evaluate the clinical, methodological, and reporting aspects of systematic reviews and meta-analyses on the treatment of asthma and to compare those published by the Cochrane Collaboration with those published in paper based journals. DESIGN: Analysis of studies identified from Medline, CINAHL, HealthSTAR, EMBASE, Cochrane Library, personal collections, and reference lists. Studies: Articles describing a systematic review or a meta-analysis of the treatment of asthma that were published as a full report, in any language or format, in a peer reviewed journal or the Cochrane Library. MAIN OUTCOME MEASURES: General characteristics of studies reviewed and methodological characteristics (sources of articles; language restrictions; format, design, and publication status of studies included; type of data synthesis; and methodological quality). RESULTS: 50 systematic reviews and meta-analyses were included. More than half were published in the past two years. Twelve reviews were published in the Cochrane Library and 38 were published in 22 peer reviewed journals. Forced expiratory volume in one second was the most frequently used outcome, but few reviews evaluated the effect of treatment on costs or patient preferences. Forty reviews were judged to have serious or extensive flaws. All six reviews associated with industry were in this group. Seven of the 10 most rigorous reviews were published in the Cochrane Library. CONCLUSIONS: Most reviews published in peer reviewed journals or funded by industry have serious methodological flaws that limit their value to guide decisions. Cochrane reviews are more rigorous and better reported than those published in peer reviewed journals.
Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. Song F, Altman DG, Glenny AM, Deeks JJ. Bmj 2003: 326(7387); 472. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To determine the validity of adjusted indirect comparisons by using data from published meta-analyses of randomised trials. DESIGN: Direct comparison of different interventions in randomised trials and adjusted indirect comparison in which two interventions were compared through their relative effect versus a common comparator. The discrepancy between the direct and adjusted indirect comparison was measured by the difference between the two estimates. DATA SOURCES: Database of abstracts of reviews of effectiveness (1994-8), the Cochrane database of systematic reviews, Medline, and references of retrieved articles. RESULTS: 44 published meta-analyses (from 28 systematic reviews) provided sufficient data. In most cases, results of adjusted indirect comparisons were not significantly different from those of direct comparisons. A significant discrepancy (P<0.05) was observed in three of the 44 comparisons between the direct and the adjusted indirect estimates. There was a moderate agreement between the statistical conclusions from the direct and adjusted indirect comparisons (kappa 0.51). The direction of discrepancy between the two estimates was inconsistent. CONCLUSIONS: Adjusted indirect comparisons usually but not always agree with the results of head to head randomised trials. When there is no or insufficient direct evidence from randomised trials, the adjusted indirect comparison may provide useful or supplementary information on the relative efficacy of competing interventions. The validity of the adjusted indirect comparisons depends on the internal validity and similarity of the included trials.
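The adjusted indirect comparison described above can be sketched numerically. This is a minimal illustration of one standard formulation on the log odds ratio scale, in which A versus B is estimated through a common comparator C; the function name and the input values are hypothetical and are not taken from the paper.

```python
import math


def adjusted_indirect_comparison(log_or_ac, se_ac, log_or_bc, se_bc, z=1.96):
    """Indirect estimate of A vs B through a common comparator C.

    The log odds ratios subtract and their variances add, so the
    indirect estimate is less precise than either direct input.
    Returns (odds ratio, lower limit, upper limit).
    """
    log_or_ab = log_or_ac - log_or_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper


# Hypothetical inputs: A vs C has OR 0.60 (SE 0.15), B vs C has OR 0.80 (SE 0.20).
or_ab, lower_ab, upper_ab = adjusted_indirect_comparison(
    math.log(0.60), 0.15, math.log(0.80), 0.20
)
print(round(or_ab, 2), round(lower_ab, 2), round(upper_ab, 2))
```

Because the variances add, an indirect comparison can yield a wide interval even when both direct comparisons against C are individually significant.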
Article makes simple errors and could cause unnecessary deaths. Baigent C, Collins R, Peto R. British Medical Journal 2002: 324(7330); 167. [Medline] [Full text] [PDF] "The worldwide meta-analysis of antiplatelet trials shows that low dose aspirin (or some other effective antiplatelet regimen) reduces non-fatal myocardial infarction, non-fatal stroke, and vascular death in a wide range of patients who are at high risk of occlusive vascular disease. A paper disputing this was published concurrently in the For Debate section of the journal, but the arguments in it (some of which the author also published on the same date in an editorial in the Lancet) depend strongly on quite simple mistakes about the randomised evidence and could cause unnecessary deaths."
Representation of elderly persons and women in published randomized trials of acute coronary syndromes. Lee PY, Alexander KP, Hammill BG, Pasquali SK, Peterson ED. Jama 2001: 286(6); 708-13. [Medline] [Abstract] [Full text] [PDF] CONTEXT: Elderly persons and women were underrepresented in randomized controlled trials (RCTs) prior to 1990. Since then, efforts have been made to correct these biases, but their effect is unclear. OBJECTIVE: To determine whether the percentage of elderly persons and women in published clinical trials of acute coronary syndromes has increased and how this enrollment compared with disease prevalence. DATA SOURCES: The MEDLINE and Cochrane databases were searched for English-language articles from January 1966 to March 2000 regarding myocardial infarction, unstable angina, or acute coronary syndromes. Additional data sources included meta-analyses, review articles, and cardiology textbooks. Estimates of community-based myocardial infarction rates came from the National Registry of Myocardial Infarction and the Worcester Heart Study. STUDY SELECTION: Published RCTs of acute coronary syndrome patients were included and trials enrolling 50 patients or fewer, those without clinical end points, papers published in a language other than English, and unpublished manuscripts were excluded. Of 7645 studies identified, 593 RCTs were selected for review. DATA EXTRACTION: The RCTs were abstracted by 2 of the authors for year of publication, source of support (ie, funding), pharmacotherapy, study phase, number of study sites, trial location, number of patients, mean age of the study population, and any age exclusion criteria for enrollment. DATA SYNTHESIS: The number of published RCTs with explicit age exclusions has declined from 58% during 1966-1990 to 40% during 1991-2000. 
Trial enrollment of patients aged 75 years or older increased from 2% for studies published during 1966-1990 to 9% during 1991-2000, but remains well below their representation among all patients with myocardial infarction (37%) in the United States. Enrollment of women has risen from 20% for studies published between 1966-1990 to 25% during 1991-2000, but remains well below their proportion of all patients with myocardial infarction (43%) in the United States. CONCLUSIONS: Attempts at making cardiovascular RCTs more inclusive appear to have had limited success; thus, women and elderly persons remain underrepresented in published trial literature relative to their disease prevalence. Because safety and efficacy can vary as a function of sex and age, these enrollment biases undermine efforts to provide evidence-based care to all cardiac patients.
Randomized trials, generalizability, and meta-analysis: Graphical insights for binary outcomes. Baker SG, Kramer BS. BMC Med Res Methodol 2003: 3(1); 10. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Randomized trials stochastically answer the question, "What would be the effect of treatment on outcome if one turned back the clock and switched treatments in the given population?" Generalizations to other subjects are reliable only if the particular trial is performed on a random sample of the target population. By considering an unobserved binary variable, we graphically investigate how randomized trials can also stochastically answer the question, "What would be the effect of treatment on outcome in a population with a possibly different distribution of an unobserved binary baseline variable that does not interact with treatment in its effect on outcome?" METHOD: For three different outcome measures, absolute difference (DIF), relative risk (RR), and odds ratio (OR), we constructed a modified BK-Plot under the assumption that treatment has the same effect on outcome if either all or no subjects had a given level of the unobserved binary variable. (A BK-Plot shows the effect of an unobserved binary covariate on a binary outcome in two treatment groups; it was originally developed to explain Simpson's paradox.) RESULTS: For DIF and RR, but not OR, the BK-Plot shows that the estimated treatment effect is invariant to the fraction of subjects with an unobserved binary variable at a given level. CONCLUSION: The BK-Plot provides a simple method to understand generalizability in randomized trials. Meta-analyses of randomized trials with a binary outcome that are based on DIF or RR, but not OR, will avoid bias from an unobserved covariate that does not interact with treatment in its effect on outcome.
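The invariance result in this abstract can be checked with a small numeric sketch. It is an illustration under stated assumptions, not the BK-Plot itself: the stratum-specific control risks (0.2 and 0.6), the effect size of 0.5, and all function names are hypothetical. The treatment effect is held constant within both levels of the unobserved covariate U, first on the relative risk scale and then on the odds ratio scale, while the fraction f of subjects with U = 1 varies.

```python
def odds(p):
    return p / (1 - p)


def marginal_risks(f, control_risks, treat):
    """Marginal control and treated risks when a fraction f has U = 1.

    control_risks = (risk when U = 0, risk when U = 1) under control.
    treat maps a stratum's control risk to its treated risk.
    """
    r0, r1 = control_risks
    control = (1 - f) * r0 + f * r1
    treated = (1 - f) * treat(r0) + f * treat(r1)
    return control, treated


def constant_rr(rr):
    # Same relative risk in every stratum.
    return lambda p: rr * p


def constant_or(odds_ratio):
    # Treated risk that gives the same odds ratio in every stratum.
    return lambda p: odds_ratio * odds(p) / (1 + odds_ratio * odds(p))


control_risks = (0.2, 0.6)  # hypothetical stratum-specific control risks
for f in (0.1, 0.5, 0.9):
    c, t = marginal_risks(f, control_risks, constant_rr(0.5))
    marginal_rr = t / c
    c, t = marginal_risks(f, control_risks, constant_or(0.5))
    marginal_or = odds(t) / odds(c)
    print(f, round(marginal_rr, 3), round(marginal_or, 3))
```

In this sketch the marginal relative risk stays at 0.5 for every f, while the marginal odds ratio moves away from the stratum-specific 0.5 and changes with f.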
Sperm output of healthy men in Australia: magnitude of bias due to self-selected volunteers. Handelsman DJ. Hum Reprod 1997: 12(12); 2701-5. [Medline] Controversial claims, based on a meta-analysis aggregating 61 heterogeneous observational studies, have been made that human sperm output has decreased by 50% over the last six decades and that this trend may be due to global pollution. If true, such effects should be evident in all areas of the globe; however, longitudinal studies within single centres in Europe and America have produced conflicting results and there are no reports from the southern hemisphere. We therefore reviewed semen analyses obtained from 1980-1995 from 689 healthy men volunteering for screening either as potential sperm donors for a donor insemination programme (n = 509) or to participate in five male contraception research studies (studies no. 1-5, n = 180). All were recruited through the Andrology Unit of the Royal Prince Alfred Hospital, Sydney, by the same doctors using standard methods of recruiting, screening and laboratory examination throughout the period 1980-1995. Recruitment was by advertising without regard to marital or fertility status except in two contraceptive efficacy studies (no. 1 and no. 3) where participants had to be in a stable relationship requiring contraception. Analysing the first semen sample individually or when grouped by year of ejaculation, there was no significant difference in sperm concentration over time or between years or according to year of birth. During the second half of this period, 180 consecutive volunteers were recruited by the same doctors and staff for five male contraception studies. The median sperm concentration for studies no. 1 (103 x 10^6/ml) and no. 2 (142 x 10^6/ml) were significantly (P < 0.05) higher than for studies no. 3-5 (84, 67 and 63 x 10^6/ml, respectively) and for potential sperm donors (median 69 x 10^6/ml).
The inconsistency of these estimates illustrates the magnitude of bias (up to 100%) in sperm output that may occur in recruiting groups of self-referred volunteers within a single centre. This highlights the invalidity of extrapolating similar findings on sperm output of self-selected volunteers to the general male community or in using such study groups to characterize sperm output in supposedly 'normal' men.
Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. Moja LP, Telaro E, D'Amico R, Moschetti I, Coe L, Liberati A. Bmj 2005; [Medline] [Abstract] [PDF] OBJECTIVES: To describe how the methodological quality of primary studies is assessed in systematic reviews and whether the quality assessment is taken into account in the interpretation of results. DATA SOURCES: Cochrane systematic reviews and systematic reviews in paper based journals. STUDY SELECTION: 965 systematic reviews (809 Cochrane reviews and 156 paper based reviews) published between 1995 and 2002. DATA SYNTHESIS: The methodological quality of primary studies was assessed in 854 of the 965 systematic reviews (88.5%). This occurred more often in Cochrane reviews than in paper based reviews (93.9% v 60.3%, P<0.0001). Overall, only 496 (51.4%) used the quality assessment in the analysis and interpretation of the results or in their discussion, with no significant differences between Cochrane reviews and paper based reviews (52% v 49%, P=0.58). The tools and methods used for quality assessment varied widely. CONCLUSIONS: Cochrane reviews fared better than systematic reviews published in paper based journals in terms of assessment of methodological quality of primary studies, although they both largely failed to take it into account in the interpretation of results. Methods for assessment of methodological quality by systematic reviews are still in their infancy and there is substantial room for improvement.
Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Little J, Bradley L, Bray MS, Clyne M, Dorman J, Ellsworth DL, Hanson J, Khoury M, Lau J, O'Brien TR, Rothman N, Stroup D, Taioli E, Thomas D, Vainio H, Wacholder S, Weinberg C. Am J Epidemiol 2002: 156(4); 300-10. [Medline] The recent completion of the first draft of the human genome sequence and advances in technologies for genomic analysis are generating tremendous opportunities for epidemiologic studies to evaluate the role of genetic variants in human disease. Many methodological issues apply to the investigation of variation in the frequency of allelic variants of human genes, of the possibility that these influence disease risk, and of assessment of the magnitude of the associated risk. Based on a Human Genome Epidemiology workshop, a checklist for reporting and appraising studies of genotype prevalence and studies of gene-disease associations was developed. This focuses on selection of study subjects, analytic validity of genotyping, population stratification, and statistical issues. Use of the checklist should facilitate the integration of evidence from these studies. The relation between the checklist and grading schemes that have been proposed for the evaluation of observational studies is discussed. Although the limitations of grading schemes are recognized, a robust approach is proposed. Other issues in the synthesis of evidence that are particularly relevant to studies of genotype prevalence and gene-disease association are discussed, notably identification of studies, publication bias, criteria for causal inference, and the appropriateness of quantitative synthesis.
A quality assessment of randomized control trials of primary treatment of breast cancer. Liberati A, Himel HN, Chalmers TC. J Clin Oncol 1986: 4(6); 942-51. [Medline] The methodology of randomized control trials (RCTs) of the primary treatment of early breast cancer has been reviewed using a quantitative method. Sixty-three RCTs comparing various treatment modalities tested on over 34,000 patients and reported in 119 papers were evaluated according to a standardized scoring system. A percentage score was developed to assess the internal validity of a study (referring to the quality of its design and execution) and its external validity (referring to presentation of information required to determine its generalizability). An overall score was also calculated as the combination of the two. The mean overall score for the 63 RCTs was 50% (95% confidence interval [CI] = 46% to 54%) with small and nonstatistically significant differences between types of trial. The most common methodologic deficiencies encountered in these studies were related to the randomization process (only 27 of the 63 RCTs adopted a truly blinded procedure), the handling of withdrawals (only 26 RCTs included all patients in the analyses), the description of the follow-up schedule (only 12 RCTs reported adequately), the report of side effects (adequate information given in 33 RCTs), and the description of the patient population (satisfactory in 29 RCTs). Telephone calls to the principal investigators improved the quality scores by seven points on a scale of 100, indicating that some of the deficiencies lay in reporting rather than performance. There was evidence that quality has improved over time and that the increasing tendency of involving a biostatistician in the research team was positively associated with the improvement of the internal validity but not with the external.
Truth survival in clinical research: an evidence-based requiem? Poynard T, Munteanu M, Ratziu V, Benhamou Y, Di Martino V, Taieb J, Opolon P. Ann Intern Med 2002: 136(12); 888-95. [Medline] PURPOSE: Factors associated with the survival of truth of clinical conclusions in the medical literature are unknown. The authors hypothesized that conclusions derived from studies using better methodology should have a longer half-life. DATA SOURCES: MEDLINE and hand searches of journals with studies on cirrhosis and hepatitis. STUDY SELECTION: Original articles and meta-analyses published from 1945 to 1999 about cirrhosis or hepatitis in adults. DATA SYNTHESIS: In 2000, 285 of 474 conclusions (60%) were still considered to be true, 91 (19%) were considered to be obsolete, and 98 (21%) were considered to be false. The half-life of truth was 45 years. The 20-year survival of conclusions derived from meta-analysis was lower (57% +/- 10%) than that from nonrandomized studies (87% +/- 2%) (P < 0.001) or randomized trials (85% +/- 3%) (P < 0.001). The survival of conclusions was not different when studies of high methodologic quality were compared with those of low quality. In randomized trials, the 50-year survival rate was higher for 52 negative conclusions (68% +/- 13%) than for 118 positive conclusions (14% +/- 4%) (P < 0.001). CONCLUSIONS: Contrary to the authors' hypothesis, conclusions based on recognized, good methodology had no clear survival advantage. To better convince clinicians of the long-term utility of evidence-based medicine, better prognostic factors should be developed.
Review of randomised controlled trials of traditional Chinese medicine. Tang J-L, Zhan S, Ernst E. BMJ 1999: 319(7203); 160-61. [Medline] [Full text] [PDF] [Excerpt] Many randomised controlled trials have been conducted in China to evaluate the effectiveness of traditional Chinese medicine, but much of the information is inaccessible to Western doctors. We estimated the total number of randomised controlled trials published in China and identified problems in applying such methodology to the evaluation of traditional Chinese medicine, which would serve as preparatory work for systematic review and dissemination of the randomised evidence for such medicine.
The effect of low-intensity pulsed ultrasound therapy on time to fracture healing: a meta-analysis. Busse JW, Bhandari M, Kulkarni AV, Tunks E. Cmaj 2002: 166(4); 437-41. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: The effect of low-intensity ultrasonography on fracture healing is controversial, and current management of fractures does not generally involve the use of ultrasound therapy. We describe a systematic review and meta-analysis of randomized controlled trials of low-intensity pulsed ultrasound therapy for healing of fractures. METHODS: We searched 5 electronic databases (MEDLINE, EMBASE, Cochrane Database of Randomised Clinical Trials, HealthSTAR and CINAHL) for trials of ultrasonography and fracture healing, in any language, published from 1966 to December 2000. In addition, selected journals published from 1996 to December 2000 were searched by hand for relevant articles, and attempts were made to contact content experts in the area of ultrasound therapy and fracture healing as well as primary authors of reviewed trials. Trials selected for review met the following criteria: random allocation of treatments; inclusion of skeletally mature patients of either sex with 1 or more fractures; blinding of both the patient and the assessor(s) as to fracture healing; administration of low-intensity pulsed ultrasound treatments to at least 1 of the treatment groups; and assessment of time to fracture healing, as determined radiographically by bridging of 3 or 4 cortices. Two reviewers independently applied selection criteria to blinded articles, and selected articles were scored for methodologic quality. The internal validity of each trial was assessed with the use of a 5-point scale that evaluates the quality of trial method on the basis of description and appropriateness of randomization and double-blinding, and assessment of study withdrawals and likelihood of bias. 
RESULTS: We identified 138 potentially eligible studies, of which 6 met our inclusion criteria. Agreement beyond chance of quality assessments of the 6 trials was good (intraclass correlation coefficient 0.77, p = 0.03). One trial was a repeat analysis of previously reported data, and 2 trials appeared to report on a shared group of subjects. Three trials, representing 158 fractures, were of sufficient homogeneity for pooling. The pooled results showed that time to fracture healing was significantly shorter in the groups receiving low-intensity ultrasound therapy than in the control groups. The weighted average effect size was 6.41 (95% confidence interval 1.01-11.81), which converts to a mean difference in healing time of 64 days between the treatment and control groups. INTERPRETATION: There is evidence from randomized trials that low-intensity pulsed ultrasound treatment may significantly reduce the time to fracture healing for fractures treated nonoperatively. There does not appear to be any additional benefit to ultrasound treatment following intramedullary nailing with prior reaming. Larger trials are needed to resolve this issue.
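When an abstract reports only a point estimate and a 95% confidence interval, as with the pooled effect size of 6.41 (1.01 to 11.81) above, the standard error and an approximate P value can be recovered. This is a minimal sketch, assuming a symmetric interval built from a normal approximation with a multiplier of 1.96; the function names are illustrative and the authors' own calculation may differ.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def two_sided_p(estimate, se):
    """Two-sided P value from a normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))


estimate, lower, upper = 6.41, 1.01, 11.81  # values reported in the abstract
se = se_from_ci(lower, upper)
p_value = two_sided_p(estimate, se)
print(round(se, 2), round(p_value, 3))
```

The interval is symmetric about 6.41, which supports the normal-approximation assumption; the implied P value is about 0.02, so the pooled result is significant but not overwhelmingly so.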
MR findings in humeral epicondylitis. A systematic review. Pasternack I, Tuovinen EM, Lohman M, Vehmas T, Malmivaara A. Acta Radiol 2001: 42(5); 434-40. [Medline] PURPOSE: To highlight the importance of meta-analysis in diagnostic imaging by presenting a systematic search of the literature on the accuracy of MR imaging in epicondylitis. MATERIAL AND METHODS: The literature was comprehensively reviewed to identify studies on MR findings in epicondylitis. Reviewers blind to the clinical diagnoses screened the data according to predetermined inclusion criteria. Data were collected and validity and relevance were assessed on structured forms. RESULTS: Seven studies including 148 patients with epicondylitis were accepted for the analysis. Eleven asymptomatic contralateral elbows and 29 elbows of healthy volunteers served as controls. The volunteers were distinctly younger than the patients. The MR technique was divergent, and the observed pathological changes also varied. The most frequent alteration was a change in the common extensor tendon signal (90%, 95% confidence interval 84-94%); 14% of the healthy volunteers and 50% of the contralateral elbows displayed the similar alteration. CONCLUSION: Small sample size and methodological shortcomings in the original studies make the assessment of MR findings in epicondylitis questionable. There is a need for well-designed studies in which clinical features and occupational backgrounds as well as imaging parameters are carefully documented.
Changes in clinical trials mandated by the advent of meta-analysis. Chalmers TC, Lau J. Stat Med 1996: 15(12); 1263-8; discussion 1269-72. [Medline] Service on the Data Monitoring Committee of the CPEP (Calcium for Pre-eclampsia Prevention) has led us to four conclusions about clinical trials which we should like to present to this gathering of biostatisticians for their reactions: (i) meta-analyses of the pertinent published trials of the same therapy should always be undertaken before the start of a new trial, and the results examined to help determine the design of a new trial or determine if a trial should be undertaken at all; (ii) assuming that a decision is made to go ahead, the results of the past trials should be used in sizing the new one; (iii) in the course of the new one, regardless of the size estimates, stopping early should be considered if the trends conform to the results of the meta-analysis; and (iv) heterogeneity of patients entering clinical trials is desirable and should be specifically studied, and it should never be concluded that an average outcome is applicable to all future patients.
Applying the results of trials and systematic reviews to individual patients. Glasziou P, Guyatt GH, Dans AL, Dans LF, Straus S, Sackett DL. ACP Journal Club 1998: 129(3); A15-6. [Medline] Your patient is a 60-year-old hypertensive, alcoholic woman whose symptomless atrial fibrillation was first documented 3 months ago. An echocardiogram shows an enlarged left atrium, rendering successful cardioversion unlikely. She tells you that both of her parents had severe strokes that made the last years of their lives horrible, and she is terrified of having a stroke. You know that a meta-analysis of 5 randomized trials of warfarin in nonvalvular atrial fibrillation demonstrated a 68% relative risk reduction (RRR) in stroke (1). You consider prescribing warfarin for this patient but know that she would not have qualified for the study because alcoholism increases her risk for major hemorrhage (2).
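One way to particularize a pooled result such as the 68% relative risk reduction above is to apply it to the individual patient's own baseline risk and convert to a number needed to treat (NNT). This is a minimal sketch, assuming the relative risk reduction is constant across risk levels; the baseline annual stroke risks below are hypothetical and are not taken from the article.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """NNT = 1 / absolute risk reduction, where ARR = baseline risk x RRR."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1.0 / absolute_risk_reduction


RRR = 0.68  # relative risk reduction reported in the meta-analysis
for baseline in (0.02, 0.05, 0.10):  # hypothetical annual stroke risks
    nnt = number_needed_to_treat(baseline, RRR)
    print(baseline, round(nnt, 1))
```

The same relative effect gives very different NNTs at different baseline risks, which is why the patient's individual risk of stroke, and of harm such as major hemorrhage, matters more than the pooled average.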
Recombinant or urinary follicle-stimulating hormone? A cost-effectiveness analysis derived by particularizing the number needed to treat from a published meta-analysis. Ola B, Papaioannou S, Afnan MA, Hammadieh N, Gimba S. Fertil Steril 2001: 75(6); 1106-10. [Medline] OBJECTIVE: To demonstrate that particularizing pooled results of a meta-analysis can derive incremental cost effectiveness of superovulation with recombinant follicle-stimulating hormones (rFSH) vs. the highly purified urinary form (uFSH) for assisted conception. DESIGN: A retrospective study. SETTING: An assisted conception unit in the United Kingdom. PATIENT(S): One hundred forty-five fresh in vitro fertilization (IVF) and 58 fresh intracytoplasmic sperm injection (ICSI) cycles. INTERVENTION(S): rFSH vs. uFSH. MAIN OUTCOME MEASURE(S): Incremental cost-effectiveness (i.e., cost needed to treat, or CNT) and budget-impact analyses of rFSH vs. uFSH. RESULT(S): In women less than 30 years old, the clinical pregnancy rate was 37.7% (95% CI 24.8%-52.1%), the particularized number needed to treat (pNNT) was -19, and the cost needed to treat was 5070.51 pounds sterling (3660.53 pounds sterling to 7619.32 pounds sterling). For the 30- to 35-year-old age group, the clinical pregnancy rate was 29.9% (95% CI 20.0%-41.4%), the particularized number needed to treat was -24, and CNT was 7335.59 pounds sterling (5284.11 pounds sterling to 10,941.22 pounds sterling). For the 36- to 40-year-old age group, the clinical pregnancy rate was 30.6% (95% CI 19.6%-43.7%), the particularized number needed to treat was -23.0, and the CNT was 8569.67 pounds sterling (5998.70 pounds sterling to 13,413.24 pounds sterling). CONCLUSION(S): The CNT and thus the budget impact analyses (the extra number of cycles that can be funded by the CNT) both increase directly with age of the patient, and inversely with the clinical pregnancy rate.
An addition to the controversy on sunlight exposure and melanoma risk: a meta-analytical approach. Nelemans PJ, Rampen FH, Ruiter DJ, Verbeek AL. J Clin Epidemiol 1995: 48(11); 1331-42. [Medline] Case control studies on the association between sunlight exposure and melanoma risk show considerable differences in design; this could be responsible for the variation in study results. In an attempt to resolve the controversy between study results, the results of 25 publications on case control studies were evaluated using meta-analytical techniques. Comparison of odds ratios between subgroups of studies revealed that the range of odds ratios was far greater for hospital-based studies than for population-based studies. For the latter type of studies, the odds ratios were homogeneous and the pooled odds ratios were 1.57 (95% confidence interval [CI], 1.29-1.91) for intermittent sunlight exposure and 0.73 (95% CI, 0.60-0.89) for chronic exposure. However, among other problems, the lack of standardized measures for sunlight exposure warrants cautious interpretation of these results. It is concluded that evidence to support the intermittent sunlight theory is still far from complete.
Randomised controlled trial of cardiotocography versus Doppler auscultation of fetal heart at admission in labour in low risk obstetric population. Mires G, Williams F, Howie P. British Medical Journal 2001: 322(7300); 1457-60; discussion 1460-2. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To compare the effect of admission cardiotocography and Doppler auscultation of the fetal heart on neonatal outcome and levels of obstetric intervention in a low risk obstetric population. DESIGN: Randomised controlled trial. SETTING: Obstetric unit of teaching hospital PARTICIPANTS: Pregnant women who had no obstetric complications that warranted continuous monitoring of fetal heart rate in labour. INTERVENTION: Women were randomised to receive either cardiotocography or Doppler auscultation of the fetal heart when they were admitted in spontaneous uncomplicated labour. MAIN OUTCOME MEASURES: The primary outcome measure was umbilical arterial metabolic acidosis. Secondary outcome measures included other measures of condition at birth and obstetric intervention. RESULTS: There were no significant differences in the incidence of metabolic acidosis or any other measure of neonatal outcome among women who remained at low risk when they were admitted in labour. However, compared with women who received Doppler auscultation, women who had admission cardiotocography were significantly more likely to have continuous fetal heart rate monitoring in labour (odds ratio 1.49, 95% confidence interval 1.26 to 1.76), augmentation of labour (1.26, 1.02 to 1.56), epidural analgesia (1.33, 1.10 to 1.61), and operative delivery (1.36, 1.12 to 1.65). CONCLUSIONS: Compared with Doppler auscultation of the fetal heart, admission cardiotocography does not benefit neonatal outcome in low risk women. Its use results in increased obstetric intervention, including operative delivery.
Substituting proxy ratings for patient ratings in cancer clinical trials: an analysis based on a Southwest Oncology Group trial in patients with brain metastases. Moinpour CM, Lyons B, Schmidt SP, Chansky K, Patchell RA. Qual Life Res 2000: 9(2); 219-31. [Medline] In studies of the effect of cancer treatment in the advanced disease setting, researchers have attempted to avoid missing data for quality of life (QOL) assessments by either substituting proxy for patient assessments from the outset or by interspersing proxy measures when patients are unable to respond. Although poor agreement between patient and proxy assessments has been amply demonstrated in the literature, interest in using proxy measures persists. Completion of the Spitzer QL-Index by a small sample of patients with brain metastases and family member proxies provided data for evaluating the ability to substitute proxy for patient QOL assessments. These data cannot address treatment efficacy due to the modest sample size. Rather, the analyses serve to alert researchers to the important distinction (in a clinical trial setting) between agreement and the use of the proxy as a surrogate. We present several methods for evaluating the accuracy of proxy measures and for identifying other sources of error and bias that may vary with time or with treatment arm. Lin's concordance correlation coefficient suggests that proxies are generally a poor substitute for capturing a patient's perspective of his/her QOL. A longitudinal analysis suggests that the use of proxy rather than patient responses could lead to different conclusions concerning radiation therapy's effect on QOL.
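The distinction between correlation and agreement that this abstract draws can be made concrete with Lin's concordance correlation coefficient. The following is a minimal sketch, not the authors' analysis: the function name `lins_ccc` and the scores are hypothetical, and the 1/n (population) moments follow Lin's usual definition.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Variances and covariance with a 1/n divisor (population moments).
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference in the denominator penalizes systematic bias.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical scores on a 0-10 scale; these are NOT data from the trial.
patient = [8, 7, 9, 6, 5, 7, 8, 4]
# A proxy who tracks the patient perfectly but always rates 2 points lower.
proxy_shifted = [p - 2 for p in patient]

print(lins_ccc(patient, patient))
print(lins_ccc(patient, proxy_shifted))
```

In the second call the Pearson correlation would be exactly 1, yet the concordance coefficient falls well below 1 because the proxy is systematically lower. That is the sense in which a proxy can correlate with a patient without being a usable surrogate.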
Statistics Notes: Validating scales and indexes. Bland JM, Altman DG. Bmj 2002: 324(7337); 606-7. [Medline] [Full text] [PDF]
Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials: Comparison of Protocols to Published Articles. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Journal of the American Medical Association 2004: 291(20); 2457-65. [Medline] [Abstract]
Publication bias in situ. Phillips CV. BMC Med Res Methodol 2004: 4(1); 20. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Publication bias, as typically defined, refers to the decreased likelihood of studies' results being published when they are near the null, not statistically significant, or otherwise "less interesting." But choices about how to analyze the data and which results to report create a publication bias within the published results, a bias I label "publication bias in situ" (PBIS). DISCUSSION: PBIS may create much greater bias in the literature than traditionally defined publication bias (the failure to publish any result from a study). The causes of PBIS are well known, consisting of various decisions about reporting that are influenced by the data. But its impact is not generally appreciated, and very little attention is devoted to it. What attention there is consists largely of rules for statistical analysis that are impractical and do not actually reduce the bias in reported estimates. PBIS cannot be reduced by statistical tools because it is not fundamentally a problem of statistics, but rather of non-statistical choices and plain language interpretations. PBIS should be recognized as a phenomenon worthy of study - it is extremely common and probably has a huge impact on results reported in the literature - and there should be greater systematic efforts to identify and reduce it. The paper presents examples, including results of a recent HIV vaccine trial, that show how easily PBIS can have a large impact on reported results, as well as how there can be no simple answer to it. SUMMARY: PBIS is a major problem, worthy of substantially more attention than it receives. There are ways to reduce the bias, but they are very seldom employed because they are largely unrecognized.
Efficacy and tolerability of selective serotonin reuptake inhibitors compared with tricyclic antidepressants in depression treated in primary care: systematic review and meta-analysis. MacGillivray S, Arroll B, Hatcher S, Ogston S, Reid I, Sullivan F, Williams B, Crombie I. Bmj 2003: 326(7397); 1014. [Medline] OBJECTIVE: To compare the efficacy and tolerability of tricyclic antidepressants with selective serotonin reuptake inhibitors in depression in primary care. DESIGN: Systematic review and meta-analysis of randomised controlled trials. DATA SOURCES: Register of the Cochrane Collaboration's depression, anxiety, and neurosis group. Reference lists of initial studies and other relevant review papers. Selected authors and experts. SELECTION OF STUDIES: Studies had to meet minimum requirements on: adequacy of sample size, adequate allocation concealment, clear description of treatment, representative source of subjects, use of diagnostic criteria or clear specification of inclusion criteria, details regarding number and reasons for withdrawal by group, and outcome measures described clearly or use of validated instruments. MAIN OUTCOME MEASURES: Standardised mean difference of final mean depression scores and relative risk of response when using the clinical global impression score. Relative risk of withdrawing from treatment at any time, and the number withdrawing due to side effects. RESULTS: 11 studies (2951 participants) compared a selective serotonin reuptake inhibitor with a tricyclic antidepressant. Efficacy between selective serotonin reuptake inhibitors and tricyclics did not differ significantly (standardised weighted mean difference, fixed effects 0.07, 95% confidence interval -0.02 to 0.15; z=1.59, P<0.11). Significantly more patients receiving a tricyclic withdrew from treatment (relative risk 0.78, 95% confidence interval 0.68 to 0.90; z=3.37, P<0.0007) and withdrew specifically because of side effects (0.73, 0.60 to 0.88; z=3.24, P<0.001). 
Most studies included were small and supported by commercial funding. Many studies were of low methodological quality or did not present adequate data for analysis, or both, and were of short duration, typically six to eight weeks. CONCLUSION: The evidence on the relative efficacy of selective serotonin reuptake inhibitors and tricyclic antidepressants in primary care is sparse and of variable quality. The study setting is likely to be an important factor in assessing the efficacy and tolerability of treatment with antidepressant drugs.
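The z statistics and P values quoted in this abstract can be approximately recovered from the reported estimates and 95% confidence intervals. The sketch below assumes a normal approximation and, for the relative risk, symmetry on the log scale; the helper names `z_from_ci` and `two_sided_p` are hypothetical. Because the published limits are rounded, the results will be close to, but not identical with, the reported z=1.59 and z=3.37.

```python
from math import erfc, log, sqrt

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def z_from_ci(estimate, lower, upper, log_scale=False):
    """Back-calculate a z statistic from an estimate and its 95% CI."""
    if log_scale:
        # Ratio measures (relative risk, odds ratio) are symmetric on the log scale.
        estimate, lower, upper = log(estimate), log(lower), log(upper)
    se = (upper - lower) / (2 * Z95)
    return estimate / se


def two_sided_p(z):
    """Two-sided P value for a standard normal z statistic."""
    return erfc(abs(z) / sqrt(2))


# Values taken from the abstract.
z_smd = z_from_ci(0.07, -0.02, 0.15)                # standardised mean difference
z_rr = z_from_ci(0.78, 0.68, 0.90, log_scale=True)  # relative risk of withdrawal

print(round(z_smd, 2), round(two_sided_p(z_smd), 3))
print(round(z_rr, 2), round(two_sided_p(z_rr), 4))
```

The sign of `z_rr` is negative only because the relative risk is below 1; its magnitude is what compares with the published z.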
Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Brookes ST, Whitley E, Peters TJ, Mulheran PA, Egger M, Davey Smith G. Health Technol Assess 2001: 5(33); 1-56. [Medline] [Abstract] [PDF] BACKGROUND: Subgroup analyses are common in randomised controlled trials (RCTs). There are many easily accessible guidelines on the selection and analysis of subgroups but the key messages do not seem to be universally accepted and inappropriate analyses continue to appear in the literature. This has potentially serious implications because erroneous identification of differential subgroup effects may lead to inappropriate provision or withholding of treatment. OBJECTIVES: (1) To quantify the extent to which subgroup analyses may be misleading. (2) To compare the relative merits and weaknesses of the two most common approaches to subgroup analysis: separate (subgroup-specific) analyses of treatment effect and formal statistical tests of interaction. (3) To establish what factors affect the performance of the two approaches. (4) To provide estimates of the increase in sample size required to detect differential subgroup effects. (5) To provide recommendations on the analysis and interpretation of subgroup analyses. METHODS: The performances of subgroup-specific and formal interaction tests were assessed by simulating data with no differential subgroup effects and determining the extent to which the two approaches (incorrectly) identified such an effect, and simulating data with a differential subgroup effect and determining the extent to which the two approaches were able to (correctly) identify it. Initially, data were simulated to represent the 'simplest case' of two equal-sized treatment groups and two equal-sized subgroups. 
Data were first simulated with no differential subgroup effect and then with a range of types and magnitudes of subgroup effect with the sample size determined by the nominal power (50-95%) for the overall treatment effect. Additional simulations were conducted to explore the individual impact of the sample size, the magnitude of the overall treatment effect, the size and number of treatment groups and subgroups and, in the case of continuous data, the variability of the data. The simulated data covered the types of outcomes most commonly used in RCTs, namely continuous (Gaussian) variables, binary outcomes and survival times. All analyses were carried out using appropriate regression models, and subgroup effects were identified on the basis of statistical significance at the 5% level. RESULTS: While there was some variation for smaller sample sizes, the results for the three types of outcome were very similar for simulations with a total sample size of greater than or equal to 200. With simulated simplest case data with no differential subgroup effects, the formal tests of interaction were significant in 5% of cases as expected, while subgroup-specific tests were less reliable and identified effects in 7-66% of cases depending on whether there was an overall treatment effect. The most common type of subgroup effect identified in this way was where the treatment effect was seen to be significant in one subgroup only. When a simulated differential subgroup effect was included, the results were dependent on the nominal power of the simulated data and the type and magnitude of the subgroup effect. However, the performance of the formal interaction test was generally superior to that of the subgroup-specific analyses, with more differential effects correctly identified. In addition, the subgroup-specific analyses often suggested the wrong type of differential effect. 
The ability of formal interaction tests to (correctly) identify subgroup effects improved as the size of the interaction increased relative to the overall treatment effect. When the size of the interaction was twice the overall effect or greater, the interaction tests had at least the same power as the overall treatment effect. However, power was considerably reduced for smaller interactions, which are much more likely in practice. The inflation factor required to increase the sample size to enable detection of the interaction with the same power as the overall effect varied with the size of the interaction. For an interaction of the same magnitude as the overall effect, the inflation factor was 4, and this increased dramatically to greater than or equal to 100 for more subtle interactions of < 20% of the overall effect. Formal interaction tests were generally robust to alterations in the number and size of the treatment and subgroups and, for continuous data, the variance in the treatment groups, with the only exception being a change in the variance in one of the subgroups. In contrast, the performance of the subgroup-specific tests was affected by almost all of these factors with only a change in the number of treatment groups having no impact at all. CONCLUSIONS: While it is generally recognised that subgroup analyses can produce spurious results, the extent of the problem is almost certainly under-estimated. This is particularly true when subgroup-specific analyses are used. In addition, the increase in sample size required to identify differential subgroup effects may be substantial and the commonly used 'rule of four' may not always be sufficient, especially when interactions are relatively subtle, as is often the case. CONCLUSIONS--RECOMMENDATIONS FOR SUBGROUP ANALYSES AND THEIR INTERPRETATION: (1) Subgroup analyses should, as far as possible, be restricted to those proposed before data collection.
Any subgroups chosen after this time should be clearly identified. (2) Trials should ideally be powered with subgroup analyses in mind. However, for modest interactions, this may not be feasible. (3) Subgroup-specific analyses are particularly unreliable and are affected by many factors. Subgroup analyses should always be based on formal tests of interaction although even these should be interpreted with caution. (4) The results from any subgroup analyses should not be over-interpreted. Unless there is strong supporting evidence, they are best viewed as a hypothesis-generation exercise. In particular, one should be wary of evidence suggesting that treatment is effective in one subgroup only. (5) Any apparent lack of differential effect should be regarded with caution unless the study was specifically powered with interactions in mind. CONCLUSIONS--RECOMMENDATIONS FOR RESEARCH: (1) The implications of considering confidence intervals rather than p-values could be considered. (2) The same approach as in this study could be applied to contexts other than RCTs, such as observational studies and meta-analyses. (3) The scenarios used in this study could be examined more comprehensively using other statistical methods, incorporating clustering effects, considering other types of outcome variable and using other approaches, such as Bootstrapping or Bayesian methods.
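The 'simplest case' described in this abstract (two equal-sized treatment groups, two equal-sized subgroups, no true differential effect) is easy to imitate. The sketch below is not the authors' simulation code: it assumes a Gaussian outcome with known standard deviation 1, draws cell means directly instead of individual patients, and uses z tests at the 5% level. The function name `simulate` and all parameter values are hypothetical.

```python
import random
from math import sqrt


def simulate(n_per_cell=50, delta=0.4, n_sims=20000, seed=1):
    """Return (rate of 'significant in one subgroup only', rate of significant interaction).

    The true treatment effect `delta` is identical in both subgroups,
    so any apparent differential effect is a false positive.
    """
    rng = random.Random(seed)
    crit = 1.959964                 # two-sided 5% critical value
    se_cell = 1 / sqrt(n_per_cell)  # SD of one cell mean (outcome SD assumed to be 1)
    se_sub = sqrt(2) * se_cell      # SE of the treatment effect within one subgroup
    se_int = 2 * se_cell            # SE of the difference between the two subgroup effects
    one_only = 0
    interaction = 0
    for _ in range(n_sims):
        effects = []
        for _subgroup in range(2):
            treated = rng.gauss(delta, se_cell)
            control = rng.gauss(0.0, se_cell)
            effects.append(treated - control)
        # Subgroup-specific analysis: test the treatment effect separately in each subgroup.
        significant = [abs(e) / se_sub > crit for e in effects]
        one_only += significant[0] != significant[1]
        # Formal interaction test: compare the two subgroup effects directly.
        interaction += abs(effects[0] - effects[1]) / se_int > crit
    return one_only / n_sims, interaction / n_sims


# With an overall effect (roughly 80% overall power at these settings).
print(simulate(delta=0.4))
# With no overall effect at all.
print(simulate(delta=0.0))
```

Under these assumptions the interaction test should stay near its nominal 5% error rate, while the "significant in one subgroup only" pattern appears far more often when there is an overall effect. The sketch also shows where the 'rule of four' comes from: the standard error of the interaction (`se_int`) is twice that of the overall effect, which here equals `se_cell`, so the variance is four times as large.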
Randomized trial of bilateral oophorectomy versus tamoxifen in premenopausal women with metastatic breast cancer. Ingle JN, Krook JE, Green SJ, Kubista TP, Everson LK, Ahmann DL, Chang MN, Bisel HF, Windschitl HE, Twito DI, et al. J Clin Oncol 1986: 4(2); 178-85. A randomized clinical trial was performed to compare the efficacy of bilateral oophorectomy with that of tamoxifen at a dose of 10 mg twice daily in premenopausal women with metastatic breast cancer, and to examine the efficacy of each as a crossover treatment. Initial treatment responses were seen in ten of 27 patients (37%) treated with oophorectomy and seven of 26 patients (27%) treated with tamoxifen. The difference was not statistically significant. Crossover responses were seen in five of 15 patients (33%) treated with oophorectomy, including three responses in ten prior tamoxifen nonresponders; and two of 18 patients (11%) treated with tamoxifen. Time to progression distributions were not significantly different during initial treatment, and no significant differences in survival were noted. Thus, there was no overall disadvantage to the use of tamoxifen as opposed to oophorectomy as initial hormonal therapy, and a failure to respond to tamoxifen did not preclude a response to subsequent oophorectomy. Exploratory data analysis within subsets indicated consistent differential treatment effects in the visceral dominant patients. Of the 16 such patients treated with oophorectomy, eight (50%) experienced objective responses but there were no responses in the 14 patients treated with tamoxifen. In the nine visceral dominant crossover patients who had not responded to initial tamoxifen, three (33%) subsequently responded to oophorectomy. Time to progression distributions within the visceral dominant subset appeared to be better for the patients treated initially with oophorectomy. 
However, one must be very cautious in drawing conclusions from exploratory subset analyses, especially with the small sample size. Further studies would be required to test any hypothesis of differential organ site responsiveness.
A Consumer's Guide to Subgroup Analyses. Oxman AD, Guyatt GH. Annals of Internal Medicine 1992: 116(1); 78-84. ABSTRACT: The extent to which a clinician should believe and act on the results of subgroup analyses of data from randomized trials or meta-analyses is controversial. Guidelines are provided in this paper for making these decisions. The strength of inference regarding a proposed difference in treatment effect among subgroups is dependent on the magnitude of the difference, the statistical significance of the difference, whether the hypothesis preceded or followed the analysis, whether the subgroup analysis was one of a small number of hypotheses tested, whether the difference was suggested by comparisons within or between studies, the consistency of the difference, and the existence of indirect evidence that supports the difference. Application of these guidelines will assist clinicians in making decisions regarding whether to base a treatment decision on overall results or on the results of a subgroup analysis.
Considerations in the evaluation of surrogate endpoints in clinical trials. Summary of a National Institutes of Health workshop. De Gruttola VG, Clax P, DeMets DL, Downing GJ, Ellenberg SS, Friedman L, Gail MH, Prentice R, Wittes J, Zeger SL. Control Clin Trials 2001: 22(5); 485-502. [Medline] We report on recommendations from a National Institutes of Health Workshop on methods for evaluating the use of surrogate endpoints in clinical trials, which was attended by experts in biostatistics and clinical trials from a broad array of disease areas. Recent advances in biosciences and technology have increased the ability to understand, measure, and model biological mechanisms; appropriate application of these advances in clinical research settings requires collaboration of quantitative and laboratory scientists. Biomarkers, new examples of which arise rapidly from new technologies, are used frequently in such areas as early detection of disease and identification of patients most likely to benefit from new therapies. There is also scientific interest in exploring whether, and under what conditions, biomarkers may substitute for clinical endpoints of phase III trials, although workshop participants agreed that these considerations apply primarily to situations where trials using clinical endpoints are not feasible. Evaluating candidate biomarkers in the exploratory phases of drug development and investigating surrogate endpoints in confirmatory trials require the establishment of a statistical and inferential framework. As a first step, participants reviewed methods for investigating the degree to which biomarkers can explain or predict the effect of treatments on clinical endpoints measured in clinical trials. They also suggested new approaches appropriate in settings where biomarkers reflect only indirectly the important processes on the causal path to clinical disease and where biomarker measurement errors are of concern.
Participants emphasized the need for further research on development of such models, whether they are empirical in nature or attempt to describe mechanisms in mathematical terms. Of special interest were meta-analytic models for combining information from multiple studies involving interventions for the same condition. Recommendations also included considerations for design and conduct of trials and for assemblage of databases needed for such research. Finally, there was a strong recommendation for increased training of quantitative scientists in biologic research as well as in statistical methods and modeling to ensure that there will be an adequate workforce to meet future research needs.
In search of the magic nutraceutical: problems with current approaches. Heyland DK. J Nutr 2001: 131(9 Suppl); 2591S-5S. [Medline] [Abstract] [Full text] [PDF] Over the last few decades, substrates with immune-modulating properties have been identified in all groups of micro- and macronutrients. Numerous experimental studies have focused on evaluating these substances, either alone or in combination. After hundreds of experiments, no clear, consistent signal exists that any of these agents result in significant treatment benefits in critically ill patients. The current approach to establishing the efficacy of nutritional interventions suffers from several limitations. First, the majority of studies focus on surrogate or substitute end points rather than clinically important end points. Second, the majority of clinical studies are small, and as such are underpowered to detect a significant treatment effect on clinically important end points. Third, the methodological quality of individual randomized trials varies. Methodological limitations, prevalent in nutrition studies, limit the strength of clinical inference that can be made from study results. High quality studies have been shown to differ significantly from low quality studies in their estimation of treatment effect. Fourth, the generalizability of single-site studies is limited. Finally, studies sponsored solely by industry are considered to be less believable than studies conducted under the auspices of peer-review agencies. Future evaluations must be done in the context of large, multicenter, well-designed, randomized trials focusing on clinically important end points that are sponsored from a variety of sources (including peer-reviewed agencies). Although such trials are costly, they are feasible and are much more likely to be believable and generalizable than the current approach.
The promise and peril of surrogate end points in cancer research. Schatzkin A, Gail M. Nat Rev Cancer 2002: 2(1); 19-27. [Medline] Both experimental and observational studies of cancer need to have an end point. Traditionally, in aetiological and prevention studies, that end point has been the incidence of cancer itself, whereas in therapeutic trials, the end point is usually time to cancer recurrence or death. But cancer takes a long time to develop in an individual and is rare in the population. Therefore, aetiological studies and prevention trials must be large and lengthy to be meaningful. Similarly, many therapeutic trials require a long follow-up of large numbers of patients. Surrogate end points--markers of preclinical cancer or of imminent recurrence--are therefore an attractive alternative. But how can we be sure that a study with a surrogate outcome gives us the right answer about the true end point?
What happened to the valid POEMs? A survey of review articles on the treatment of type 2 diabetes. Shaughnessy AF, Slawson DC. Bmj 2003: 327(7409); 266. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To evaluate systematically the review literature on type 2 diabetes to assess transmission of the findings of the United Kingdom prospective diabetes study (UKPDS), an important source of recent valid patient oriented evidence that matters (POEMs). DESIGN: Inception cohort analysis of the recent medical literature. STUDIES REVIEWED: Thirty five reviews on treatment of type 2 diabetes. MAIN OUTCOME MEASURES: Presentation of three types of information from UKPDS in review articles: recommendations based on patient oriented outcomes of study; recommendations contradicted by patient oriented outcomes of study; and recommendations based on disease oriented outcomes for which no patient oriented evidence exists. RESULTS: Only six of the reviews included the POEM that tight blood glucose control had no effect on diabetes related or overall mortality. Just seven mentioned that metformin treatment was associated with decreased mortality. Most (30) of the reviews did not report that diabetic patients with hypertension benefit more from good blood pressure control than good blood glucose control. No review pointed out that treatment of overweight patients with type 2 diabetes with insulin or sulphonylurea drugs had no effect on microvascular or macrovascular outcomes. Thirteen reviews recommended drugs as first line treatment for which we do not have patient oriented outcomes data. The average validity assessment score was 1.3 out of a possible score of 15 (95% confidence interval 0.9 to 1.8). CONCLUSIONS: Review articles on the treatment of type 2 diabetes have not accurately transmitted the valid POEM results of the UKPDS to clinicians. Clinicians relying on review articles written by experts as a source of valid POEMs may be misled.
Relation between tumour response to first-line chemotherapy and survival in advanced colorectal cancer: a meta-analysis. Meta-Analysis Group in Cancer. Buyse M, Thirion P, Carlson RW, Burzykowski T, Molenberghs G, Piedbois P. Lancet 2000: 356(9227); 373-8. [Medline] BACKGROUND: Treatment of advanced colorectal cancer has progressed substantially. However, improvements in response rates have not always translated into significant survival benefits. Doubts have therefore been raised about the usefulness of tumour response as a clinical endpoint. METHODS: This meta-analysis was done on individual data from 3791 patients enrolled in 25 randomised trials of first-line treatment with standard bolus intravenous fluoropyrimidines versus experimental treatments (fluorouracil plus leucovorin, fluorouracil plus methotrexate, fluorouracil continuous infusion, or hepatic-arterial infusion of floxuridine). Analyses were by intention to treat. FINDINGS: Compared with bolus fluoropyrimidines, experimental fluoropyrimidines led to significantly higher tumour response rates (454 responses among 2031 patients vs 209 among 1760; odds ratio 0.48 [95% CI 0.40-0.57], p<0.0001) and better survival (1808 deaths among 2031 vs 1580 among 1760; hazard ratio 0.90 [0.84-0.97], p=0.003). The survival benefits could be explained by the higher tumour response rates. However, a treatment that lowered the odds of failure to respond by 50% would be expected to decrease the odds of death by only 6%. In addition, less than half of the variability of the survival benefits in the 25 trials could be explained by the variability of the response benefits in these trials. INTERPRETATION: These analyses confirm that an increase in tumour response rate translates into an increase in overall survival for patients with advanced colorectal cancer. 
However, in the context of individual trials, knowledge that a treatment has benefits on tumour response does not allow accurate prediction of the ultimate benefit on survival.
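The response and death counts in this abstract allow a quick crude check of the reported odds ratio. The sketch below simply pools the counts across all 25 trials, which is an assumption of convenience: the abstract does not say how the published 0.48 was computed, and an analysis that respects trial boundaries would not be expected to give exactly the crude figure. The helper name `odds` is hypothetical.

```python
def odds(events, total):
    """Odds of an event given the number of events and the group size."""
    return events / (total - events)


# Counts from the abstract: experimental vs bolus fluoropyrimidines.
resp_exp, n_exp = 454, 2031
resp_ctl, n_ctl = 209, 1760
deaths_exp, deaths_ctl = 1808, 1580

# Response rates in each arm.
rate_exp = resp_exp / n_exp
rate_ctl = resp_ctl / n_ctl

# Crude odds ratio for FAILURE to respond (experimental relative to bolus),
# the direction in which the abstract reports its odds ratio.
or_nonresponse = odds(n_exp - resp_exp, n_exp) / odds(n_ctl - resp_ctl, n_ctl)

# Crude proportions dying; a hazard ratio cannot be recovered from counts alone.
death_exp = deaths_exp / n_exp
death_ctl = deaths_ctl / n_ctl

print(f"response: {rate_exp:.1%} vs {rate_ctl:.1%}")
print(f"crude odds ratio of non-response: {or_nonresponse:.2f}")
print(f"deaths: {death_exp:.1%} vs {death_ctl:.1%}")
```

The crude odds ratio lands close to the published value, and the small gap between the two death proportions illustrates the abstract's point that a large gain in response corresponds to only a modest gain in survival.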
Quantifying effect of statins on low density lipoprotein cholesterol, ischaemic heart disease, and stroke: systematic review and meta-analysis. Law MR, Wald NJ, Rudnicka AR. Bmj 2003: 326(7404); 1423. [Medline] [Abstract] [Full text] [PDF] OBJECTIVES: To determine by how much statins reduce serum concentrations of low density lipoprotein (LDL) cholesterol and incidence of ischaemic heart disease (IHD) events and stroke, according to drug, dose, and duration of treatment. DESIGN: Three meta-analyses: 164 short term randomised placebo controlled trials of six statins and LDL cholesterol reduction; 58 randomised trials of cholesterol lowering by any means and IHD events; and nine cohort studies and the same 58 trials on stroke. MAIN OUTCOME MEASURES: Reductions in LDL cholesterol according to statin and dose; reduction in IHD events and stroke for a specified reduction in LDL cholesterol. RESULTS: Reductions in LDL cholesterol (in the 164 trials) were 2.8 mmol/l (60%) with rosuvastatin 80 mg/day, 2.6 mmol/l (55%) with atorvastatin 80 mg/day, 1.8 mmol/l (40%) with atorvastatin 10 mg/day, lovastatin 40 mg/day, simvastatin 40 mg/day, or rosuvastatin 5 mg/day, all from pretreatment concentrations of 4.8 mmol/l. Pravastatin and fluvastatin achieved smaller reductions. In the 58 trials, for an LDL cholesterol reduction of 1.0 mmol/l the risk of IHD events was reduced by 11% in the first year of treatment, 24% in the second year, 33% in years three to five, and by 36% thereafter (P < 0.001 for trend). IHD events were reduced by 20%, 31%, and 51% in trials grouped by LDL cholesterol reduction (means 0.5 mmol/l, 1.0 mmol/l, and 1.6 mmol/l) after results from first two years of treatment were excluded (P < 0.001 for trend). After several years a reduction of 1.8 mmol/l would reduce IHD events by an estimated 61%.
Results from the same 58 trials, corroborated by results from the nine cohort studies, show that lowering LDL cholesterol decreases all stroke by 10% for a 1 mmol/l reduction and 17% for a 1.8 mmol/l reduction. Estimates allow for the fact that trials tended to recruit people with vascular disease, among whom the effect of LDL cholesterol reduction on stroke is greater because of their higher risk of thromboembolic stroke (rather than haemorrhagic stroke) compared with people in the general population. CONCLUSIONS: Statins can lower LDL cholesterol concentration by an average of 1.8 mmol/l which reduces the risk of IHD events by about 60% and stroke by 17%.
The relationship between study design, results, and reporting of randomized clinical trials of HIV infection. Ioannidis JP, Cappelleri JC, Sacks HS, Lau J. Control Clin Trials 1997: 18(5); 431-44. We examined whether the study design of randomized clinical trials for medications against human immunodeficiency virus (HIV) may affect the results and whether the outcomes of these trials affect reporting and publication. We used a database of 71 published randomized HIV-related drug efficacy trials and considered the following study design factors: endpoint definition and method of analysis, masked design, sample size, and duration of follow-up. Large variation was noted in the methods of analysis for surrogate endpoints. Often statistical significance for a surrogate endpoint was not associated with statistical significance for the clinical endpoint or for survival in the same trial, although disagreements in the direction of the treatment effect for surrogate endpoints and survival within individual trials were uncommon. Open-label design seemed to affect the magnitude of the treatment effect for two treatments. The magnitude of the treatment effect in trials of zidovudine monotherapy was inversely related to their sample size, but this probably reflected the confounding effect of longer duration of follow-up in large trials (with a resulting loss of efficacy) rather than publication bias. There was, however, evidence for potential bias in reporting and publication of HIV-related trials. Meta-analyses of published trials for specific treatments demonstrated a sizable treatment benefit for all the examined medications regardless of whether these medications were officially approved, controversial, or abandoned, raising concerns about either publication bias or unjustifiable rejection of potentially useful medications. 
Compared with trials published in specialized journals, trials published in journals of wide readership were larger (p = 0.001) and 4.4 times more likely to report "positive" results (p = 0.01). We identified several examples of trials with "negative" results that have remained unpublished for a long time. In conclusion, study design factors may have an impact on the magnitude and significance of the treatment effect in HIV-related trials. Bias in reporting can further affect the information that these studies provide.
The adverse neuro-developmental effects of postnatal steroids in the preterm infant: a systematic review of RCTs. Barrington KJ. BMC Pediatr 2001: 1(1); 1. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Recent reports have raised concerns that postnatal steroids may cause neuro-developmental impairment in preterm infants. This systematic review was performed with the objective of determining whether glucocorticoid therapy, to prevent or treat bronchopulmonary dysplasia, impairs neuro-developmental outcomes in preterm infants. METHOD: A systematic review of the literature was performed. Medline was searched and articles retrieved using predefined criteria. Data from randomized controlled trials with adequate neuro-developmental follow up (to at least one year) were entered into a meta-analysis to determine the effects of postnatal treatment of preterm infants with glucocorticoids. Cerebral palsy rates, and neuro-developmental impairment (developmental score more than 2SD below the mean, or cerebral palsy or blindness) were analyzed. The studies were divided into 2 groups according to the extent of contamination of the results by treatment of controls with steroids after the initial study period, those with less than 30% contamination, and those with more than 30% contamination or size of contamination not reported. RESULTS: Postnatal steroid therapy is associated with an increase in cerebral palsy and neuro-developmental impairment. The studies with less contamination show a greater effect of the steroids, consistent with a real direct toxic effect of steroids on the developing central nervous system. The typical relative risk for the development of cerebral palsy derived from studies with less than 30% contamination is 2.86 (95% CI 1.95, 4.19). The typical relative risk for the development of neuro-developmental disability among followed up infants from studies with less than 30% contamination is 1.66 (95% CI 1.26, 2.19). 
From this subgroup of studies, the number of premature infants who need to be treated to have one more infant with cerebral palsy (number needed to harm, NNH) is 7; to have one more infant with neuro-developmental impairment the NNH is 11. CONCLUSIONS: Postnatal pharmacologic steroid treatment for prevention or treatment of bronchopulmonary dysplasia is associated with dramatic increases in neuro-developmental impairment. As there is no clear evidence in the literature of long term benefit, their use for this indication should be abandoned.
Systematic review and meta-analysis of early postnatal dexamethasone for prevention of chronic lung disease. Bhuta T, Ohlsson A. Arch Dis Child Fetal Neonatal Ed 1998: 79(1); F26-33. [Medline] [Abstract] [Full text] [PDF] AIM: To review systematically the evidence to determine whether dexamethasone treatment of very low birthweight infants begun within 14 days of age prevents chronic lung disease (CLD) without clinically significant side effects. METHODS: Randomised controlled trials of dexamethasone started within this time frame were identified through a search of electronic databases, proceedings of scientific meetings, and personal files. Meta-analyses using event rate ratio (ERR), event rate difference (ERD), and if significant, numbers needed to treat (NNT) for benefits and numbers needed to harm (NNH) for adverse effects were calculated. Weighted mean differences were used for continuous variables. Three prespecified subgroup analyses were performed for: (i) dexamethasone begun within 36 hours of birth; (ii) dexamethasone initiated between 7-14 days of age; or (iii) if surfactant treatment was used. RESULTS: Ten studies were included in the review; six where dexamethasone was initiated within 36 hours of age, four studies for dexamethasone started between 7 and 14 days and six studies using surfactant. Mortality ERR and NNT with 95% confidence intervals for dexamethasone initiated at 7-14 days of age were 0.35 (0.16, 0.74) and 8 (4, 30). ERRs and NNTs for CLD at 28 days and 36 weeks of postmenstrual age were 0.71 (0.61, 0.84), 8 (5, 17), and 0.57 (0.44, 0.76), 10 (6, 23) in the overall analyses. When dexamethasone was started at 7 to 14 days of age ERR and NNT for CLD at 36 weeks were 0.63 (0.47, 0.85) and 3 (2, 9). Clinically significant side effects included increased risk of hypertension, hyperglycaemia, and increased time to regain birthweight.
CONCLUSIONS: These meta-analyses show a significant reduction in risk of CLD at 28 days and 36 weeks of postmenstrual age. In the subgroup where dexamethasone was started between 7 and 14 days of age mortality was significantly reduced. Caution is warranted in the routine use of dexamethasone because of lack of data on long term neurodevelopmental outcomes.
Reporting risks and benefits of therapy by use of the concepts of unqualified success and unmitigated failure: applications to highly cited trials in cardiovascular medicine. Mancini GB, Schulzer M. Circulation 1999: 99(3); 377-83. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: The NNT (number needed to treat) and NNH (number needed to harm) are useful in conveying the results of clinical trials because they emphasize the effort that must be expended to accomplish a single, tangible outcome. But NNT conveys the effort required to achieve a positive outcome without distinguishing between the presence or absence of treatment-related adverse events. Similarly, NNH conveys harm without accounting for the achievement or lack of achievement of the benefit of therapy. Consequently, a mathematical model was developed to extend the NNT and NNH to represent the effort required to achieve "unqualified success" (NNTUS, treatment success without treatment-induced side effects) and "unmitigated failure" (NNHUF, lack of treatment success with treatment-induced side effects). METHODS AND RESULTS: NNTUS was calculated by adjusting the absolute risk reduction to allow for the probability of not incurring a treatment-related adverse event. NNHUF was similarly calculated by adjusting the absolute risk of incurring a treatment-related adverse event by the probability of not incurring any treatment-related benefit. The impact of conveying clinical trial data by the use of NNT, NNTUS, NNH, and NNHUF is illustrated by means of 11 highly cited trials identified systematically from the cardiovascular literature. The treatment effort measured by the NNTUS and the NNHUF was consistently higher than that given by the traditional NNT and NNH. These increments ranged from 1% to several hundred percent. CONCLUSIONS: The NNTUS and the NNHUF represent the treatment effort required on average to achieve 1 unqualified success and 1 unmitigated failure. 
NNTUS and NNHUF balance benefit and harm in an objective way and are relevant for making service delivery decisions.
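The adjustment described in this abstract can be sketched as follows. This is one reading of the verbal description, assuming that benefit and treatment-related harm occur independently; the paper's exact model is not reproduced here, and the function names and input values are hypothetical.

```python
def nnt_unqualified_success(arr: float, p_adverse: float) -> float:
    """NNT adjusted for the chance of NOT having a treatment-related
    adverse event (assumes benefit and harm are independent)."""
    return 1.0 / (arr * (1.0 - p_adverse))

def nnh_unmitigated_failure(ari: float, p_benefit: float) -> float:
    """NNH adjusted for the chance of NOT obtaining the treatment
    benefit (same independence assumption)."""
    return 1.0 / (ari * (1.0 - p_benefit))

# Hypothetical trial: 5% absolute risk reduction, 2% absolute increase in harm
arr, ari = 0.05, 0.02
nnt, nnh = 1.0 / arr, 1.0 / ari
nntus = nnt_unqualified_success(arr, p_adverse=ari)
nnhuf = nnh_unmitigated_failure(ari, p_benefit=arr)
print(f"NNT {nnt:.1f} -> NNTUS {nntus:.1f}; NNH {nnh:.1f} -> NNHUF {nnhuf:.1f}")
```

Under this reading both adjusted numbers are always at least as large as the unadjusted ones, which agrees with the abstract's statement that the treatment effort was consistently higher.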
Sildenafil (Viagra) for male erectile dysfunction: a meta-analysis of clinical trial reports. Moore RA, Edwards JE, McQuay HJ. BMC Urol 2002: 2(1); 6. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Evaluation of company clinical trial reports could provide information for meta-analysis at the commercial introduction of a new technology. METHODS: Clinical trial reports of sildenafil for erectile dysfunction from September 1997 were used for meta-analysis of randomised trials (at least four weeks duration) and using fixed or dose optimisation regimens. The main outcome sought was an erection, sufficiently rigid for penetration, followed by successful intercourse, and conducted at home. RESULTS: Ten randomised controlled trials fulfilled the inclusion criteria (2123 men given sildenafil and 1131 placebo). NNT or NNH were calculated for important efficacy, adverse event and discontinuation outcomes. Dose optimisation led to at least 60% of attempts at sexual intercourse being successful in 49% of men, compared with 11% with placebo; the NNT was 2.7 (95% confidence interval 2.3 to 3.3). For global improvement in erections the NNT was 1.7 (1.6 to 1.9). Treatment-related adverse events occurred in 30% of men on dose optimised sildenafil compared with 11% on placebo; the NNH was 5.4 (4.3 to 7.3). All cause discontinuations were less frequent with sildenafil (10%) than with placebo (20%). Sildenafil dose optimisation gave efficacy equivalent to the highest fixed doses, and adverse events equivalent to the lowest fixed doses. CONCLUSION: This review of clinical trial reports available at the time of licensing agreed with later reviews that had many more trials and patients. Making reports submitted for marketing approval available publicly would provide better information when it was most needed, and would improve evidence-based introduction of new technologies.
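The NNT and NNH in this abstract follow from the usual definition, the reciprocal of the absolute difference in event rates. A minimal sketch using the rounded percentages quoted above (the function name is hypothetical):

```python
def number_needed(rate_treated: float, rate_control: float) -> float:
    """Reciprocal of the absolute difference in event rates.
    Read as NNT for a beneficial outcome, NNH for an adverse one."""
    diff = abs(rate_treated - rate_control)
    if diff == 0:
        raise ValueError("no difference in event rates; NNT/NNH is undefined")
    return 1.0 / diff

# Successful intercourse: 49% with dose-optimised sildenafil vs 11% with placebo
nnt_success = number_needed(0.49, 0.11)
# Treatment-related adverse events: 30% vs 11%
nnh_adverse = number_needed(0.30, 0.11)
print(f"NNT {nnt_success:.1f}, NNH {nnh_adverse:.1f}")
```

The rounded percentages give values of about 2.6 and 5.3, slightly below the reported 2.7 and 5.4. The published figures were presumably calculated from the underlying counts, which the abstract does not give.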
Single-dose ketorolac and pethidine in acute postoperative pain: systematic review with meta-analysis. Smith LA, Carroll D, Edwards JE, Moore RA, McQuay HJ. Br J Anaesth 2000: 84(1); 48-58. [Medline] [Abstract] [PDF] For a systematic review of postoperative analgesic efficacy and adverse effects of single doses, injected or oral, of pethidine and ketorolac compared with placebo, we sought published randomized studies in moderate to severe postoperative pain. Information on summed pain intensity or pain relief outcomes over 4-6 h was extracted and converted to dichotomous information to produce the number of patients with at least 50% pain relief. This was used to calculate the relative benefit and number-needed-to-treat (NNT) for one patient to achieve at least 50% pain relief. Minor and major adverse effect data were extracted and summarized. For pethidine 100 mg i.m., eight randomized, controlled studies met the inclusion criteria, with 203 patients given pethidine and 161 placebo. The NNT to produce at least 50% pain relief was 2.9 (95% confidence interval 2.3-3.9). At this dose, pethidine produced significantly more drowsiness and dizziness than placebo, with numbers-needed-to-harm (NNH) of 2.9 (2.2-4.4) and 7.2 (4.8-14), respectively. For ketorolac, 14 reports met the inclusion criteria (six i.m. and eight oral). Most i.m. information (176 patients) was available for the 30 mg dose, which had an NNT of 3.4 (2.5-4.9). Most oral information was available for the 10 mg dose, which had an NNT of 2.6 (2.3-3.1). Oral ketorolac 10 mg was consistently at least as effective as ketorolac 30 mg i.m. Only with oral ketorolac 10 mg were there significantly more adverse effects than with placebo, with an NNH for any adverse effect of 7.3 (4.7-17).
Numbers needed to treat derived from meta-analysis. Charlton BG. British Medical Journal 1999: 319(7218); 1199. [Medline] [Full text]
Validity of randomized clinical trials in gastroenterology from 1964-2000. Kjaergard LL, Frederiksen SL, Gluud C. Gastroenterology 2002: 122(4); 1157-60. [Medline] BACKGROUND & AIMS: The internal validity of clinical trials depends on the adequacy of the reported methodological quality. We assessed the methodological quality of all 383 randomized clinical trials published in GASTROENTEROLOGY as original articles from 1964 to 2000. METHODS: The methodological quality (randomization and blinding), sample size, publication year, and disease area were extracted from each trial. Changes during the study period were analyzed by analysis of variance with adjustments for potential confounders. RESULTS: Forty-two percent of all trials reported adequate generation of the allocation sequence, 39% reported adequate allocation concealment, and 62% were double blind. The reported methodological quality improved significantly in the mid-1990s. CONCLUSIONS: The present study shows a positive development, but the reported methodological quality of trials can still be improved.
Trends in UK cancer trials: results from the UK Coordinating Committee for Cancer Research National Register of Cancer Trials. Vale C, Stewart L, Tierney J. Br J Cancer 2005: 92(5); 811-4. [Medline] [Abstract] [Full text] We aimed to study trends in the design and conduct of randomised controlled trials (RCTs) in cancer in the UK, using the UK Coordinating Committee for Cancer Research (UKCCCR) National Register of Cancer Trials (NRCT). We conducted a descriptive survey of 520 UK RCTs in cancer that were registered on the UKCCCR NRCT. All trials had been initiated between 1971 and 2000. Trials on the NRCT have been conducted in a wide variety of cancer types, but with a third in breast (22%) or lung cancer (11%). They have largely been funded by the UK public and charity sectors. Overall, there has been a sustained rise in the total numbers of patients entering UK cancer trials over time with a trend towards larger, multicentre trials, greater recruitment targets and a marked reduction in the average time taken to complete trials. Trends in the design and conduct of noncommercial cancer RCTs from 1971 to 2000 are encouraging. It will be interesting to see how they develop in light of the implementation of recent national initiatives regarding cancer clinical trials in the UK. British Journal of Cancer (2005) 92, 811-814. doi:10.1038/sj.bjc.6602425 www.bjcancer.com.
Grey zones of clinical practice: some limits to evidence-based medicine. Naylor CD. Lancet 1995: 345(8953); 840-2. [Medline]
In defense of case reports and case series. Vandenbroucke JP. Ann Intern Med 2001: 134(4); 330-4. [Medline] [Abstract] [Full text] [PDF] Case reports and case series have their own role in the progress of medical science. They permit discovery of new diseases and unexpected effects (adverse or beneficial) as well as the study of mechanisms, and they play an important role in medical education. Case reports and series have a high sensitivity for detecting novelty and therefore remain one of the cornerstones of medical progress; they provide many new ideas in medicine. At the same time, good case reporting demands a clear focus to make explicit to the audience why a particular observation is important in the context of existing knowledge.
Interpreting the Medical Literature Third Edition. Gehlbach SH (1993) New York: McGraw-Hill. ISBN: 0071054510.
Studying a Study and Testing a Test: How to Read the Health Science Literature Third Edition. Riegelman RK, Hirsch RP (1996) Boston, MA: Little, Brown and Company. ISBN: 0316745219.
The adult human cerebellum is a target of the neuroendocrine system involved in the circadian timing. Fauteck JD, Lerchl A, Bergmann M, Moller M, Fraschini F, Wittkowski W, Stankov B. Neurosci Lett 1994: 179(1-2); 60-4. In an investigation aimed at comprehensive mapping of the adult human brain with respect to receptor sites for the pineal hormone melatonin, we consistently observed specific binding in the cerebellum. Autoradiography and in vitro binding analysis with 125I-labeled melatonin were used to examine the location and the properties of these binding sites. In all cerebellar lobes, highest-density specific binding was localized to the external zone of the molecular layer. The binding was rapid, saturable, displaceable, specific and of high affinity. Physiological concentrations of NaCl decreased the affinity, while presence of calcium ions promoted it. The non-hydrolyzable GTP analog, GTP gamma S, inhibited binding in a dose-dependent manner and provoked a shift towards low affinity. The results strongly suggest that these binding sites may be functional melatonin receptors, and indicate that the adult human cerebellum is a target of melatonin, the pineal hormone involved in the control of the circadian timing.
Further statistics in dentistry. Part 10: Sherlock Holmes, evidence and evidence-based dentistry. Osborn JF, Bulman JS, Petrie A. Br Dent J 2003: 194(4); 189-95. [Medline] [Abstract]
A philosophical analysis of the evidence-based medicine debate. Sehon SR, Stanley DE. BMC Health Serv Res 2003: 3(1); 14. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: The term "evidence-based medicine" (or EBM) was introduced about ten years ago, and there has been considerable debate about the value of EBM. However, this debate has sometimes been obscured by a lack of conceptual clarity concerning the nature and status of EBM. DISCUSSION: First, we note that EBM proponents have obscured the current debate by defining EBM in an overly broad, indeed almost vacuous, manner; we offer a clearer account of EBM and its relation to the alternative approaches to medicine. Second, while EBM proponents commonly cite the philosophical work of Thomas Kuhn and claim that EBM is a Kuhnian 'paradigm shift,' we argue that such claims are seriously mistaken and unduly polarize the EBM debate. Third, we suggest that it is much more fruitful to understand the relationship between EBM and its alternatives in light of a different philosophical metaphor: W.V. Quine's metaphor of the web of belief. Seen in this way, we argue that EBM is an approach to medical practice that is indeed importantly different from the alternatives. SUMMARY: We can have a more productive debate about the value of EBM by being clearer about the nature of EBM and its relationship to alternative approaches to medicine.
Effectiveness of instruction in critical appraisal (evidence-based medicine) skills: a critical appraisal. Norman GR. CMAJ 1998: 158(2); 177-181. [Medline] [Abstract] [PDF] ABSTRACT: OBJECTIVE: To examine the evidence that the teaching of critical appraisal (evidence-based medicine) skills to undergraduate medical students or residents will result in significant gains in knowledge and increased use of the literature in clinical decision-making. DATA SOURCES: Articles published from 1966 to 1995, retrieved through a MEDLINE search supplemented by manual searches; review of bibliographies maintained by individuals involved in teaching critical appraisal skills; and a previous methodological review. STUDY SELECTION: Articles were selected if the study involved some form of control group, although strict randomization was not required, and a measure of performance followed the intervention. Articles were excluded if they simply reported the process of teaching critical appraisal skills or used some form of "happiness index." DATA SYNTHESIS: There were 10 studies of the impact of teaching critical appraisal skills, 6 involving medical students and 4 involving residents. Results from 3 of the studies were nearly uninterpretable and thus were excluded; the remaining 7 were methodologically acceptable. Analysis showed that interventions implemented in undergraduate programs resulted in significant gains in knowledge, as assessed by a written test (mean gain 17.0%; standard deviation [SD] 4.0%). Conversely, studies at the residency level consistently showed a small change in knowledge (mean gain 1.3%; SD 1.7%). Two studies that examined residents' use of the literature were unable to demonstrate any positive changes. CONCLUSIONS: Studies of the effect of teaching critical appraisal skills on gains in knowledge at the undergraduate level showed consistent improvement. By contrast, changes in knowledge at the residency level were small. 
Several suggestions from the educational literature are offered to increase effectiveness of critical appraisal interventions.
A large outbreak of botulism: the hazardous baked potato. Angulo FJ, Getz J, Taylor JP, Hendricks KA, Hatheway CL, Barth SS, Solomon HM, Larson AE, Johnson EA, Nickey LN, Ries AA. J Infect Dis 1998: 178(1); 172-7. In April 1994, the largest outbreak of botulism in the United States since 1978 occurred in El Paso, Texas. Thirty persons were affected; 4 required mechanical ventilation. All ate food from a Greek restaurant. The attack rate among persons who ate a potato-based dip was 86% (19/22) compared with 6% (11/176) among persons who did not eat the dip (relative risk [RR] = 13.8; 95% confidence interval [CI], 7.6-25.1). The attack rate among persons who ate an eggplant-based dip was 67% (6/9) compared with 13% (24/189) among persons who did not (RR = 5.2; 95% CI, 2.9-9.5). Botulism toxin type A was detected from patients and in both dips. Toxin formation resulted from holding aluminum foil-wrapped baked potatoes at room temperature, apparently for several days, before they were used in the dips. Consumers should be informed of the potential hazards caused by holding foil-wrapped potatoes at ambient temperatures after cooking.
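The relative risk and interval reported for the potato-based dip can be reproduced from the counts in the abstract. The sketch below uses the log (Katz) method for the confidence interval. That choice is an assumption, since the abstract does not say how the interval was computed, and the function name is hypothetical.

```python
import math

def relative_risk(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Relative risk with an approximate 95% CI on the log scale (Katz method)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Potato-based dip: 19 of 22 who ate it became ill vs 11 of 176 who did not eat it
rr, lower, upper = relative_risk(19, 22, 11, 176)
print(f"RR = {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

With these counts the sketch agrees with the published RR of 13.8 and interval of 7.6 to 25.1 to one decimal place.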
Bone mineral density in children and adolescents: relation to puberty, calcium intake, and physical activity. Boot AM, de Ridder MA, Pols HA, Krenning EP, de Muinck Keizer-Schrama SM. Journal of Clinical Endocrinology and Metabolism 1997: 82(1); 57-62. [Medline] The association of height, weight, pubertal stage, calcium intake, and physical activity with bone mineral density (BMD) was evaluated in 500 children and adolescents (205 boys and 295 girls), aged 4-20 yr. The BMD (grams per cm2) of lumbar spine and total body was measured with dual energy x-ray absorptiometry. Lumbar spine volumetric BMD was calculated to correct for bone size. BMD and volumetric BMD increased with age. During puberty, the age-dependent increment was higher. After adjustment for age, the Tanner stage was significantly associated with all three BMD variables in girls and with spinal BMD in boys. In boys, positive correlations were found between BMD and both calcium intake and physical activity after adjustment for age. Stepwise regression analysis with weight, height, Tanner stage, calcium intake, and physical activity as determinants with adjustment for age resulted in a model with Tanner stage in girls and weight in boys for all three BMD variables. The major independent determinant of BMD was the Tanner stage in girls and weight in boys.
Dose-dependent catch-up growth after 2 years of growth hormone treatment in intrauterine growth-retarded children. Belgian and French Pediatric Clinics and Sanofi-Choay (France). Chatelain P, Job JC, Blanchard J, Ducret JP, Oliver M, Sagnard L, Vanderschueren-Lodeweyckx M. Journal of Clinical Endocrinology and Metabolism 1994: 78(6); 1454-60. [Medline] This study reports the results of a 2-yr clinical trial with GH in 95 short prepubertal children with non-GH-deficient intrauterine growth retardation. This randomized, double blind, controlled study compared the effects of placebo (restricted to the first 6 months) and two doses of GH (0.4 and 1.2 IU/kg.week) given sc 6 days/week for 2 yr. A significant GH dose-dependent growth acceleration was observed. Mean height gain (SDS/CA) was 0.66 +/- 0.07 in group I (low dose, 0.4 IU/kg.week) compared to 1.25 +/- 0.07 in group II (high dose, 1.2 IU/kg.week). Mean bone maturation progression (expressed in months) was 26.2 +/- 1.7 and 30.2 +/- 1.5 over 24 months in groups I and II, respectively. Onset of puberty was observed in some patients of both groups. Whether chronic use of a high GH dose will advance the onset of puberty remains to be established. A great variability of growth acceleration was seen among GH dose groups, suggesting that factors in addition to GH dose might modulate individual responses to treatment. In conclusion, it is suggested that in these patients, dose-dependent catch-up growth could be induced by GH treatment.
2-[125I]iodomelatonin labels sites with identical pharmacological characteristics in chicken brain and chicken retina. Dubocovich ML, Shankar G, Mickel M. Eur J Pharmacol 1989: 162(2); 289-99. The binding and pharmacological characteristics of the melatonin site labeled by the radioligand 2-[125I]iodomelatonin in chicken brain membranes were determined and compared with those of the melatonin site of chicken retinal membranes. The specific binding of 2-[125I]iodomelatonin to chicken brain membranes was found to be stable, saturable, reversible and of high affinity. Scatchard analysis of the binding revealed an affinity constant (Kd) of 344 +/- 24 pM (n = 4) and a total number of binding sites (Bmax) of 57.6 +/- 10.1 fmol/mg protein (n = 4). The Kd value corresponds closely with that found in kinetic studies (Kd = 407 pM) and that reported in chicken retinal membranes. Competition experiments were carried out with various compounds revealing the following order of pharmacological affinities: 6-chloromelatonin greater than or equal to 2-iodomelatonin greater than melatonin greater than 2-methyl-6,7-dichloromelatonin greater than 6-hydroxymelatonin greater than N-acetyl-5-hydroxytryptamine greater than luzindole greater than N-acetyl-5-methoxykynurenamine greater than 6-methoxymelatonin greater than N-acetyltryptamine greater than 5-methoxytryptamine greater than 5-hydroxytryptamine greater than 5-methoxy-N,N-dimethyltryptamine greater than 5-methoxytryptophol. This order of pharmacological affinities is identical to that found in chicken retinal membranes. Correlation between affinity constants for various melatonin receptor agonists and putative melatonin receptor antagonists obtained in chicken brain and retinal membranes yielded a correlation coefficient (r) of 0.966 (slope = 0.652, n = 14, P less than 0.01).
We conclude that the high affinity site labeled by 2-[125I]iodomelatonin in chicken brain membranes has identical binding and pharmacological characteristics to the ML-1 melatonin receptor site previously described in chicken retinal membranes.
2-[125I]iodomelatonin binding sites in hamster brain membranes: pharmacological characteristics and regional distribution. Duncan MJ, Takahashi JS, Dubocovich ML. Endocrinology 1988: 122(5); 1825-33. Studies in a variety of seasonally breeding mammals have shown that melatonin mediates photoperiodic effects on reproduction. Relatively little is known, however, about the site(s) or mechanisms of action of this hormone for inducing reproductive effects. Although binding sites for [3H]melatonin have been reported previously in bovine, rat, and hamster brain, the pharmacological selectivity of these sites was never demonstrated. In the present study, we have characterized binding sites for a new radioligand, 2-[125I]iodomelatonin, in brains from a photoperiodic species, the Syrian hamster. 2-[125I]Iodomelatonin labels a high affinity binding site in hamster brain membranes. Specific binding of 2-[125I]iodomelatonin is rapid, stable, saturable, and reversible. Saturation studies demonstrated that 2-[125I]iodomelatonin binds to a single class of sites with an affinity constant (Kd) of 3.3 +/- 0.5 nM and a total binding capacity (Bmax) of 110.2 +/- 13.4 fmol/mg protein (n = 4). The Kd value determined from kinetic analysis (3.1 +/- 0.9 nM; n = 5) was very similar to that obtained from saturation experiments. Competition experiments showed that the relative order of potency of a variety of indoles for inhibition of 2-[125I]iodomelatonin binding site to hamster brain membranes was as follows: 6-chloromelatonin greater than or equal to 2-iodomelatonin greater than N-acetylserotonin greater than or equal to 6-methoxymelatonin greater than or equal to melatonin greater than 6-hydroxymelatonin greater than or equal to 6,7-dichloro-2-methylmelatonin greater than 5-methoxytryptophol greater than 5-methoxytryptamine greater than or equal to 5-methoxy-N,N-dimethyltryptamine greater than N-acetyltryptamine greater than serotonin greater than 5-methoxyindole (inactive). 
Compounds known to act at serotonergic, adrenergic, or dopaminergic receptors were either inactive or relatively ineffective as compared to melatonin. These results suggest that 2-[125I]iodomelatonin is a selective, high affinity probe for identifying melatonin receptor binding sites in rodent brain.
Relative carnitine insufficiency in children with type I diabetes mellitus. Winter SC, Simon M, Zorn EM, Szabo-Aczel S, Vance WH, O'Hara T, Higashi L. American Journal of Diseases of Children 1989: 143(11); 1337-9. [Medline] Recognizing the similarity of type I diabetes mellitus to inborn errors of metabolism that have responded to carnitine therapy, we initiated a study of 54 children with type I diabetes mellitus. Examining a fasting blood sample for levels of carnitine, glucose, and glycosylated hemoglobin A1c, and a urine sample for levels of ketones and glucose, we found 13 children were deficient of free carnitine (less than 20 µmol/L) and 30 had elevated acyl carnitine levels (greater than 11 µmol/L). Statistical tests confirmed a significant difference between the diabetic population and normal population for reduced free carnitine, elevated acyl carnitine, and an elevated ratio of acyl carnitine to free carnitine. Also, a significant correlation was found between the levels of urine glucose and ketones and the level of acyl carnitine. Our data indicate that carnitine deficiency and relative insufficiency may be an overlooked component in the management of diabetes.
Cloning and characterization of the 5'-flanking region for the human topoisomerase III gene. Kim JC, Yoon JB, Koo HS, Chung IK. J Biol Chem 1998: 273(40); 26130-7. The human DNA topoisomerase III (hTOP3) gene encodes a topoisomerase homologous to the Escherichia coli DNA topoisomerase I subfamily. To understand the mechanisms responsible for regulating hTOP3 expression, we have cloned the 5'-flanking region of the gene coding for the hTOP3 and analyzed its promoter activity. The presence of a single transcription initiation site was suggested by primer extension analysis. The hTOP3 gene promoter is moderately high in GC content and lacks a canonical TATA box, suggesting that hTOP3 promoter has overall similarity to promoters of a number of housekeeping genes. Examination of the promoter sequence indicated the presence of four Sp-1 consensus binding sequences and a putative initiator element surrounding the transcription initiation site. Transient expression of a luciferase reporter gene under the control of serially deleted 5'-flanking sequences revealed that the 52-base pair region from -326 to -275 upstream of the transcription initiation site includes a positive cis-acting element(s) for the efficient expression of hTOP3 gene. On the basis of gel mobility shift and supershift assays, we demonstrated that both YY1 and USF1 transcription factors can bind to the 52-base pair region. When HeLa cells were transiently transfected with a mutant construct which had disabled both YY1- and USF1-binding sites, the luciferase activity was greatly reduced, suggesting that these binding elements play a functional role in the basal activation of the hTOP3 promoter. Transfection studies with mutations that selectively impaired YY1 or USF1 binding suggested that both YY1 and USF1 function as activators in the hTOP3 promoter.
Practice parameter: the management of acute gastroenteritis in young children. American Academy of Pediatrics, Provisional Committee on Quality Improvement, Subcommittee on Acute Gastroenteritis. Adams GR, Corwin RM, Pakula LC, Harley BM, Weinblatt HB, Herr TJ, Matthews KE, Fuquay D, Mines RD, Young DA, Cooley JR. Accessed on 2003- This practice parameter formulates recommendations for health care providers about the management of acute diarrhea in children ages 1 month to 5 years. It was developed through a comprehensive search and analysis of the medical literature. Expert consensus opinion was used to enhance or formulate recommentations where data were insufficient. The Provisional Committee on Quality Improvement of the American Academy of Pediatrics (AAP) selected a subcommittee composed of pediatricians with expertise in the fields of gastroenterology, infectious diseases, pediatric practice, and epidemiology to develop the parameter. The subcommittee, the Provisional Committee on Quality Improvement, a review panel of practitioners, and other groups of experts within and outside the AAP reviewed and revised the parameter. Three specific management issues were considered: (1) methods of rehydration, (2) refeeding after rehydration, and (3) the use of antidiarrheal agents. Main outcomes considered were success or failure of rehydration, resolution of diarrhea, and adverse effects from various treatment options. A comprehensive bibliography of literature on gastroenteritis and diarrhea was compiled and reduced to articles amenable to analysis. Oral rehydration therapy was studied in depth; inconsistency in the outcomes measured in the studies interfered with meta-analysis but allowed for formulation of strong conclusions. Oral rehydration was found to be as effective as intravenous therapy in rehydrating children with mild to moderate dehydration and is the therapy of first choice in these patients. 
Refeeding was supported by enough comparable studies to permit a valid meta-analysis. Early refeeding with milk or food after rehydration does not prolong diarrhea; there is evidence that it may reduce the duration of diarrhea by approximately half a day and is recommended to restore nutritional balance as soon as possible. Data on antidiarrheal agents were not sufficient to demonstrate efficacy; therefore, the routine use of antidiarrheal agents is not recommended, because many of these agents have potentially serious adverse effects in infants and young children. This practice parameter is not intended as a sole source of guidance in the treatment of acute gastroenteritis in children. It is designed to assist pediatricians by providing an analytic framework for the evaluation and treatment of this condition. It is not intended to replace clinical judgment or to establish a protocol for all patients with this condition. It rarely will provide the only appropriate approach to the problem. A technical report describing the analyses used to prepare this parameter and a patient education brochure are available through the Publications Department of the AAP. www.aap.org/policy/gastro.htm
Mortality associated with hormone replacement therapy in younger and older women: a meta-analysis. Salpeter SR, Walsh JM, Greyber E, Ormiston TM, Salpeter EE. J Gen Intern Med 2004: 19(7); 791-804. [Medline] [Abstract] OBJECTIVE: To assess mortality associated with hormone replacement in younger and older postmenopausal women. DESIGN: A comprehensive search of MEDLINE, CINAHL, and EMBASE databases was performed to identify randomized controlled trials of hormone replacement therapy from 1966 to September 2002. The search was augmented by scanning selected journals through April 2003 and references of identified articles. Randomized trials of greater than 6 months' duration were included if they compared hormone replacement with placebo or no treatment, and reported at least 1 death. MEASUREMENTS: Outcomes measured were total deaths and deaths due to cardiovascular disease, cancer, or other causes. Odds ratios (OR) for total and cause-specific mortality were reported separately for trials with mean age of participants less than and greater than 60 years at baseline. MAIN RESULTS: Pooled data from 30 trials with 26,708 participants showed that the OR for total mortality associated with hormone replacement was 0.98 (95% confidence interval [CI], 0.87 to 1.12). Hormone replacement reduced mortality in the younger age group (OR, 0.61; CI, 0.39 to 0.95), but not in the older age group (OR, 1.03; CI, 0.90 to 1.18). For all ages combined, treatment did not significantly affect the risk for cardiovascular or cancer mortality, but reduced mortality from other causes (OR, 0.67; CI, 0.51 to 0.88). CONCLUSIONS: Hormone replacement therapy reduced total mortality in trials with mean age of participants under 60 years. No change in mortality was seen in trials with mean age over 60 years.
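The odds ratios and confidence intervals in the Salpeter abstract can be turned back into standard errors on the log scale, which makes it possible to check the significance statements and to compare the two age groups directly. The sketch below assumes the reported intervals are symmetric on the log odds ratio scale and that the younger and older trial groups are independent; the function names are hypothetical, and the subgroup comparison is a standard back-of-envelope check, not something the authors report.

```python
from math import log, sqrt, erfc

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def log_or_se(lower, upper):
    """Standard error of log(OR) recovered from a 95% CI (assumes log-scale symmetry)."""
    return (log(upper) - log(lower)) / (2 * Z95)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return erfc(abs(z) / sqrt(2))


def summarize(odds_ratio, lower, upper):
    """Return (se, z, p) for the null hypothesis OR = 1."""
    se = log_or_se(lower, upper)
    z = log(odds_ratio) / se
    return se, z, two_sided_p(z)


# Values taken from the abstract.
younger = summarize(0.61, 0.39, 0.95)
older = summarize(1.03, 0.90, 1.18)
overall = summarize(0.98, 0.87, 1.12)

# Difference between the two subgroup estimates (assumes independent trial sets).
diff = log(0.61) - log(1.03)
se_diff = sqrt(younger[0] ** 2 + older[0] ** 2)
p_diff = two_sided_p(diff / se_diff)

for name, (se, z, p) in [("younger", younger), ("older", older), ("overall", overall)]:
    print(f"{name}: se={se:.3f} z={z:.2f} p={p:.3f}")
print(f"younger vs older: p={p_diff:.3f}")
```

Comparing the subgroups with each other, rather than noting that one is significant and the other is not, is the more defensible way to argue that the effect differs by age.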
Ultrarapid metabolism of sparteine: frequency of alleles with duplicated CYP2D6 genes in a Danish population as determined by restriction fragment length polymorphism and long polymerase chain reaction. Bathum L, Johansson I, Ingelman-Sundberg M, Horder M, Brosen K. Pharmacogenetics 1998: 8(2); 119-23. [Medline] CYP2D6 is a polymorphically expressed enzyme with two phenotypes. Poor metabolizers lack the enzyme caused by inactivating mutations in the CYP2D6 gene and extensive metabolizers have at least one active CYP2D6 gene. Extensive metabolizers with very high capacity for CYP2D6 dependent drug metabolisms are termed ultrarapid metabolizers and carry alleles with duplicated, multi duplicated or amplified CYP2D6 genes. In the present study, we examined the frequency of CYP2D6 gene duplications in a Danish population and validated a long polymerase chain reaction method for identification of ultrarapid metabolizers. Sixty individuals having a metabolic ratio for sparteine at or below 0.15 were selected and a control group of 53 individuals with a metabolic ratio between 0.16 and 12.4 was used. Based on EcoRI restriction fragment length polymorphism analysis, eight individuals were found with a duplicated CYP2D6 gene, whereas using a long polymerase chain reaction method, nine individuals with a 3.6 kb fragment indicative of two CYP2D6 genes in tandem were found among the 60 individuals with a low metabolic ratio. No gene duplication was found in the control group or in any individuals with a metabolic ratio > 0.14. Based on these results, we estimate the frequency of individuals with CYP2D6 duplication in the Danish population to be 0.8%, which is comparable to the frequency in the Swedish and the German populations, but considerably lower than in Spanish or African populations. We conclude that the long polymerase chain reaction assay is simple and reliable for detection of duplications of the CYP2D6 gene.
Cytochrome P450 3A4 activity in premenopausal and postmenopausal women, based on 6-beta-hydroxycortisol:cortisol ratios. Burstein AH, Reiss WG, Kantor E, Anderson GD. Pharmacotherapy 1998: 18(6); 1271-6. [Medline] STUDY OBJECTIVE: To characterize cytochrome P450 (CYP) 3A4 activity in premenopausal and postmenopausal women by evaluating the urinary 6-beta-hydroxycortisol:cortisol ratio. DESIGN: Prospective study. SUBJECTS: Thirteen premenopausal and 13 postmenopausal women who were healthy and not receiving drugs known to affect CYP3A4 activity. INTERVENTIONS: Beginning on day 2 of menses, premenopausal women collected first morning urine samples every other day for a complete menstrual cycle. Postmenopausal women collected first morning urine every other day for 28 days. MEASUREMENTS AND MAIN RESULTS: Mean weekly 6-beta-hydroxycortisol:cortisol ratios did not differ during the phase (week) of the menstrual cycle. Daily ratios did not differ in postmenopausal women. No difference between premenopausal and postmenopausal women was found on comparing overall median ratios. CONCLUSION: Cytochrome P450 3A4 activity as measured by 6-beta-hydroxycortisol:cortisol ratio did not differ by week of menstrual cycle, suggesting no menstrual cycle-related changes. Menopause does not appear to be associated with differences in CYP3A4 activity, compared with premenopause.
Intra-individual variability and influence of urine collection period on dextromethorphan metabolic ratios in healthy subjects. Chladek J, Zimova G, Martinkova J, Tuma I. Fundamental Clinical Pharmacology 1999: 13(4); 508-15. [Medline] The aim of the study was to evaluate intra-individual variability in metabolic ratios (MRs) of dextromethorphan (DM) in healthy volunteers and to compare the MRs in urine collected 0-4, 0-8 and 0-24 h post-dose. Urinary molar ratios of DM to dextrorphan (MR1) and of DM to methoxymorphinan (MR2) were obtained after a single oral 27.5 mg dose of DM hydrobromide to ten healthy male and four female Caucasians (ten extensive metabolizers (EM) and four poor metabolizers (PM) of DM) to probe activities of CYP2D6 and CYP3A, respectively. Seven EM and one PM received DM on three additional occasions within 2 months. For the seven EM, the intra-individual variability (CVw) in the MRs obtained in the three urine collections ranged from 11 to 93% (MR1) and from 8 to 77% (MR2). The mean CVw estimated separately for the 4, 8 and 24 h urines by two-way analysis of variance reached 58, 57 and 44% for the MR1 and 50, 42 and 31% for the MR2, respectively. For all 14 subjects, the log-transformed ratios (MR1) obtained in the 24 h urines were highly correlated with those in either the 8 h (rs = 0.967, P < 0.0001) or 4 h urines (rs = 0.946, P < 0.0001). Correlations between the log-transformed MR2s were weaker (24 h vs. 8 h: rs = 0.829, P < 0.0001, 24 h vs. 4 h: rs = 0.831, P < 0.0001). The MR1s in 4 h and 8 h urines were only 2 and 9% less than those in 24 h urines (median differences) and varied from 48 and 47% below to 85 and 55% above (95% CI for the differences). However, the MR2s in the 4 h and 8 h urines were shifted towards higher values by 49 and 23% and the corresponding 95% CI limits were: 16-164% (4 h vs. 24 h) and 30-119% (8 h vs. 24 h). 
In conclusion, MR1 values in the 4 h urine collection agree well with those in longer collections and their use in epidemiological studies can be recommended. The intra-individual variability of approximately 50% in the MR1 has to be taken into account in clinical studies with within-subject design. Accurate determination of the MR2 requires at least a 24 h period of urine collection.
The Orphan Human Pregnane X Receptor Mediates the Transcriptional Activation of CYP3A4 by Rifampicin through a Distal Enhancer Module. Goodwin B, Hodgson E, Liddle C. Molecular Pharmacology 1999: 56(6); 1329-1339. ABSTRACT: Cytochrome P-450 3A4 (CYP3A4), the predominant cytochrome P-450 expressed in adult human liver, is subject to transcriptional induction by a variety of structurally unrelated xenobiotics, including the antibiotic rifampicin. The molecular mechanisms underlying this phenomenon are poorly understood. We transfected a human liver-derived cell line (HepG2) with various CYP3A4-luciferase reporter gene constructs containing a nested set of 5'-deletions of the CYP3A4 5'-flanking region. Rifampicin-inducible transcription of the reporter gene was observed only with the longest construct, which encompassed bases -13000 to +53 of CYP3A4 (3-fold induction). The responsive region was functional regardless of its position or orientation relative to the proximal promoter of CYP3A4 and was capable of conferring rifampicin-inducible expression on a heterologous promoter. Further deletion mutants localized the induction to bases -7836 to -7607. In vitro DNase I footprint analysis of this region revealed four protected sites (FP1, FP2, FP3, and FP4). Two of these sites, FP3 (bases -7738 to -7715) and FP4 (bases -7698 to -7682), overlapped binding motifs for the orphan human pregnane X receptor (hPXR). Cotransfection of responsive constructs with a hPXR expression vector substantially increased the rifampicin-inducibility to approximately 50-fold. In addition, the rifampicin-responsive constructs were strongly activated by a range of CYP3A inducers. Finally, we demonstrate cooperativity between elements within the distal enhancer region and cis-acting elements in the proximal promoter of CYP3A4. 
Our results provide evidence for the existence of a potent enhancer module, 8 kb distal to the transcription start point, which mediates the transcriptional induction of CYP3A4 by activators of hPXR.
Analysis of the CYP2D6 gene mutations and their consequences for enzyme function in a West African population. Griese EU, Asante-Poku S, Ofori-Adjei D, Mikus G, Eichelbaum M. Pharmacogenetics 1999: 9(6); 715-723. ABSTRACT: The data on differences in the metabolic handling of the CYP2D6 probe drugs sparteine and debrisoquine, and the relationship between phenotype and genotype and gene frequencies for several mutant CYP2D6 alleles in African populations are limited and sometimes controversial. Therefore, in a West African population (Ghana), we investigated (i) the phenotype for sparteine (n = 326) and debrisoquine (n = 201) oxidation, (ii) the coregulatory control of sparteine and debrisoquine by phenotyping 201 individuals with both drugs and (iii) the genotype for CYP2D6 alleles *3 and *4 in 133 individuals and for the alleles *1, *2, *3, *4, *5, *6, *7, *8, *9, *10, *14, *16, *17, *2b, *2xN, *2bxN in 193 individuals. Of the 326 individuals phenotyped with sparteine, eight had a metabolic ratio (MR)sp > 20 corresponding to a poor metabolizer frequency of 2.5% [95% confidence interval (CI) = 1.06-4.77]. The prevalence of the poor metabolizer phenotype for debrisoquine oxidation was 3% (95% CI = 1.1-6.39) with six of the 201 individuals having a MR greater than 12.6. The distribution of the MR of sparteine was trimodal whereas MR of debrisoquine was unimodally distributed with a pronounced kurtosis. In individuals phenotyped with both drugs, there was a significant correlation between the MRs (r(s) = 0.63, P < 0.001). The CYP2D6 alleles *1, *2 and *17 were the most common functional alleles occurring with frequencies of 43.7, 10.6 and 27.7%, respectively. The three other observed functional alleles *2xN, *10 and *20 had much lower frequencies (1.6%, 3.1% and 0.3%, respectively). Of the eight non-functional alleles, only *4 (6.3%) and *5 (6.0%) could be found. 
The allele *5 occurred with the same frequency as in Caucasian populations (4.1%) but the *4 allele had a much lower frequency (Caucasians 19.5%). One individual with *1/*1 was a poor metabolizer for sparteine and debrisoquine indicating the existence of as yet unknown non-functional alleles in this West African population. Although the prevalence of poor metabolizers and the number of heterozygotes for non-functional alleles was much lower in Ghanaians, the median MRsp of 0.7 was significantly higher in this population compared with a median MRsp of 0.4 in Caucasians, indicating a lower metabolic clearance for CYP2D6 substrates in the West Africans. The lower metabolic activity in Ghanaians could not be explained solely by the high frequency of the *17 allele, which is associated with an impairment of CYP2D6 enzyme function. In addition, a higher median MRsp of 0.5 corresponding to metabolic clearance of 346 ml/min was observed among extensive metabolizers with the genotype *1/*1. Thus, compared with the median of MRsp = 0.28 (CLmet 573 ml/min) in Caucasians homozygous for *1, the metabolic clearance of sparteine was 40% lower on average in respective Ghanaians.
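The poor metabolizer frequencies in the Griese abstract (8 of 326 for sparteine, 6 of 201 for debrisoquine) come with confidence intervals whose method is not stated. A minimal sketch, assuming an exact (Clopper-Pearson) binomial interval, shows how such limits can be computed with the standard library alone; the function names are hypothetical and the bisection solver is just one convenient way to invert the binomial CDF.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # CDF still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes in n trials."""
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Counts taken from the abstract.
sparteine_ci = clopper_pearson(8, 326)
debrisoquine_ci = clopper_pearson(6, 201)
print([round(100 * v, 2) for v in sparteine_ci])
print([round(100 * v, 2) for v in debrisoquine_ci])
```

If the exact-interval assumption is right, both results should land close to the limits quoted in the abstract; a normal-approximation interval would not, because the lower limit would be pulled too close to zero for proportions this small.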
In-vivo phenotyping for CYP3A by a single-point determination of midazolam plasma concentration [In Process Citation]. Lin YS, Lockwood GF, Graham MA, Brian WR, Loi CM, Dobrinska MR, Shen DD, Watkins PB, Wilkinson GR, Kharasch ED, Thummel KE. Pharmacogenetics 2001: 11(9); 781-91. We investigated whether a single plasma midazolam concentration could serve as an accurate predictor of total midazolam clearance, an established in-vivo probe measure of cytochrome P450 3A (CYP3A) activity. In a retrospective analysis of data from 224 healthy volunteers, non-compartmental pharmacokinetic parameters were estimated from plasma concentration-time curves following intravenous (IV) and/or oral administration. Based on statistical moment theory, the concentration at the mean residence time (MRT) should be the best predictor of the total area under the curve (AUC). Following IV or oral midazolam administration, the average MRT was found to be approximately 3.5 h, suggesting that the optimal single sampling time to predict AUC was between 3 and 4 h. Since a 4-h data point was common to all studies incorporated into this analysis, we selected this time point for further investigation. The concentrations of midazolam measured 4 h after an IV or oral dose explained 80 and 91% of the constitutive interindividual variability in midazolam AUC, respectively. The 4-h midazolam measurement was also an excellent predictor of drug-drug interactions involving CYP3A induction and inhibition. Compared with baseline values, the direction and magnitude of change in midazolam AUC and the 4-h concentration were completely concordant for all study subjects. We conclude that a single 4-h midazolam concentration following IV or oral administration represents an accurate marker of CYP3A phenotype under constitutive and modified states. Moreover, the single-point approach offers an efficient means to phenotype and identify individuals with important genetic polymorphisms that affect CYP3A activity.
Polymorphism of the cytochrome P450 CYP2D6 gene in a European population: characterization of 48 mutations and 53 alleles, their frequencies and evolution. Marez D, Legrand M, Sabbagh N, Guidice JM, Spire C, Lafitte JJ, Meyer UA, Broly F. Pharmacogenetics 1997: 7(3); 193-202. [Medline] The polymorphic cytochrome P450 CYP2D6 is involved in the metabolism of various drugs of wide therapeutic use and is a presumed susceptibility factor for certain environmentally-induced diseases. Our aim was to define the mutations and alleles of the CYP2D6 gene and to evaluate their frequencies in the European population. Using polymerase chain reaction-single strand conformation polymorphism analysis, 672 unrelated subjects were screened for mutations in the 9 exons of the gene and their exon-intron boundaries. A total of 48 point mutations were identified, of which 29 were novel. Mutations 1749 G-->C, 2938 C-->T and 4268 G-->C represented 52.6%, 34.3% and 52.9% of the mutations in the total population, respectively. Of the eight detrimental mutations detected, the 1934 G-->A, the 1795 Tdel and the 2637 Adel accounted for 65.8%, 6.2% and 4.8% respectively, within the poor metabolizer subgroup. Fifty-three different alleles were characterized from the mutation pattern and by allele-specific sequencing. They are derived from three major alleles, namely the wild-type CYP2D6*1A, the functional CYP2D6*2 and the null CYP2D6*4A. Five allelic variants (CYP2D6*1A, *2, *2B, *4A and *5) account for about 87% of all alleles, while the remaining alleles occur with a frequency of 0.1%-2.7%. These data provide a solid basis for future epidemiological, clinical as well as interethnic studies of the CYP2D6 polymorphism and highlight that the described single strand conformation polymorphism method can be successfully used in designing such studies.
Pharmacokinetics of zidovudine in infants: a population analysis across studies. Mirochnick M, Capparelli E, Connor J. Clin Pharmacol Ther 1999: 66(1); 16-24. BACKGROUND: Although the use of zidovudine in newborns and infants has become standard therapy for prophylaxis and therapy of human immunodeficiency virus infection, the developmental pharmacology of zidovudine in the first months of life has not been fully described. METHODS: We used population analysis to estimate zidovudine pharmacokinetic parameters in newborns and infants who either participated in one of five Pediatric AIDS Clinical Trials Group (PACTG) protocols or were premature infants who had zidovudine concentrations drawn for therapeutic drug monitoring. The data set consisted of 698 serum samples from 83 infants with a mean gestational age at birth of 37.5 weeks (range, 26.0 to 41.5 weeks), mean postnatal age at sampling of 19.3 days (range, 0 to 144 days), and a mean weight at sampling of 3.1 kg (range, 0.71 to 6.0 kg). With use of the program NONMEM and a two-compartment open model, the influences of demographic and clinical factors on the elimination rate constant (k10), volume of distribution of the central compartment (Vc) and bioavailability (F1) were examined. RESULTS: Zidovudine elimination was slow immediately after birth but increased rapidly in term infants during the first weeks of life, reaching a plateau by 4 to 8 weeks of age. In premature infants, zidovudine elimination increased at a much slower rate than in the term infants. Gender, race, and exposure to didanosine or nevirapine had no impact on zidovudine elimination. Bioavailability was increased in infants less than 14 days old. CONCLUSIONS: Zidovudine elimination kinetics undergo large increases during the first months of life, and the pattern of maturation is different in term and preterm infants. Higher bioavailability in younger infants is consistent with decreased first-pass metabolism associated with reduced clearance.
Canadian Native Indians exhibit unique CYP2A6 and CYP2C19 mutant allele frequencies. Nowak MP, Sellers EM, Tyndale RF. Clinical Pharmacology & Therapeutics 1998: 64(4); 378-383. ABSTRACT: Many human cytochromes P450 (CYP) enzymes are polymorphically expressed, resulting in interindividual and interethnic differences in the metabolism of substrate drugs. Little is known about the genetic variation of CYP enzymes in Canadian Native Indians. We therefore determined the CYP2C19 and CYP2A6 mutant allele frequencies in 159 Canadian Native Indians and compared them with white and Asian subjects. Canadian Native Indians differed significantly from both white and Asian populations in allelic patterns of both CYP2C19 (19.1% CYP2C19*2 and 0% CYP2C19*3) and CYP2A6 (0.9% CYP2A6*2 and 13.9% CYP2A6*3). In addition, analysis of the Canadian Native Indian data suggested that there may be an association between the presence of the CYP2C19 and CYP2A6 mutant alleles such that the co-occurrence of these 2 alleles is higher than would be predicted on the basis of their individual frequencies in this population.
Cytochrome P450 2D6 variants in a Caucasian population: allele frequencies and phenotypic consequences. Sachse C, Brockmoller J, Bauer S, Roots I. American Journal of Human Genetics 1997: 60(2); 284-95. [Medline] Cytochrome P450 2D6 (CYP2D6) metabolizes many important drugs. CYP2D6 activity ranges from complete deficiency to ultrafast metabolism, depending on at least 16 different known alleles. Their frequencies were determined in 589 unrelated German volunteers and correlated with enzyme activity measured by phenotyping with dextromethorphan or debrisoquine. For genotyping, nested PCR-RFLP tests from a PCR amplificate of the entire CYP2D6 gene were developed. The frequency of the CYP2D6*1 allele coding for extensive metabolizer (EM) phenotype was .364. The alleles coding for slightly (CYP2D6*2) or moderately (*9 and *10) reduced activity (intermediate metabolizer phenotype [IM]) showed frequencies of .324, .018, and .015, respectively. By use of novel PCR tests for discrimination, CYP2D6 gene duplication alleles were found with frequencies of .005 (*1x2), .013 (*2x2), and .001 (*4x2). Frequencies of alleles with complete deficiency (poor metabolizer phenotype [PM]) were .207 (*4), .020 (*3 and *5), .009 (*6), and .001 (*7, *15, and *16). The defective CYP2D6 alleles *8, *11, *12, *13, and *14 were not found. All 41 PMs (7.0%) in this sample were explained by five mutations detected by four PCR-RFLP tests, which may suffice, together with the gene duplication test, for clinical prediction of CYP2D6 capacity. Three novel variants of known CYP2D6 alleles were discovered: *1C (T1957C), *2B (additional C2558T), and *4E (additional C2938T). Analysis of variance showed significant differences in enzymatic activity measured by the dextromethorphan metabolic ratio (MR) between carriers of EM/PM (mean MR = .006) and IM/PM (mean MR = .014) alleles and between carriers of one (mean MR = .009) and two (mean MR = .003) functional alleles. 
The results of this study provide a solid basis for prediction of CYP2D6 capacity, as required in drug research and routine drug treatment.
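The allele frequencies in the Sachse abstract can be checked against the observed poor metabolizer rate with a Hardy-Weinberg calculation: if q is the combined frequency of non-functional alleles, the expected proportion of individuals carrying two of them is q squared. The sketch below rests on two assumptions that the abstract does not spell out: that ".020 (*3 and *5)" and ".001 (*7, *15, and *16)" are per-allele values, and that the *4x2 duplication counts as non-functional.

```python
# Non-functional allele frequencies as read from the abstract.
# Assumption: grouped values such as ".020 (*3 and *5)" apply to each allele.
# Assumption: *4x2 (a duplicated null allele) is counted as non-functional.
pm_alleles = {
    "*4": 0.207,
    "*3": 0.020,
    "*5": 0.020,
    "*6": 0.009,
    "*7": 0.001,
    "*15": 0.001,
    "*16": 0.001,
    "*4x2": 0.001,
}

q = sum(pm_alleles.values())   # combined non-functional allele frequency
expected_pm = q ** 2           # Hardy-Weinberg share with two non-functional alleles
observed_pm = 41 / 589         # poor metabolizers reported in the sample

print(f"q = {q:.3f}")
print(f"expected PM = {expected_pm:.4f}, observed PM = {observed_pm:.4f}")
```

Under these assumptions the expected and observed proportions are both close to 7%, which is what one would hope for if the genotyping tests capture essentially all of the poor metabolizers.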
Development of a Pharmacophore for Inhibition of Human Liver Cytochrome P-450 2D6: Molecular Modeling and Inhibition Studies. Strobl GR, von Kruedener S, Stockigt J, Guengerich F, Wolff T. J Med Chem 1993: 36(9); 1136-1145. [Medline] ABSTRACT: To gain insight into the specificity of cytochrome P-450 2D6 toward inhibitors, a preliminary pharmacophore model was built up using strong competitive inhibitors. Ajmalicine (1), the strongest inhibitor known (Ki = 3 nM) was selected as template because of its rigid structure. The preliminary pharmacophore model was validated by performing inhibition studies with derivatives of ajmalicine (1) and quinidine (9). Bufuralol (18) was chosen as substrate and the metabolite 1'-hydroxybufuralol (19) was separated by high performance liquid chromatography. All incubations were carried out using human liver microsomes after demonstration that the Ki values obtained with microsomes were in accordance with those obtained with a reconstituted monooxygenase system containing purified cytochrome P-450 2D6. Large differences of Ki values ranging between 0.005 and 100 microM were observed. Low-energy conformers of tested compounds were fit within the preliminary pharmacophore model. The analysis of steric and electronic properties of these compounds led to the definition of a final pharmacophore model. Characteristic properties are a positive charge on a nitrogen atom and a flat hydrophobic region, the plane of which is almost perpendicular to the N-H axis and maximally extends up to a distance of 7.5 A from the nitrogen atom. Compounds with high inhibitory potency had additional functional groups with negative molecular electrostatic potential and hydrogen bond acceptor properties on the opposite side at respective distances of 4.8-5.5 A and 6.6-7.5 A from the nitrogen atom. The superposition of strong and weak inhibitors led to the definition of an excluded volume map. Compounds that required additional space were not inhibitors. 
This is apparently the first pharmacophore model for inhibitors of a cytochrome P-450 enzyme and offers the opportunity to classify compounds according to their potency of inhibition. Adverse drug interactions which occur when both substrates and inhibitors of cytochrome P-450 2D6 are applied may be predicted.
Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Humphrey LL, Chan BK, Sox HC. Ann Intern Med 2002: 137(4); 273-84. [Medline] PURPOSE: To evaluate the value of hormone replacement therapy (HRT) in the primary prevention of cardiovascular disease (CVD) and coronary artery disease (CAD). DATA SOURCES: MEDLINE and Cochrane databases were searched for all primary prevention studies reporting CVD or CAD incidence, mortality, or both in association with HRT; reference lists, letters, editorials, and reviews were also reviewed. DATA EXTRACTION: All studies were reviewed, abstracted, and rated for quality. STUDY SELECTION: Only studies of good or fair quality, according to U.S. Preventive Services Task Force (USPSTF) criteria, were included in the detailed review and meta-analysis. DATA SYNTHESIS: The summary relative risk with any HRT use was 0.75 (95% credible interval [CrI], 0.42 to 1.23) for CVD mortality and 0.74 (CrI, 0.36 to 1.45) for CAD mortality. The summary relative risk with any use was 1.28 (CrI, 0.86 to 2.00) for CVD incidence and 0.87 (CrI, 0.62 to 1.21) for CAD incidence. Further analysis of studies adjusting for socioeconomic status, as well as other major CAD risk factors, showed a summary relative risk of 1.07 (CrI, 0.79 to 1.48) for CAD incidence associated with any HRT use. Similar results were found when the analysis was stratified by studies adjusting for alcohol consumption, exercise, or both, in addition to other major risk factors, suggesting confounding by these factors. CONCLUSIONS: This meta-analysis differs from previous meta-analyses by evaluating potential explanatory variables of the relationship between HRT, CVD, and CAD. The adjusted meta-analysis is consistent with recent randomized trials that have shown no benefit in the secondary or primary prevention of CVD events. A valid answer to the role of HRT in the primary prevention of CVD will best come from randomized, controlled trials.
Smoking and female infertility: a systematic review and meta-analysis. Augood C, Duckitt K, Templeton AA. Human Reproduction 1998: 13(6); 1532-9. [Medline] The high prevalence of smoking among women in their reproductive years continues to be a matter of concern. The negative effects of smoking on general health are well known, but smoking may also affect fertility. The objective of the present study was to perform a systematic review of the literature to determine whether there is an association between smoking and risk of infertility in women of reproductive age, and to assess the size of this effect. In the 12 studies used for this meta-analysis, the overall value of the odds ratio (OR) for risk of infertility in women smokers versus non-smokers was 1.60 [95% confidence interval (CI) 1.34-1.91]. Studies of subfertile women undergoing in-vitro fertilization (IVF) treatment also show a reduction in fecundity among women smokers. A meta-analysis of nine studies found an OR of 0.66 (95% CI 0.49-0.88) for pregnancies per number of IVF-treated cycles in smokers versus non-smokers. Despite the potential limitations of meta-analyses of observational studies, the evidence presented in this review is compelling because of the consistency of effect across different study designs, sample size and types of outcome. However, continued reassurance is needed that the calculated overall effect is not in fact due to confounding variables.
Racial and ethnic differences in serum cotinine levels of cigarette smokers: Third National Health and Nutrition Examination Survey, 1988-1991. Caraballo RS, Giovino GA, Pechacek TF, Mowery PD, Richter PA, Strauss WJ, Sharp DJ, Eriksen MP, Pirkle JL, Maurer KR. JAMA 1998: 280(2); 135-9. CONTEXT: Cotinine, a metabolite of nicotine, is a marker of exposure to tobacco smoke. Previous studies suggest that non-Hispanic blacks have higher levels of serum cotinine than non-Hispanic whites who report similar levels of cigarette smoking. OBJECTIVE: To investigate differences in levels of serum cotinine in black, white, and Mexican American cigarette smokers in the US adult population. DESIGN: Third National Health and Nutrition Examination Survey, 1988-1991. PARTICIPANTS: A nationally representative sample of persons aged 17 years or older who participated in the survey. OUTCOME MEASURES: Serum cotinine levels by reported number of cigarettes smoked per day and by race and ethnicity. RESULTS: A total of 7182 subjects were involved in the study; 2136 subjects reported smoking at least 1 cigarette in the last 5 days. Black smokers had cotinine concentrations substantially higher at all levels of cigarette smoking than did white or Mexican American smokers (P<.001). Serum cotinine levels for blacks were 125 nmol/L (22 ng/mL) (95% confidence interval [CI], 79-176 nmol/L [14-31 ng/mL]) to 539 nmol/L (95 ng/mL) (95% CI, 289-630 nmol/L [51-111 ng/mL]) higher than for whites and 136 nmol/L (24 ng/mL) (95% CI, 85-182 nmol/L [15-32 ng/mL]) to 641 nmol/L (113 ng/mL) (95% CI, 386-897 nmol/L [68-158 ng/mL]) higher than for Mexican Americans. These differences do not appear to be attributable to differences in environmental tobacco smoke exposure or in number of cigarettes smoked. CONCLUSIONS: To our knowledge, this study provides the first evidence from a national study that serum cotinine levels are higher among black smokers than among white or Mexican American smokers. 
If higher cotinine levels among blacks indicate higher nicotine intake or differential pharmacokinetics and possibly serve as a marker of higher exposure to cigarette carcinogenic components, they may help explain why blacks find it harder to quit and are more likely to experience higher rates of lung cancer than white smokers.
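The Caraballo abstract reports every cotinine difference in both nmol/L and ng/mL. The two are linked by the molar mass of cotinine (C10H12N2O, about 176.22 g/mol), and a short sketch makes the conversion explicit. The function name is hypothetical; the input values are the ones quoted in the abstract.

```python
COTININE_MOLAR_MASS = 176.22  # g/mol for C10H12N2O


def nmol_per_l_to_ng_per_ml(concentration_nmol_per_l):
    """Convert a cotinine concentration from nmol/L to ng/mL.

    nmol/L times g/mol gives ng/L; dividing by 1000 gives ng/mL.
    """
    return concentration_nmol_per_l * COTININE_MOLAR_MASS / 1000.0


# Point estimates quoted in the abstract, in nmol/L.
reported_nmol_per_l = [125, 539, 136, 641]
converted = [round(nmol_per_l_to_ng_per_ml(v)) for v in reported_nmol_per_l]
print(converted)
```

Rounded to whole numbers, the converted values agree with the ng/mL figures given in parentheses in the abstract.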
Use of systematic reviews in clinical practice guidelines: case study of smoking cessation. Silagy CA, Stead LF, Lancaster T. BMJ 2001: 323(7317); 833-6. OBJECTIVE: To examine the extent to which recommendations in the national guidelines for the cessation of smoking are based on evidence from systematic reviews of controlled trials. DESIGN: Retrospective analysis of recommendations for the national guidelines for the cessation of smoking. MATERIALS: National guidelines in clinical practice on smoking cessation published in English. MAIN OUTCOME MEASURES: The type of evidence (systematic review of controlled trials, individual trials, other studies, expert opinion) used to support each recommendation. We also assessed whether a Cochrane systematic review was available and could have been used in formulating the recommendation. RESULTS: Four national smoking cessation guidelines (from Canada, New Zealand, the United Kingdom, and the United States) covering 105 recommendations were identified. An explicit evidence base for 100%, 89%, 68%, and 98% of recommendations, respectively, was detected, of which 60%, 56%, 59%, and 47% were based on systematic reviews of controlled studies. Cochrane systematic reviews could have been used to develop between 39% and 73% of recommendations but were actually used in 0% to 36% of recommendations. The UK guidelines had the highest proportion of recommendations based on Cochrane systematic reviews. CONCLUSIONS: Use of systematic reviews in guidelines is a measure of the "payback" on investment in research synthesis. Systematic reviews commonly underpinned recommendations in guidelines on smoking cessation. The extent to which they were used varied by country and there was evidence of duplication of effort in some areas. Greater international collaboration in developing and maintaining an evidence base of systematic reviews can improve the efficiency of use of research resources.
The relationship between methodological quality and conclusions in reviews of spinal manipulation. Assendelft WJ, Koes BW, Knipschild PG, Bouter LM. Jama 1995: 274(24); 1942-8. [Medline] OBJECTIVE--To study the relationship between the methodological quality and other characteristics of reviews of spinal manipulation for low back pain on the one hand and the reviewers' conclusions on the effectiveness of manipulation on the other hand. DATA SOURCES--Reviews identified by MEDLINE search, citation tracking, Library search, and correspondence with experts. STUDY SELECTION--English- or Dutch-language reviews published up to 1993 dealing with spinal manipulation for low back pain that include at least two randomized clinical trials (RCTs). DATA EXTRACTION--Methodological quality was assessed using a standardized criteria list applied independently by two assessors (range, 0% to 100%). Other extracted characteristics were the comprehensiveness of the search, selective citation of studies, language, inclusion of non-RCTs, type of publication, reviewers' professional backgrounds, and publication in a spinal manipulation journal or book. The reviewers' conclusions were classified as negative, neutral, or positive. DATA SYNTHESIS--A total of 51 reviews were assessed, 17 of which were neutral and 34 positive. The methodological quality was low, with a median score of 23%. Nine of the 10 methodologically best reviews were positive. Other factors associated with a positive reviewers' conclusion were review of spinal manipulation only, inclusion of a spinal manipulator in the review team, and a comprehensive literature search. CONCLUSIONS--The majority of the reviews concluded that spinal manipulation is an effective treatment for low back pain. Although, in particular, the reviews with a relatively high methodological quality had a positive conclusion, strong conclusions were precluded by the overall low quality of the reviews. 
More empirical research on the review methods applied to other therapies in other professional fields is needed to further explore our findings about the factors related to a positive reviewers' conclusion.
Meta-analysis--"does one bad apple spoil the barrel?" Coste J, Bouyer J, Job-Spira N. Fertility and Sterility 1997: 67(4); 791-792. [Medline]
Spurious precision? Meta-analysis of observational studies. Egger M, Schneider M, Davey Smith G. British Medical Journal 1998: 316(7125); 140-4. [Medline] [Full text] In previous articles we have focused on the potentials, principles, and pitfalls of meta-analysis of randomised controlled trials. Meta-analysis of observational data is, however, also becoming common. In a Medline search we identified 566 articles (excluding those published as letters) published in 1995 and indexed with the medical subject heading (MeSH) term "meta-analysis." We randomly selected 100 of these articles and examined them further. Sixty articles reported on actual meta-analyses, and 40 were methodological papers, editorials, and traditional reviews (1). Among the meta-analyses, about half were based on observational studies, mainly cohort and case-control studies of medical interventions or aetiological associations.
A new system for grading recommendations in evidence based guidelines. Harbour R, Miller J. British Medical Journal 2001: 323(7308); 334-6. [Medline] [Full text] [PDF]
Systematic reviews in health care: Assessing the quality of controlled clinical trials. Juni P, Altman DG, Egger M. British Medical Journal 2001: 323(7303); 42-6. [Medline] [Full text] [PDF]
The importance of quality of primary studies in producing unbiased systematic reviews. Khan KS, Daya S, Jadad A. Arch Intern Med 1996: 156(6); 661-6. BACKGROUND: Traditional and largely qualitative reviews of evidence are now giving way to much more structured systematic overviews that use a quantitative method to calculate the overall effect of treatment. The latter approach is dependent on the quality of primary studies, which may introduce bias if they are of poor methodologic quality. OBJECTIVE: To test the hypothesis that the inclusion of poor-quality trials in meta-analyses would bias the conclusions and produce incorrect estimates of treatment effect. METHODS: An overview of randomized trials of antiestrogen therapy in subfertile men with oligospermia was performed to test the hypothesis. Data sources included online searching of MEDLINE and Science Citation Index databases between 1966 and 1994, scanning the bibliography of known primary studies and review articles, and contacting experts in the field. After independent, blind assessment, nine of 149 originally identified studies met the inclusion criteria and were selected. We assessed study quality independently. Outcome data from each study were pooled and statistically summarized. RESULTS: There was a marginal improvement in pregnancy rate with antiestrogen treatment (odds ratio, 1.6; 95% confidence interval, 0.9 to 2.6). Sensitivity analyses on the basis of methodologic quality demonstrated that poor-quality studies produced a positive effect with treatment, whereas no benefit was observed with high-quality studies. CONCLUSION: The results of a meta-analysis are influenced by the quality of the primary studies included. Methodologically, poor studies tend to exaggerate the overall estimate of treatment effect and may lead to incorrect inferences.
Does quality of reports of randomized trials affect estimates of intervention efficacy reported in meta-analyses? Moher D, Pham B, Jones A, Cook D, Jadad A, Moher M, Tugwell P, Klassen T. Lancet 1998: 352; 609 - 613.
Quality, evolution and clinical implications of randomized controlled trials on the treatment of lung cancer. A lost opportunity for meta-analysis. Nicolucci A, Grilli R, Alexanian A, Apolone G, Torri V, Liberati A. JAMA 1989: 262; 2101 - 2107. A review of 150 published randomized trials on the treatment of lung cancer showed serious methodological drawbacks. Handling of withdrawals (only 7 trials had no dropouts), a priori estimates of sample size (only 9 trials specified the required number of patients), blinding of randomization (only 22 trials had a satisfactory procedure), and information on eligible nonrandomized patients (only 13 studies reported it precisely) were areas of major concern. Although trial quality improved over time both in design/execution (study size estimation and analysis by prognostic factors became more frequent) and reporting (information on patients' characteristics and side effects were more thoroughly reported), their evolution was inconsistent. For non-small-cell lung cancer, despite the persistent lack of proof of efficacy of any active treatment, an untreated control arm was prematurely abandoned and a wide variety of tested regimens prevailed even in better-quality studies. Slightly more promising is the picture for small-cell lung cancer, where research indicates somewhat more reliable, though limited, progress. While clinical research in lung cancer has contributed little to defining the best standard care, we conclude that its heterogeneity makes it unlikely that quantitative meta-analysis of existing trials will be constructive.
Is meta-analysis a valid approach to the evaluation of small effects in observational studies? Shapiro S. J Clin Epidemiol 1997: 50(3); 223-9. Abstract not available.
Meta-analyses of observational data should be done with due care [letter]. Smith GD, Egger M. BMJ 1999: 318(7175); 56. Abstract not available.
Using evidence from different sources: an example using paracetamol 1000 mg plus codeine 60 mg. Smith LA, Moore RA, McQuay HJ, Gavaghan D. BMC Med Res Methodol 2001: 1(1); 1. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Meta-analysis usually restricts the information pooled, for instance using only randomised, double-blind, placebo-controlled trials. This neglects other types of high quality information. This review explores using different information for the combination of paracetamol 1000 mg and codeine 60 mg in acute postoperative pain. RESULTS: Randomised, double-blind, placebo-controlled trials of paracetamol 1000 mg and codeine 60 mg had an NNT of 2.2 (95% confidence interval 1.7 to 2.9) for at least 50% pain relief over four to six hours in three trials with 197 patients. Computer simulation of randomised trials demonstrated 92% confidence that the simulated NNT was within +/- 0.5 of the underlying value of 2.2 with this number of patients. The result was supported by a rational dose-response relationship for different doses of paracetamol and codeine in 17 additional trials with 1,195 patients. Three controlled trials lacking a placebo and with 117 patients treated with paracetamol 1000 mg and codeine 60 mg had 73% (95%CI 56% to 81%) of patients with at least 50% pain relief, compared with 57% (48% to 66%) in placebo controlled trials. Six trials in acute pain were omitted because of design issues, like the use of different pain measures or multiple dosing regimens. In each paracetamol 1000 mg and codeine 60 mg was shown to be better than placebo or comparators for at least one measure. CONCLUSIONS: Different designs of high quality trials can be used to support limited information used in meta-analysis without recourse to low quality trials that might be biased.
Effect of non-steroidal anti-inflammatory drugs on risk of Alzheimer's disease: systematic review and meta-analysis of observational studies. Etminan M, Gill S, Samii A. Bmj 2003: 327(7407); 128. [Medline] [Abstract] [Full text] [PDF] OBJECTIVES: To quantify the risk of Alzheimer's disease in users of all non-steroidal anti-inflammatory drugs (NSAIDs) and users of aspirin and to determine any influence of duration of use. DESIGN: Systematic review and meta-analysis of observational studies published between 1966 and October 2002 that examined the role of NSAID use in preventing Alzheimer's disease. Studies identified through Medline, Embase, International Pharmaceutical Abstracts, and the Cochrane Library. RESULTS: Nine studies looked at all NSAIDs in adults aged > 55 years. Six were cohort studies (total of 13 211 participants), and three were case-control studies (1443 participants). The pooled relative risk of Alzheimer's disease among users of NSAIDs was 0.72 (95% confidence interval 0.56 to 0.94). The risk was 0.95 (0.70 to 1.29) among short term users (< 1 month) and 0.83 (0.65 to 1.06) and 0.27 (0.13 to 0.58) among intermediate term (mostly < 24 months) and long term (mostly > 24 months) users, respectively. The pooled relative risk in the eight studies of aspirin users was 0.87 (0.70 to 1.07). CONCLUSIONS: NSAIDs offer some protection against the development of Alzheimer's disease. The appropriate dosage and duration of drug use and the ratios of risk to benefit are still unclear.
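A note on reading the pooled estimate above: when an abstract reports only a ratio and its 95% confidence interval, the standard error on the log scale, and from it an approximate z statistic and two-sided p value, can be backed out. The sketch below does this for the pooled relative risk of 0.72 (0.56 to 0.94). It assumes the interval is symmetric on the log scale and based on a normal approximation, which the abstract does not state, so the result is a rough check and not the authors' own test.

```python
import math

def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back out SE(log ratio), z, and a two-sided p value from a ratio and its 95% CI.

    Assumes a normal approximation on the log scale (an assumption, not stated in the source).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the standard normal
    return se, z, p

# Pooled relative risk for NSAID users, as quoted in the abstract.
se, z, p = z_and_p_from_ci(0.72, 0.56, 0.94)
print(f"SE(log RR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

The same function applied to the aspirin estimate, 0.87 (0.70 to 1.07), gives an interval that spans 1 and therefore a p value above 0.05.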
Effectiveness of treatments for infantile colic: systematic review. Lucassen PL, Assendelft WJ, Gubbels JW, van Eijk JT, van Geldrop WJ, Neven AK. Bmj 1998: 316(7144); 1563-9. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To evaluate the effectiveness of diets, drug treatment, and behavioural interventions on infantile colic in trials with crying or the presence of colic as the primary outcome measure. DATA SOURCES: Controlled clinical trials identified by a highly sensitive search strategy in Medline (1966-96), Embase (1986-95), and the Cochrane Controlled Trials Register, in combination with reference checking for further relevant publications. Keywords were crying and colic. STUDY SELECTION: Two independent assessors selected controlled trials with interventions lasting at least 3 days that included infants younger than 6 months who cried excessively. DATA SYNTHESIS: Methodological quality was assessed by two assessors independently with a quality assessment scale (range 0-5). Effect sizes were calculated as percentage success. Effect sizes of trials using identical interventions were pooled using a random effects model. RESULTS: 27 controlled trials were identified. Elimination of cows' milk protein was effective when substituted by hypoallergenic formula milks (effect size 0.22 (95% confidence interval 0.09 to 0.34)). The effectiveness of substitution by soy formula milks was unclear when only trials of good methodological quality were considered. The benefit of eliminating cows' milk protein was not restricted to highly selected populations. Dicyclomine was effective (effect size 0.46 (0.33 to 0.60)), but serious side effects have been reported. The advice to reduce stimulation was beneficial (effect size 0.48 (0.23 to 0.74)), whereas the advice to increase carrying and holding seemed not to reduce crying. No benefit was shown for simethicone. Uncertainty remained about the effectiveness of low lactose formula milks. 
CONCLUSIONS: Infantile colic should preferably be treated by advising carers to reduce stimulation and with a one week trial of a hypoallergenic formula milk.
Impact of study quality on outcome in placebo-controlled trials of homeopathy. Linde K. J Clin Epidemiol 1999: 52(7); 631-36. Abstract not available yet.
Row Over Breast Cancer Screening Shows that Scientists Bring "Some Subjectivity" Into Their Work. Mayor S. British Medical Journal 2001: 323(7319); 956. [Medline] [Full text] [PDF]
Does blinding of readers affect the results of meta-analyses? Berlin JA. The Lancet 1997: 350(9072); 185-6. [Medline]
Assessing the quality of reports of randomized controlled trials: is blinding necessary? Jadad A, Moore A, Carroll D, Jenkinson C, Reynolds D, Gavaghan D, McQuay H. Control Clin Trials 1996: 17(1); 1 - 12. [Medline] It has been suggested that the quality of clinical trials should be assessed by blinded raters to limit the risk of introducing bias into meta-analyses and systematic reviews, and into the peer-review process. There is very little evidence in the literature to substantiate this. This study describes the development of an instrument to assess the quality of reports of randomized clinical trials (RCTs) in pain research and its use to determine the effect of rater blinding on the assessments of quality. A multidisciplinary panel of six judges produced an initial version of the instrument. Fourteen raters from three different backgrounds assessed the quality of 36 research reports in pain research, selected from three different samples. Seven were allocated randomly to perform the assessments under blind conditions. The final version of the instrument included three items. These items were scored consistently by all the raters regardless of background and could discriminate between reports from the different samples. Blind assessments produced significantly lower and more consistent scores than open assessments. The implications of this finding for systematic reviews, meta-analytic research and the peer-review process are discussed.
"Large-scale randomized evidence: large, simple trials and overviews of trials": discussion. A clinician's perspective on meta-analyses. Horwitz R. J Clin Epidemiol 1995: 48(1); 41-44. [Medline] No abstract available.
Issues in Comparisons between Meta-analyses and Large Trials. Ioannidis J, Cappelleri J, Lau J. JAMA 1998: 279(14); 1089-93. [Medline] CONTEXT: The extent of concordance between meta-analyses and large trials on the same topic has been investigated with different protocols. Inconsistent conclusions created confusion regarding the validity of these major tools of clinical evidence. OBJECTIVE: To evaluate protocols comparing meta-analyses and large trials in order to understand if and why they disagree on the concordance of these 2 clinical research methods. DESIGN: Systematic comparison of protocol designs, study selection, definitions of agreement, analysis methods, and reported discrepancies between large trials and meta-analyses. RESULTS: More discrepancies were claimed when large trials were selected from influential journals (which may prefer trials disagreeing with prior evidence) than from already performed meta-analyses (which may target homogeneous trials) and when both primary and secondary (rather than only primary) end points were considered. Depending on how agreement was defined, kappa coefficients varied from 0.22 (low agreement) to 0.72 (excellent agreement). The correlation of treatment effects between large trials and meta-analyses varied from -0.12 to 0.76, but was more similar (0.50-0.76) when only primary end points were considered. When both the magnitude and uncertainty of treatment effects were considered, large trials disagreed with meta-analyses 10% to 23% of the time. Discrepancies were attributed to different disease risks, variable protocols, quality, and publication bias. CONCLUSIONS: Comparisons of large trials with meta-analyses may reach different conclusions depending on how trials and meta-analyses are selected and how end points and agreement are defined. Scrutiny of these 2 major research methods can enhance our appreciation of both for guiding medical practice.
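For readers unfamiliar with the kappa coefficients quoted in this abstract, here is a minimal sketch of Cohen's kappa, which compares observed agreement with the agreement expected by chance. The 2 x 2 table is entirely hypothetical (invented counts of topics on which a meta-analysis and a large trial each did or did not find a benefit); it is not data from Ioannidis et al.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of topics where method A chose category i and method B chose category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL counts, for illustration only.
# Rows: meta-analysis finds benefit / no benefit; columns: large trial finds benefit / no benefit.
hypothetical_table = [[20, 5],
                      [6, 9]]
kappa = cohens_kappa(hypothetical_table)
print(f"kappa = {kappa:.2f}")
```

In this invented table the two methods agree on 29 of 40 topics (72.5%), yet kappa is only about 0.41 because roughly 54% agreement would be expected by chance alone. That gap is one reason the value of kappa depends so heavily on how agreement is defined.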
Erythropoietin, uncertainty principle and cancer related anaemia. Clark O, Adams JR, Bennett CL, Djulbegovic B. BMC Cancer 2002: 2(1); 23. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: This study was designed to evaluate if erythropoietin (EPO) is effective in the treatment of cancer related anemia, and if its effect remains unchanged when data are analyzed according to various clinical and methodological characteristics of the studies. We also wanted to demonstrate that cumulative meta-analysis (CMA) can be used to resolve uncertainty regarding clinical questions. METHODS: Systematic Review (SR) of the published literature on the role of EPO in cancer-related anemia. A cumulative meta-analysis (CMA) using a conservative approach was performed to determine the point in time when uncertainty about the effect of EPO on transfusion-related outcomes could be considered resolved. Participants: Patients included in randomized studies that compared EPO versus no therapy or placebo. Main outcome measures: Number of patients requiring transfusions. RESULTS: Nineteen trials were included. The pooled results indicated a significant effect of EPO in reducing the number of patients requiring transfusions [odds ratio (OR) = 0.41; 95% CI: 0.33 to 0.5; p < 0.00001; relative risk (RR) = 0.61; 95% CI: 0.54 to 0.68]. The results remain unchanged after the sensitivity analyses were performed according to the various clinical and methodological characteristics of the studies. The heterogeneity was less pronounced when OR was used instead of RR as the measure of the summary point estimate. Analysis according to OR was not heterogeneous, but the pooled RR was highly heterogeneous. A stepwise metaregression analysis did point to the possibility that treatment effect could have been exaggerated by inadequacy in allocation concealment and that larger treatment effects are seen at Hb level > 11.5 g/dl.
We identified 1995 as the point in time when a statistically significant effect of EPO was demonstrated and after which we considered that uncertainty about EPO efficacy was resolved. CONCLUSION: EPO is effective in the treatment of anemia in cancer patients. This could have already been known in 1995 if a CMA had been performed at that time.
Journals: redundant publications are bad news. Mojon-Azzi SM, Jiang X, Wagner U. Nature 2003: 421(6920); 209. [Abstract] [Full text] We have developed an electronic systematic search tool to estimate the amount of duplicate publications in the 70 ophthalmological journals listed by Medline. Our results show that there is a considerable number of duplicate publications. If this holds true for other disciplines, it is bad news for research.
Is allergen immunotherapy effective in asthma? A meta-analysis of randomized controlled trials. Abramson MJ, Puy RM, Weiner JM. American Journal of Respiratory and Critical Care Medicine 1995: 151(4); 969-74. A meta-analysis of clinical trials of allergen immunotherapy was undertaken to assess the efficacy of this controversial form of therapy in asthma. A computerized bibliographic search revealed 20 randomized placebo controlled double-blind trials of allergen immunotherapy for asthma. The results extracted included asthmatic symptoms, medication requirements, lung function, and bronchial hyperreactivity (BHR). Categorical outcomes were expressed as odds ratios and continuous outcomes as effect sizes. The combined odds of symptomatic improvement from immunotherapy with any allergen were 3.2 (95% CI 2.2 to 4.9). The odds for reduction in medication after mite immunotherapy were 4.2 (95% CI 2.2 to 7.9). The combined odds for reduction in BHR were 6.8 (95% CI 3.8 to 12.0). The mean effect size for any allergen immunotherapy on all continuous outcomes was 0.71 (95% CI 0.43 to 1.00), which would correspond to a mean 7.1% predicted improvement in FEV1 from immunotherapy. Although the benefits of allergen immunotherapy could be overestimated because of unpublished negative studies, an additional 33 such studies would be necessary to overturn these results. Allergen immunotherapy is a treatment option in highly selected patients with extrinsic ("allergic") asthma.
Collaborative meta-analysis of randomised trials of antiplatelet therapy for prevention of death, myocardial infarction, and stroke in high risk patients. Antithrombotic Trialists' Collaboration. BMJ 2002: 324(7329); 71-86. [Abstract] [Full text] [PDF] Objective: To determine the effects of antiplatelet therapy among patients at high risk of occlusive vascular events. Design: Collaborative meta-analyses (systematic overviews). Inclusion criteria: Randomised trials of an antiplatelet regimen versus control or of one antiplatelet regimen versus another in high risk patients (with acute or previous vascular disease or some other predisposing condition) from which results were available before September 1997. Trials had to use a method of randomisation that precluded prior knowledge of the next treatment to be allocated and comparisons had to be unconfounded, that is, have study groups that differed only in terms of antiplatelet regimen. Studies reviewed: 287 studies involving 135 000 patients in comparisons of antiplatelet therapy versus control and 77 000 in comparisons of different antiplatelet regimens. Main outcome measure: "Serious vascular event": non-fatal myocardial infarction, non-fatal stroke, or vascular death. Results: Overall, among these high risk patients, allocation to antiplatelet therapy reduced the combined outcome of any serious vascular event by about one quarter; non-fatal myocardial infarction was reduced by one third, non-fatal stroke by one quarter, and vascular mortality by one sixth (with no apparent adverse effect on other deaths).
Absolute reductions in the risk of having a serious vascular event were 36 (SE 5) per 1000 treated for two years among patients with previous myocardial infarction; 38 (5) per 1000 patients treated for one month among patients with acute myocardial infarction; 36 (6) per 1000 treated for two years among those with previous stroke or transient ischaemic attack; 9 (3) per 1000 treated for three weeks among those with acute stroke; and 22 (3) per 1000 treated for two years among other high risk patients (with separately significant results for those with stable angina (P=0.0005), peripheral arterial disease (P=0.004), and atrial fibrillation (P=0.01)). In each of these high risk categories, the absolute benefits substantially outweighed the absolute risks of major extracranial bleeding. Aspirin was the most widely studied antiplatelet drug, with doses of 75-150 mg daily at least as effective as higher daily doses. The effects of doses lower than 75 mg daily were less certain. Clopidogrel reduced serious vascular events by 10% (4%) compared with aspirin, which was similar to the 12% (7%) reduction observed with its analogue ticlopidine. Addition of dipyridamole to aspirin produced no significant further reduction in vascular events compared with aspirin alone. Among patients at high risk of immediate coronary occlusion, short term addition of an intravenous glycoprotein IIb/IIIa antagonist to aspirin prevented a further 20 (4) vascular events per 1000 (P<0.0001) but caused 23 major (but rarely fatal) extracranial bleeds per 1000. Conclusions: Aspirin (or another oral antiplatelet drug) is protective in most types of patient at increased risk of occlusive vascular events, including those with an acute myocardial infarction or ischaemic stroke, unstable or stable angina, previous myocardial infarction, stroke or cerebral ischaemia, peripheral arterial disease, or atrial fibrillation. 
Low dose aspirin (75-150 mg daily) is an effective antiplatelet regimen for long term use, but in acute settings an initial loading dose of at least 150 mg aspirin may be required. Adding a second antiplatelet drug to aspirin may produce additional benefits in some clinical circumstances, but more research into this strategy is needed.
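The absolute reductions in this abstract are given per 1000 patients treated, and a common way to restate them is as a number needed to treat (NNT) or, for harms, a number needed to harm (NNH): 1000 divided by the absolute difference per 1000. The sketch below applies that arithmetic to the figures quoted above. The group labels are shorthand introduced here, and the treatment durations differ between groups, so the NNTs are not directly comparable with one another.

```python
def number_needed(events_per_1000):
    """Convert an absolute difference per 1000 patients into a number needed to treat or harm."""
    if events_per_1000 <= 0:
        raise ValueError("absolute difference per 1000 must be positive")
    return 1000 / events_per_1000

# Serious vascular events avoided per 1000 patients, as quoted in the abstract.
reductions_per_1000 = {
    "previous MI (2 years)": 36,
    "acute MI (1 month)": 38,
    "previous stroke/TIA (2 years)": 36,
    "acute stroke (3 weeks)": 9,
    "other high risk (2 years)": 22,
}
nnt_by_group = {group: number_needed(r) for group, r in reductions_per_1000.items()}
for group, nnt in nnt_by_group.items():
    print(f"{group}: NNT about {nnt:.0f}")

# Short term glycoprotein IIb/IIIa antagonist added to aspirin:
# 20 further vascular events prevented and 23 major extracranial bleeds caused per 1000.
nnt_gp = number_needed(20)
nnh_bleed = number_needed(23)
print(f"GP IIb/IIIa: NNT about {nnt_gp:.0f}, NNH (major bleed) about {nnh_bleed:.0f}")
```

For the IIb/IIIa antagonist the NNT and NNH are of similar size, which is why the abstract notes that the bleeds were rarely fatal: the comparison turns on the severity of the events as well as their frequency.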
The effectiveness of chiropractic for treatment of low back pain: an update and attempt at statistical pooling. Assendelft WJ, Koes BW, van der Heijden GJ, Bouter LM. Journal of Manipulative and Physiological Therapeutics 1996: 19(8); 499-507. [Medline] OBJECTIVE: To determine the effectiveness of chiropractic treatment for patients with low back pain by means of a systematic review of the literature. DATA SOURCES: Randomized clinical trials (RCTs) on chiropractic were identified with a Medline and Embase search (1966-1995), by citation tracking, and by hand searching of the relevant chiropractic reference systems (CRAC and Index to Chiropractic Literature). STUDY SELECTION: All RCTs on low back pain that involved chiropractors as therapists. DATA EXTRACTION: Methodological quality was assessed independently by two reviewers on 14 items covering internal validity, informativeness and study size. Data were extracted on: patients (initial referral, duration of complaints, radiation of pain); outcomes (four different types); and timing of follow-up (short-term, intermediate and long-term). Statistical pooling was intended, according to a preset analysis plan, to include subgroup analysis. DATA SYNTHESIS: Eight RCTs were identified. All RCTs had serious flaws in their design, execution and reporting. Because of the great variety of outcome measures and follow-up timing, there was insufficient data to enable statistical pooling of the RCTs. A narrative review, however, did not provide convincing evidence for the effectiveness of chiropractic for acute or chronic low back pain. CONCLUSIONS: There is certainly a need for correctly executed trials. In future research on the effectiveness of chiropractic, guidelines for uniform execution and reporting of RCTs should first be established to enable subsequent statistical pooling in systematic reviews of chiropractic trials.
The efficacy of "distant healing": a systematic review of randomized trials. Astin JA, Harkness E, Ernst E. Annals of Internal Medicine 2000: 132(11); 903-10. [Medline] PURPOSE: To conduct a systematic review of the available data on the efficacy of any form of "distant healing" (prayer, mental healing, Therapeutic Touch, or spiritual healing) as treatment for any medical condition. DATA SOURCES: Studies were identified by an electronic search of the MEDLINE, PsychLIT, EMBASE, CISCOM, and Cochrane Library databases from their inception to the end of 1999 and by contact with researchers in the field. STUDY SELECTION: Studies with the following features were included: random assignment, placebo or other adequate control, publication in peer-reviewed journals, clinical (rather than experimental) investigations, and use of human participants. DATA EXTRACTION: Two investigators independently extracted data on study design, sample size, type of intervention, type of control, direction of effect (supporting or refuting the hypothesis), and nature of the outcomes. DATA SYNTHESIS: A total of 23 trials involving 2774 patients met the inclusion criteria and were analyzed. Heterogeneity of the studies precluded a formal meta-analysis. Of the trials, 5 examined prayer as the distant healing intervention, 11 assessed noncontact Therapeutic Touch, and 7 examined other forms of distant healing. Of the 23 studies, 13 (57%) yielded statistically significant treatment effects, 9 showed no effect over control interventions, and 1 showed a negative effect. CONCLUSIONS: The methodologic limitations of several studies make it difficult to draw definitive conclusions about the efficacy of distant healing. However, given that approximately 57% of trials showed a positive treatment effect, the evidence thus far merits further study.
Decline in semen quality among fertile men in Paris during the past 20 years. Auger J, Kunstmann JM, Czyglik F, Jouannet P. New England Journal of Medicine 1995: 332(5); 281-5. [Medline] BACKGROUND. Several studies have suggested a population-wide decline in the quality of semen over the past 50 years, but clear evidence for decreasing semen quality in recent decades is lacking. METHODS. From 1973 through 1992 we measured the volume of seminal fluid, the sperm concentration, and the percentages of motile and morphologically normal spermatozoa in 1351 healthy fertile men. The data on the semen samples were collected at one sperm bank in Paris. The data in each calendar year were analyzed as a function of the year of donation, the age of each patient, the year of birth, and the duration of sexual abstinence before semen collection. RESULTS. There was no change in semen volume during the study period. The mean concentration of sperm decreased by 2.1 percent per year, from 89 x 10^6 per milliliter in 1973 to 60 x 10^6 per milliliter in 1992 (P < 0.001). During the same period the percentages of motile and normal spermatozoa decreased by 0.6 percent and 0.5 percent per year, respectively (both P < 0.001). After adjustment in multiple regression analyses for age and the duration of sexual abstinence, each successive calendar year of birth accounted for 2.6 percent of the yearly decline in the sperm concentration and for 0.3 percent and 0.7 percent, respectively, of the yearly declines in the percentages of motile and normal spermatozoa (all P < 0.001). CONCLUSIONS. During the past 20 years, there has been a decline in the concentration and motility of sperm and in the percentage of morphologically normal spermatozoa in fertile men that is independent of the age of the men.
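A quick consistency check on the figures above: a decline of 2.1 percent per year, compounded over the 19 yearly steps from 1973 to 1992, should take a starting concentration of 89 million per milliliter to roughly the reported 60 million. The sketch below assumes a constant compound rate, which is a simplification introduced here; the published rate comes from the authors' regression analysis, not from the two endpoints.

```python
def compound_decline(start, annual_rate, years):
    """Value after `years` steps of a constant proportional decline."""
    return start * (1 - annual_rate) ** years

def implied_annual_rate(start, end, years):
    """Constant annual proportional decline that takes `start` to `end` in `years` steps."""
    return 1 - (end / start) ** (1 / years)

years = 1992 - 1973  # 19 yearly steps between the first and last calendar year
projected_1992 = compound_decline(89, 0.021, years)  # in millions per milliliter
rate_from_endpoints = implied_annual_rate(89, 60, years)
print(f"projected 1992 concentration: {projected_1992:.1f} million/ml")
print(f"annual decline implied by the endpoints: {100 * rate_from_endpoints:.2f}%")
```

The projected value lands close to the reported 60 million per milliliter, so the yearly rate and the two endpoint means in the abstract are mutually consistent under this simple model.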
Diesel exhaust exposure and lung cancer. Bhatia R, Lopipero P, Smith AH. Epidemiology 1998: 9(1); 84-91. [Medline] We evaluated the relation between occupational exposure to diesel exhaust and cancer of the lung in a meta-analysis of 29 published cohort and case-control studies. Twenty-one of the 23 studies meeting the inclusion criteria had observed relative risk estimates greater than one. Pooled effect measures weighted by study precision indicated an increased relative risk (RR) for lung cancer from occupational exposure to diesel exhaust [RR = 1.33; 95% confidence interval (CI) = 1.24-1.44]. Subanalysis of case-control (RR = 1.33; 95% CI = 1.18-1.51) vs cohort studies (RR = 1.33; 95% CI = 1.21-1.47) and of studies that controlled for smoking (RR = 1.35; 95% CI = 1.20-1.52) vs those that did not (RR = 1.33; 95% CI = 1.20-1.47) produced results that did not differ from those of the overall analysis. On the other hand, cohort studies using internal comparisons (RR = 1.43; 95% CI = 1.29-1.58) showed higher relative risks than those using external comparisons (RR = 1.22; 95% CI = 1.04-1.44). Heterogeneity between studies was reduced when we stratified studies by the occupational setting in which exposure occurred. A positive duration-response relation was evident in those studies that were stratified by employment duration. This meta-analysis supports a causal association between increased risks for lung cancer and exposure to diesel exhaust.
Angiotensin converting enzyme insertion or deletion polymorphism and coronary restenosis: meta-analysis of 16 studies. Bonnici F, Keavney B, Collins R, Danesh J. British Medical Journal 2002: 325(7363); 517-20. [Medline] OBJECTIVE: To assess the association between genotype at the insertion or deletion polymorphism of the angiotensin converting enzyme gene and risk of coronary restenosis after percutaneous coronary intervention. DESIGN: Meta-analysis of studies before July 2001 that reported on these genotypes and risk of coronary restenosis after a percutaneous coronary intervention, with or without coronary stenting. RESULTS: 16 studies, involving 4631 patients undergoing a percutaneous coronary intervention, yielded 1683 patients with restenosis after a mean weighted follow up of 5.5 months. The combined odds ratio for restenosis in people with the DD genotype was 1.23 (99% confidence interval 1.03 to 1.46). When studies were grouped by size, however, the combined odds ratios for restenosis in people with the DD genotype were 1.94 (1.39 to 2.71) for studies with less than 100 cases, 1.33 (0.92 to 1.93) for studies with 100-200 cases, and 0.92 (0.72 to 1.18) for studies with more than 200 cases (trend P=0.02). Similarly, when studies were grouped by genotyping procedures, significantly larger odds ratios were found in the studies that did not conceal disease status from laboratory staff and in the studies that did not use a second polymerase chain reaction amplification to reduce genetic mistyping. CONCLUSION: Compared with other studies, larger and more rigorous studies show a weaker association between the angiotensin converting enzyme gene DD genotype and restenosis. Publication bias or detection biases can produce artefactual associations at least as large as those that might be expected for common polymorphisms in complex diseases, suggesting the need for larger and more rigorous genetic epidemiological investigations than are now customary.
A meta-analysis of studies of dietary fat and breast cancer risk. Boyd NF, Martin LJ, Noffel M, Lockwood GA, Trichler DL. British Journal of Cancer 1993: 68(3); 627-36. [Medline] There is strong evidence that breast cancer risk is influenced by environmental factors, and animal experiments and human ecological data suggest that increased dietary fat intake increases the incidence of the disease. Epidemiological evidence on the relationship of dietary fat to breast cancer from cohort and case control studies has however been inconsistent. To examine the available evidence we have carried out a meta-analysis to summarise quantitatively the large published literature on dietary fat in the aetiology of breast cancer. After assembling all of the published case control and cohort studies, we extracted the relative risk in each study that compared the highest to the lowest level of intake. We then calculated a summary relative risk for all studies. The summary relative risk for the 23 studies that examined fat as a nutrient was 1.12 (95% CI 1.04-1.21). Cohort studies had a summary relative risk of 1.01 (95% CI 0.90-1.13) and case control studies a relative risk of 1.21 (95% CI 1.10-1.34). Summary estimates of risk for specific types of fat excluded unity for only saturated fat. For the 19 studies that examined food intake, the summary relative risks were 1.18 (95% CI 1.06-1.32) for meat, 1.17 (95% CI 1.04-1.31) for milk, and 1.17 (95% CI 1.02-1.36) for cheese. Summary relative risks for total fat intake were examined for several potential modifying factors. Regression analysis showed that European studies were more likely than studies done in other countries to show an increased relative risk associated with dietary fat and breast cancer, after taking into account potential modifying factors that included study design and quality.
A meta-analysis of adolescent smoking prevention programs. Bruvold WH. American Journal of Public Health 1993: 83(6); 872-80. [Medline] OBJECTIVES. A large number of studies evaluating adolescent smoking prevention programs have been published. Systematic quantitative reviews of this literature are needed to learn what does and does not work. The present meta-analysis focuses on the efficacy of school-based programs. METHODS. Evaluations of 94 separate interventions were included in the meta-analysis. Studies were screened for methodological rigor and those with weaker methodology were segregated from those with more defensible methodology; major analyses focused on the latter. RESULTS. Behavioral effect sizes were found to be largest for interventions with a social reinforcement orientation, moderate for interventions with either a developmental or a social norms orientation, and small for interventions with the traditional rational orientation. Attitude effect sizes followed the same pattern, but knowledge effect sizes were similar across all four orientation categories. CONCLUSIONS. Because behavioral effect represents the fundamental objective of programs for prevention of adolescent tobacco use, the present results indicate that school-based programs should consider adopting interventions with a social reinforcement, social norms, or developmental orientation.
Oral corticosteroid therapy for patients with stable chronic obstructive pulmonary disease: a meta-analysis. Callahan C. Annals of Internal Medicine 1991: 114(3); 216-46. Abstract not available yet.
Epidemiologic association between dietary calcium intake and blood pressure: a meta-analysis of published data. Cappuccio FP, Elliott P, Allender PS, Pryer J, Follman DA, Cutler JA. Am J Epidemiol 1995: 142(9); 935-45. The objectives of the study were to assess whether the epidemiologic data support a relation between dietary calcium intake and blood pressure, to obtain a quantitative estimate of the difference in blood pressure for a given difference in dietary calcium intake, and to assess the public health implications. A meta-analysis of published data (January 1983 to November 1993) that investigated the association between dietary calcium intake and blood pressure in different populations around the world was performed. Of 63 population studies identified, 23 were suitable for a quantitative overview (total n = 38,950). Unadjusted regression coefficients (95% confidence intervals) were obtained. Pooled unadjusted regression coefficients (95% confidence intervals) were then computed weighting each individual study by the inverse of its variance. Tests of heterogeneity and sensitivity analysis were carried out, and the possibility of publication bias was assessed. The regression coefficients ranged between -9.40 and 1.63 mmHg/100 mg calcium for systolic blood pressure and between -4.90 and 0.47 for diastolic blood pressure. In men (11 studies, n = 7,271), the pooled regression coefficients were -0.010 and -0.009 mmHg/100 mg calcium for systolic and diastolic pressures, respectively (p < 0.001 and p < 0.05). In women (six studies, n = 8,507), they were -0.15 and -0.057 mmHg/100 mg calcium (p < 0.001 and p < 0.02), and in men and women combined (six studies, n = 23,172 for systolic pressure and four studies, n = 3,215 for diastolic pressure) they were -0.061 and -0.061 mmHg/100 mg calcium (p < 0.001 and p < 0.05). 
In those studies that used the 24-hour recall method, the pooled regression coefficients were -0.06 and -0.09 mmHg/100 mg calcium (p < 0.005 and p = 0.07), whereas in those that used the food frequency questionnaire, they were -0.15 and -0.05 mmHg/100 mg calcium (p < 0.001 and p < 0.03). These data are consistent with an inverse association between dietary calcium intake and blood pressure. However, the size of the estimate, the observed heterogeneity among studies, and the possibility of confounding and publication bias indicate that an increase in calcium intake above the Recommended Dietary Allowance is not recommended at population level for the prevention and treatment of high blood pressure.
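The abstract above pools regression coefficients by "weighting each individual study by the inverse of its variance." A minimal sketch of that kind of fixed-effect pooling follows. The function name and the three slopes and standard errors are hypothetical, chosen only for illustration; they are not data from Cappuccio et al.

```python
import math

def pool_inverse_variance(estimates, std_errors, z=1.96):
    """Fixed-effect pooled estimate using weights 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)
    return pooled, pooled_se, ci

# Hypothetical slopes (mmHg per 100 mg calcium) and their standard errors.
slopes = [-0.10, -0.02, -0.30]
ses = [0.05, 0.02, 0.20]
pooled, pooled_se, ci = pool_inverse_variance(slopes, ses)
print(f"pooled slope = {pooled:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The most precise study (smallest standard error) dominates the pooled value, which is why a few large studies can pull the summary close to zero even when small studies report steep slopes.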
Evidence for decreasing quality of semen during past 50 years. Carlsen E, Giwercman A, Keiding N, Skakkebaek NE. British Medical Journal 1992: 305(6854); 609-13. [Medline] OBJECTIVE--To investigate whether semen quality has changed during the past 50 years. DESIGN--Review of publications on semen quality in men without a history of infertility selected by means of Cumulated Index Medicus and Current List (1930-1965) and MEDLINE Silver Platter database (1966-August 1991). SUBJECTS--14,947 men included in a total of 61 papers published between 1938 and 1991. MAIN OUTCOME MEASURES--Mean sperm density and mean seminal volume. RESULTS--Linear regression of data weighted by number of men in each study showed a significant decrease in mean sperm count from 113 x 10(6)/ml in 1940 to 66 x 10(6)/ml in 1990 (p < 0.0001) and in seminal volume from 3.40 ml to 2.75 ml (p = 0.027), indicating an even more pronounced decrease in sperm production than expressed by the decline in sperm density. CONCLUSIONS--There has been a genuine decline in semen quality over the past 50 years. As male fertility is to some extent correlated with sperm count the results may reflect an overall reduction in male fertility. The biological significance of these changes is emphasised by a concomitant increase in the incidence of genitourinary abnormalities such as testicular cancer and possibly also cryptorchidism and hypospadias, suggesting a growing impact of factors with serious effects on male gonadal function.
Temperature measured at the axilla compared with rectum in children and young people: systematic review. Craig J, Lancaster G, Williamson P, Smyth R. British Medical Journal 2000: 320(7243); 1174-1178. ABSTRACT: OBJECTIVE: To evaluate the agreement between temperature measured at the axilla and rectum in children and young people. DESIGN: A systematic review of studies comparing temperature measured at the axilla (test site) with temperature measured at the rectum (reference site) using the same type of measuring device at both sites in each patient. Devices were mercury or electronic thermometers or indwelling thermocouple probes. STUDIES REVIEWED: 40 studies including 5528 children and young people from birth to 18 years. DATA EXTRACTION: Difference in temperature readings at the axilla and rectum. RESULTS: 20 studies (n=3201 (58%) participants) had sufficient data to be included in a meta-analysis. There was significant residual heterogeneity in both mean differences and sample standard deviations within the groups using different devices and within age groups. The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for mercury thermometers was 0.25 degrees C (95% limits of agreement -0.15 degrees C to 0.65 degrees C) and for electronic thermometers was 0.85 degrees C (-0.19 degrees C to 1.90 degrees C). The pooled (random effects) mean temperature difference (rectal minus axillary temperature) for neonates was 0.17 degrees C (-0.15 degrees C to 0.50 degrees C) and for older children and young people was 0.92 degrees C (-0.15 degrees C to 1.98 degrees C). CONCLUSIONS: The difference between temperature readings at the axilla and rectum using either mercury or electronic thermometers showed wide variation across studies. This has implications for clinical situations where temperature needs to be measured with precision.
Efficacy, tolerability, and upper gastrointestinal safety of celecoxib for treatment of osteoarthritis and rheumatoid arthritis: systematic review of randomised controlled trials. Deeks JJ, Smith LA, Bradley MD. BMJ 2002: 325(7365); 619-. [Abstract] [Full text] [PDF] Objective: To determine the efficacy, gastrointestinal safety, and tolerability of celecoxib (a cyclo-oxygenase 2 (COX 2) inhibitor) used in the treatment of osteoarthritis and rheumatoid arthritis. Design: Systematic review of randomised trials that compared at least 12 weeks' celecoxib treatment with another non-steroidal anti-inflammatory drug (NSAID) or placebo and reported efficacy, tolerability, or safety. Trials identified from manufacturer and by searching electronic databases and evaluated according to predefined inclusion and quality criteria. Data combined through meta-analysis. Participants: 15 187 patients with osteoarthritis or rheumatoid arthritis. Main outcome measures: Efficacy: Western Ontario and McMaster universities osteoarthritis index; American College of Rheumatology responder index and joint scores for rheumatoid arthritis. Tolerability: withdrawal rates for adverse effects. Gastrointestinal safety: incidence of ulcers, bleeds, perforations, and obstructions. Results: Nine randomised controlled trials were included. Celecoxib and NSAIDS were equally effective for all efficacy outcomes. Compared with those taking other NSAIDs, in patients taking celecoxib the rate of withdrawals due to adverse gastrointestinal events was 46% lower (95% confidence interval 29% to 58%; NNT 35 at three months), the incidence of ulcers detectable by endoscopy was 71% lower (59% to 79%; NNT 6 at three months), and the incidence of symptoms of ulcers, perforations, bleeds, and obstructions was 39% lower (4% to 61%; NNT 208 at six months). 
Subgroup analysis of patients taking aspirin showed that the incidence of ulcers detected by endoscopy was reduced by 51% (14% to 72%) in those given celecoxib compared with other NSAIDs. The reduction was greater (73%, 52% to 84%) in those not taking aspirin. Conclusion: Celecoxib is as effective as other NSAIDs for relief of symptoms of osteoarthritis and rheumatoid arthritis and has significantly improved gastrointestinal safety and tolerability. What is already known on this topic: Long term NSAID use is associated with the development of peptic and duodenal ulcers. COX 2 specific inhibitors are claimed to cause fewer gastrointestinal complications. The National Institute for Clinical Excellence has recently recommended that COX 2 specific inhibitors are used in patients with arthritis who are at risk of gastrointestinal complications but not in those taking prophylactic aspirin. What this study adds: Systematic review of randomised trials shows that celecoxib is as effective as other NSAIDs for osteoarthritis and rheumatoid arthritis. Celecoxib has significantly improved gastrointestinal safety and tolerability compared with standard NSAIDs. An improvement in gastrointestinal safety was still evident in patients who were also taking aspirin.
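The celecoxib abstract reports each result as a relative reduction paired with a number needed to treat (NNT). Since NNT is the reciprocal of the absolute risk reduction, and the absolute reduction is the comparison-group risk times the relative reduction, the two reported figures together imply a rough comparison-group risk. The sketch below is a back-of-envelope check, not a calculation from the paper: the function names are hypothetical, and the published NNTs are rounded, so the implied risks are approximate.

```python
def nnt(control_risk, relative_risk_reduction):
    """Number needed to treat = 1 / absolute risk reduction (unrounded)."""
    absolute_reduction = control_risk * relative_risk_reduction
    return 1.0 / absolute_reduction

def implied_control_risk(nnt_value, relative_risk_reduction):
    """Comparison-group risk implied by a reported NNT and relative reduction."""
    return 1.0 / (nnt_value * relative_risk_reduction)

# Figures quoted in the abstract: (relative reduction, NNT).
reported = {
    "withdrawals for GI events": (0.46, 35),
    "endoscopic ulcers": (0.71, 6),
    "symptomatic ulcers, perforations, bleeds, obstructions": (0.39, 208),
}
for outcome, (rrr, n) in reported.items():
    risk = implied_control_risk(n, rrr)
    print(f"{outcome}: implied comparison-group risk of about {risk:.1%}")
```

The same relative reduction translates into very different NNTs depending on how common the outcome is, which is why the rare serious events have an NNT of 208 while endoscopic ulcers have an NNT of 6.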
Systematic review of near patient test evaluations in primary care. Delaney B, Hyde C, McManus R, Wilson S, Fitzmaurice D, Jowett S, Tobias R, Thorpe G, Richard Hobbs F. British Medical Journal 1999: 319(7213); 824-827. ABSTRACT: OBJECTIVE: To identify and qualitatively synthesise the findings from all studies that have examined the performance and effect of near patient tests in the primary care setting. DESIGN: Systematic review of published and unpublished research 1986-99. Main outcome measures: Test performance characteristics, measures of effect on clinical practice or patient outcome. RESULTS: 101 relevant publications were identified. The general quality of these papers was low, and consequently only 32 papers were assessed in detail. Although these papers gave some indication of the value of near patient testing in areas such as anticoagulation monitoring and group A beta haemolytic streptococcus testing, the research raised many more questions than it answered. Almost no reports were found of unbiased assessment of the effect of near patient tests in primary care on patient outcomes, organisational outcomes, or cost. CONCLUSIONS: Available research provides little evidence to guide the expansion of use of near patient testing in primary care. Further research is needed in areas of clinical practice where near patient tests might be most beneficial.
Sex of researchers and sex-typed communications as determinants of sex differences in influenceability: a meta-analysis of social influence studies. Eagly A, Carli L. Psychol Bull 1981: 90; 1-20. Abstract not available.
Surgery or embolisation for varicocele in subfertile men (Cochrane Review) [In Process Citation]. Evers JL, Collins JA, Vandekerckhove P. Cochrane Database Syst Rev. 2001: 1(CD000479); CD000479. [Medline] BACKGROUND: A varicocele is an, almost exclusively left-sided, varicosity of the pampiniform plexus of the spermatic cord, forming a tangle of distended blood vessels in the scrotum. Although the concept that varicocele causes male subfertility and therefore varicocelectomy cures male subfertility has been around for almost fifty years now, the mechanisms by which varicocele would affect fertility have not yet been satisfactorily explained, and neither have the mechanisms by which varicocelectomy would resolve subfertility. Furthermore, it has been questioned whether a causal relation exists at all between the distension of the pampiniform plexus and impairment of fertility. OBJECTIVES: To evaluate the effect of varicocele treatment on pregnancy rate in subfertile couples. SEARCH STRATEGY: Relevant trials were identified in the Cochrane Menstrual Disorders and Subfertility Group's specialised register of controlled trials. A MEDLINE search, using the group's search strategy, was performed for the period 1966-2000. Also, hand searching was performed of 22 specialist journals in the field from their first issue till 2000. Cross references and references from review articles were checked. SELECTION CRITERIA: RCTs were included if they were relevant to the clinical question posed, if they reported pregnancy rates as an outcome measure, and if they reported data in treated (surgical ligation or radiological embolization of the internal spermatic vein) and untreated groups. DATA COLLECTION AND ANALYSIS: Six studies met the inclusion criteria for this review. 
One (Nieschlag 1995/1998) was an extension of a previously published study (Nieschlag 1995/1998), which left five studies for analysis (Nilsson 1979; Breznik 1993; Madgar 1995; Yamamoto 1996; Nieschlag 1995/1998). The results of a WHO megatrial are awaited but as yet are unavailable. The WHO data will be added if and when they will have become available. All five only included men from couples with subfertility problems, one (Madgar 1995) excluded men with sperm counts <5 mill/mL, three (Nilsson 1979; Breznik 1993; Yamamoto 1996) also included men with normal semen analysis. One study (Yamamoto 1996) specifically addressed only men with subclinical varicoceles as diagnosed by thermography. Potentially relevant trials were screened independently by two authors (JE and JC). Any differences of opinion were resolved by consensus meeting (none occurred for this review). Studies were excluded from meta-analysis if they made comparisons other than those specified above. MAIN RESULTS: One trial (Madgar 1995) reported a statistically significant improvement in pregnancy rate following high ligation of the left spermatic vein. None of the other four studies showed individually a significant effect on pregnancy rates of varicocele treatment over no-treatment (Nilsson 1979; Breznik 1993; Yamamoto 1996), or over counseling only (Nieschlag 1995/1998). The combined RR (Relative Risk; random effects method) of the five studies is 1.06 (95%CI 0.57-1.94), the Peto OR (Odds Ratio) is 1.15 (95%CI 0.73-1.83). REVIEWER'S CONCLUSIONS: Insufficient evidence exists that treatment of varicocele in men from couples with otherwise unexplained subfertility does improve the couple's spontaneous pregnancy chances.
Youth Access Interventions Do Not Affect Youth Smoking. Fichtenberg CM, Glantz SA. Pediatrics Journal 2002 (June): 109(6); 1088-1092. From the Center for Tobacco Control Research and Education, Institute for Health Policy Studies, Cardiovascular Research Institute, University of California, San Francisco, San Francisco, California. Objective. To determine the effectiveness of laws restricting youth access to cigarettes on prevalence of smoking among teens. Methods. We conducted a systematic review of studies that reported changes in smoking associated with the presence of restrictions on the ability of teens to purchase cigarettes. We calculated the correlation between merchant compliance levels with youth access laws and (30-day and regular) prevalence of youth smoking, and between changes in compliance and prevalence associated with youth access interventions. We also conducted a random effects meta-analysis to determine the change in youth prevalence associated with youth access interventions from studies that included control communities. Results. Based on data from 9 studies, there was no detectable relationship between the level of merchant compliance and 30-day (r = .116; n = 38 communities) or regular (r = .017) smoking prevalence. There was no evidence of a threshold effect. There was no evidence that an increase in compliance with youth access restrictions was associated with a decrease in 30-day (r = .294; n = 18 communities) or regular (r = .274) smoking prevalence. There was no significant difference in youth smoking in communities with youth access interventions compared with control communities; the pooled estimate of the effect of intervention on 30-day prevalence was -1.5% (95% confidence interval: -6.0% to +2.9%). Conclusions.
Given the limited resources available for tobacco control, as well as the expense of conducting youth access programs, tobacco control advocates should abandon this strategy and devote the limited resources that are available for tobacco control toward other interventions with proven effectiveness.
Practice Parameter: The Management of Acute Gastroenteritis in Young Children. Provisional Committee on Quality Improvement, Subcommittee on Acute Gastroenteritis. Pediatrics 1996: 97(3); 424-431. ABSTRACT: This practice parameter formulates recommendations for health care providers about the management of acute diarrhea in children ages 1 month to 5 years. It was developed through a comprehensive search and analysis of the medical literature. Expert consensus opinion was used to enhance or formulate recommendations where data were insufficient. The Provisional Committee on Quality Improvement of the American Academy of Pediatrics (AAP) selected a subcommittee composed of pediatricians with expertise in the fields of gastroenterology, infectious diseases, pediatric practice, and epidemiology to develop the parameter. The subcommittee, the Provisional Committee on Quality Improvement, a review panel of practitioners, and other groups of experts within and outside the AAP reviewed and revised the parameter. Three specific management issues were considered: (1) methods of rehydration, (2) refeeding after rehydration, and (3) the use of antidiarrheal agents. Main outcomes considered were success or failure of rehydration, resolution of diarrhea, and adverse effects from various treatment options. A comprehensive bibliography of literature on gastroenteritis and diarrhea was compiled and reduced to articles amenable to analysis. Oral rehydration therapy was studied in depth; inconsistency in the outcomes measured in the studies interfered with meta-analysis but allowed for formulation of strong conclusions. Oral rehydration was found to be as effective as intravenous therapy in rehydrating children with mild to moderate dehydration and is the therapy of first choice in these patients. Refeeding was supported by enough comparable studies to permit a valid meta-analysis.
Early refeeding with milk or food after rehydration does not prolong diarrhea; there is evidence that it may reduce the duration of diarrhea by approximately half a day and is recommended to restore nutritional balance as soon as possible. Data on antidiarrheal agents were not sufficient to demonstrate efficacy; therefore, the routine use of antidiarrheal agents is not recommended, because many of these agents have potentially serious adverse effects in infants and young children. This practice parameter is not intended as a sole source of guidance in the treatment of acute gastroenteritis in children. It is designed to assist pediatricians by providing an analytic framework for the evaluation and treatment of this condition. It is not intended to replace clinical judgment or to establish a protocol for all patients with this condition. It rarely will provide the only appropriate approach to the problem. A technical report describing the analyses used to prepare this parameter and a patient education brochure are available through the Publications Department of the AAP.
Effect of iron supplementation on incidence of infectious illness in children: systematic review. Gera T, Sachdev HPS. BMJ 2002: 325(7373); 1142-. [Abstract] [Full text] [PDF] Objective: To evaluate the effect of iron supplementation on the incidence of infections in children. Design: Systematic review of randomised controlled trials. Data sources: 28 randomised controlled trials (six unpublished and 22 published) on 7892 children. Interventions: Oral or parenteral iron supplementation or fortified formula milk or cereals. Outcomes: Incidence of all recorded infectious illnesses, and individual illnesses, including respiratory tract infection, diarrhoea, malaria, other infections, and prevalence of positive smear results for malaria. Results: The pooled estimate (random effects model) of the incidence rate ratio (iron v placebo) was 1.02 (95% confidence interval 0.96 to 1.08, P=0.54; P<0.0001 for heterogeneity). The incidence rate difference (iron minus placebo) for all recorded illnesses was 0.06 episodes/child year (-0.06 to 0.18, P=0.34; P<0.0001 for heterogeneity). However, there was an increase in the risk of developing diarrhoea (incidence rate ratio 1.11, 1.01 to 1.23, P=0.04), but this would not have an overall important impact on public health (incidence rate difference 0.05 episodes/child year, -0.03 to 0.13; P=0.21). The occurrence of other illnesses and positive results on malaria smears (adjusted for positive smears at baseline) were not significantly affected by iron administration. On meta-regression, the statistical heterogeneity could not be explained by the variables studied. Conclusion: Iron supplementation has no apparent harmful effect on the overall incidence of infectious illnesses in children, though it slightly increases the risk of developing diarrhoea.
What is already known on this topic: Iron supplementation is recommended to prevent iron deficiency, which is a major health problem, especially in the developing countries. Conflicting data exist regarding the possibility of an increase in the incidence of infections with iron supplementation, resulting in concern about the safety of this intervention. What this study adds: Iron supplementation has no apparent harmful effect on the overall incidence of infectious illnesses in children. Iron administration increases the risk of developing diarrhoea. Fortification of foods may be the safest and most beneficial mode of supplementation in relation to infectious illnesses.
Comparison of intrauterine and intracervical insemination with frozen donor sperm: a meta-analysis. Goldberg JM, Mascha E, Falcone T, Attaran M. Fertil Steril 1999: 72(5); 792-5. [Medline] OBJECTIVE: To determine whether artificial insemination with frozen donor sperm yielded a higher pregnancy rate per cycle by intracervical (ICI) or intrauterine (IUI) techniques. A meta-analysis was performed. DATA IDENTIFICATION: A computerized MEDLINE search of the English-language literature on artificial insemination with donor sperm was performed and augmented by a review of meeting abstract books and references in published papers. STUDY SELECTION: Only prospective randomized studies that reported monthly fecundity rates for IUI and ICI with frozen donor sperm were included. DATA ANALYSIS: Seven studies were identified. The odds ratios (OR) and 95% confidence intervals (CI) were determined with use of the general estimating equation method for the three studies for which raw data could be obtained. For the remaining four studies, the OR and CI were assessed with use of the published summary data. A random-effects meta-analysis was then performed. RESULT: Intrauterine insemination resulted in a significantly higher monthly fecundity rate with a common OR of 2.4 (CI 1.5-3.8). CONCLUSION: On the basis of this meta-analysis of the seven prospective studies, IUI results in higher pregnancy rates than ICI for frozen donor insemination.
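Several abstracts in this list, including the one above, report a random-effects pooled estimate without naming the estimator. One common choice is the DerSimonian-Laird method, sketched below on the log odds ratio scale. That Goldberg et al. used this particular estimator is an assumption, and the three odds ratios and variances are hypothetical values for illustration only.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird) of per-study effects.

    effects   -- per-study estimates, e.g. log odds ratios
    variances -- their within-study variances
    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures how far the studies scatter around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight with the between-study variance added to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    return pooled, math.sqrt(1.0 / sw_star), tau2

# Hypothetical per-study odds ratios and variances of their logs.
log_ors = [math.log(1.5), math.log(3.0), math.log(2.5)]
variances = [0.10, 0.20, 0.15]
pooled, se, tau2 = dersimonian_laird(log_ors, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled OR = {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 = {tau2:.3f}")
```

When the studies agree closely, tau^2 is estimated as zero and the result coincides with the fixed-effect inverse-variance estimate; when they disagree, the weights become more equal and the confidence interval widens.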
Diabetes care in general practice: meta-analysis of randomised control trials. Griffin S. British Medical Journal 1998: 317(7155); 390-6. OBJECTIVE: To assess the effectiveness of care in general practice for people with diabetes. DESIGN: Meta-analysis of randomised trials comparing general practice and shared care with follow up in hospital outpatient clinic. IDENTIFICATION: Trials were identified from searches of eight bibliographic and research databases. RESULTS: Five trials identified included 1058 people with diabetes, overall mean age 58.4 years, receiving hospital outpatient follow up for their diabetes. Results were heterogeneous between trials. In shared care schemes featuring more intensive support through a computerised prompting system for general practitioners and patients, there was no difference in mortality between care in hospital and care in general practice (odds ratio 1.06, 95% confidence interval 0.53 to 2.11); glycated haemoglobin tended to be lower in primary care (weighted difference in means of -0.28%, -0.59% to 0.03%); and losses to follow up were significantly lower in primary care (odds ratio 0.37, 0.22 to 0.61). However, schemes with less well developed support for family doctors were associated with adverse outcomes for patients. CONCLUSIONS: Unstructured care in the community is associated with poorer follow up, worse glycaemic control, and greater mortality than in hospital care. Computerised central recall, with prompting for patients and their family doctors, can achieve standards of care as good as or better than hospital outpatient care, at least in the short term. The evidence supports provision of regular prompted recall and review of selected people with diabetes by willing general practitioners. This can be achieved if suitable organisation is in place.
Antihypertensive drugs in very old people: a subgroup meta-analysis of randomised controlled trials. INDANA Group. Gueyffier F, Bulpitt C, Boissel JP, Schron E, Ekbom T, Fagard R, Casiglia E, Kerlikowske K, Coope J. Lancet 1999: 353(9155); 793-6. BACKGROUND: Beneficial clinical effects of treatment with antihypertensive drugs have been shown in middle-aged patients and in those hypertensive patients over 60 years old, but whether treatment is beneficial in patients over 80 years old is not known. METHODS: We collected data from all participants aged 80 years and over in randomised controlled trials of antihypertensive drugs through direct contact with study investigators. Our primary outcome was fatal and non-fatal stroke. Secondary outcomes were death from all causes, cardiovascular death, fatal and non-fatal major coronary and cardiovascular events, and heart failure. FINDINGS: There were 57 strokes and 34 deaths among 874 actively treated patients, compared with 77 strokes and 28 stroke deaths among 796 controls, representing 1 non-fatal stroke prevented for about 100 patients treated each year. The meta-analysis of data from 1670 participants aged 80 years or older suggested that treatment prevented 34% (95% CI 8-52) of strokes. Rates of major cardiovascular events and heart failure were significantly decreased, by 22% and 39%, respectively. However, there was no treatment benefit for cardiovascular death, and a non-significant 6% (-5 to 18) relative excess of death from all causes. INTERPRETATIONS: The inconclusive findings for mortality contrast with the benefit of treatment for non-fatal events. Results of a large-scale specific trial are needed for definite conclusion that antihypertensive treatment is beneficial in very elderly hypertensive patients. Meanwhile, an age threshold beyond which hypertension should not be treated cannot be justified.
Systematic review of efficacy of cognitive behaviour therapies in childhood and adolescent depressive disorder. Harrington R, Whittaker J, Shoebridge P, Campbell F. British Medical Journal 1998: 316(7144); 1559-63. OBJECTIVE: To determine whether cognitive behaviour therapy is an effective treatment for childhood and adolescent depressive disorder. DESIGN: Systematic review of six randomised trials comparing the efficacy of cognitive behaviour therapy with inactive interventions in subjects aged 8 to 19 years with depressive disorder. MAIN OUTCOME MEASURE: Remission from depressive disorder. RESULTS: The rate of remission from depressive disorder was higher in the therapy group (129/208; 62%) than in the comparison group (61/168; 36%). The pooled odds ratio was 3.2 (95% confidence interval 1.9 to 5.2), suggesting a significant benefit of active treatment. Most studies, however, were based on relatively mild cases of depression and were of only moderate quality. CONCLUSIONS: Cognitive behaviour therapy may be of benefit for depressive disorder of moderate severity in children and adolescents. It cannot, however, yet be recommended for severe depression. Definitive large trials will be required to determine whether the results of this systematic review are reliable.
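A rough check on the remission counts in the Harrington abstract above: a minimal sketch, assuming the trial arms are simply collapsed into a single 2x2 table. The abstract does not say how the pooled odds ratio was computed, so this crude figure is not expected to reproduce it. The function name and the choice of a Woolf (log-scale) interval are illustrative assumptions, not taken from the paper.

```python
import math

def crude_odds_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Crude odds ratio from two arms, with a Woolf (log-scale) interval."""
    a, b = events_a, n_a - events_a      # remitted / not remitted, therapy
    c, d = events_b, n_b - events_b      # remitted / not remitted, comparison
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts quoted in the abstract: 129/208 (therapy) versus 61/168 (comparison).
or_hat, ci_low, ci_high = crude_odds_ratio(129, 208, 61, 168)
print(f"crude OR = {or_hat:.2f} ({ci_low:.2f} to {ci_high:.2f})")
```

The crude value comes out near 2.9 (roughly 1.9 to 4.4), somewhat below the reported pooled 3.2 (1.9 to 5.2). A gap of this kind is what one might expect if the review combined trial-specific estimates rather than collapsing the tables, though the abstract does not confirm this.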
Systematic review of long term effects of advice to reduce dietary salt in adults. Hooper L, Bartlett C, Davey Smith G, Ebrahim S. BMJ 2002: 325(7365); 628-. [Abstract] [Full text] [PDF] Objective: To assess the long term effects of advice to restrict dietary sodium in adults with and without hypertension. Design: Systematic review and meta-analysis of randomised controlled trials. Data sources: Cochrane library, Medline, Embase, and bibliographies. Study selection: Unconfounded randomised trials that aimed to reduce sodium intake in healthy adults over at least 6 months. Inclusion decisions, validity and data extraction were duplicated. Random effects meta-analysis, subgrouping, sensitivity analysis, and meta-regression were performed. Outcomes: Mortality, cardiovascular events, blood pressure, urinary sodium excretion, quality of life, and use of antihypertensive drugs. Results: Three trials in normotensive people (n=2326), five trials in those with untreated hypertension (n=387), and three trials in people being treated for hypertension (n=801) were included, with follow up from six months to seven years. The large high quality (and therefore most informative) studies used intensive behavioural interventions. Deaths and cardiovascular events were inconsistently defined and reported. There were 17 deaths, equally distributed between intervention and control groups. Systolic and diastolic blood pressures were reduced (systolic by 1.1 mm Hg, 95% confidence interval 1.8 to 0.4 mm Hg; diastolic by 0.6 mm Hg, 1.5 to [-]0.3 mm Hg) at 13 to 60 months, as was urinary 24 hour sodium excretion (by 35.5 mmol/24 hours, 47.2 to 23.9). Degree of reduction in sodium intake and change in blood pressure were not related. Conclusions: Intensive interventions, unsuited to primary care or population prevention programmes, provide only small reductions in blood pressure and sodium excretion, and effects on deaths and cardiovascular events are unclear. 
Advice to reduce sodium intake may help people on antihypertensive drugs to stop their medication while maintaining good blood pressure control. What is already known on this topic: Restricting sodium intake in people with hypertension reduces blood pressure. Long term effects (on blood pressure, mortality, and morbidity) of reduced salt intake in people with and without hypertension are unclear. What this study adds: Few deaths and cardiovascular events have been reported in salt reduction trials. Meta-analysis shows that blood pressure was reduced (systolic by 1.1 mm Hg, diastolic by 0.6 mm Hg) at 13 to 60 months, with a reduction in sodium excretion of almost a quarter (35.5 mmol/24 hours). The interventions used were highly intensive and unsuited to primary care or population prevention programmes. Lower salt intake may help people on antihypertensive drugs to stop their medication while maintaining good control of blood pressure, but there are doubts about effects of sodium reduction on overall health.
Dietary fat intake and prevention of cardiovascular disease: systematic review. Hooper L, Summerbell CD, Higgins JPT, Thompson RL, Capps NE, Smith GD, Riemersma RA, Ebrahim S. British Medical Journal 2001: 322(7289); 757-763. [Abstract] [Full text] [PDF] Objective: To assess the effect of reduction or modification of dietary fat intake on total and cardiovascular mortality and cardiovascular morbidity. Design: Systematic review. Data sources: Cochrane Library, Medline, Embase, CAB abstracts, SIGLE, CVRCT registry, and bibliographies were searched; trials known to experts were included. Included studies: Randomised controlled trials stating intention to reduce or modify fat or cholesterol intake in healthy adult participants over at least six months. Inclusion decisions, validity, and data extraction were duplicated. Meta-analysis (random effects methodology), meta-regression, and funnel plots were performed. Results: 27 studies (30 902 person years of observation) were included. Alteration of dietary fat intake had small effects on total mortality (rate ratio 0.98; 95% confidence interval 0.86 to 1.12). Cardiovascular mortality was reduced by 9% (0.91; 0.77 to 1.07) and cardiovascular events by 16% (0.84; 0.72 to 0.99), which was attenuated (0.86; 0.72 to 1.03) in a sensitivity analysis that excluded a trial using oily fish. Trials with at least two years' follow up provided stronger evidence of protection from cardiovascular events (0.76; 0.65 to 0.90). Conclusions: There is a small but potentially important reduction in cardiovascular risk with reduction or modification of dietary fat intake, seen particularly in trials of longer duration.
Clinical trials of homoeopathy [published erratum appears in BMJ 1991 Apr 6; 302(6780):818]. Kleijnen J, Knipschild P, ter Riet G. British Medical Journal 1991: 302(6772); 316-23. OBJECTIVE--To establish whether there is evidence of the efficacy of homoeopathy from controlled trials in humans. DESIGN--Criteria based meta-analysis. Assessment of the methodological quality of 107 controlled trials in 96 published reports found after an extensive search. Trials were scored using a list of predefined criteria of good methodology, and the outcome of the trials was interpreted in relation to their quality. SETTING--Controlled trials published world wide. MAIN OUTCOME MEASURES--Results of the trials with the best methodological quality. Trials of classical homoeopathy and several modern varieties were considered separately. RESULTS--In 14 trials some form of classical homoeopathy was tested and in 58 trials the same single homoeopathic treatment was given to patients with comparable conventional diagnosis. Combinations of several homoeopathic treatments were tested in 26 trials; isopathy was tested in nine trials. Most trials seemed to be of very low quality, but there were many exceptions. The results showed a positive trend regardless of the quality of the trial or the variety of homeopathy used. Overall, of the 105 trials with interpretable results, 81 trials indicated positive results whereas in 24 trials no positive effects of homoeopathy were found. The results of the review may be complicated by publication bias, especially in such a controversial subject as homoeopathy. CONCLUSIONS--At the moment the evidence of clinical trials is positive but not sufficient to draw definitive conclusions because most trials are of low methodological quality and because of the unknown role of publication bias. This indicates that there is a legitimate case for further evaluation of homoeopathy, but only by means of well performed trials.
Environmental tobacco smoke exposure and ischaemic heart disease: an evaluation of the evidence. Law MR, Morris JK, Wald NJ. British Medical Journal 1997: 315(7114); 973-80. OBJECTIVES: To estimate the risk of ischaemic heart disease caused by exposure to environmental tobacco smoke and to explain why the associated excess risk is almost half that of smoking 20 cigarettes per day when the exposure is only about 1% that of smoking. DESIGN: Meta-analysis of all 19 acceptable published studies of risk of ischaemic heart disease in lifelong non-smokers who live with a smoker and in those who live with a non-smoker, five large prospective studies of smoking and ischaemic heart disease, and studies of platelet aggregation and studies of diet according to exposure to tobacco smoke. RESULTS: The relative risk of ischaemic heart disease associated with exposure to environmental tobacco smoke was 1.30 (95% confidence interval 1.22 to 1.38) at age 65. At the same age the estimated relative risk associated with smoking one cigarette per day was similar (1.39 (1.18 to 1.64)), while for 20 per day it was 1.78 (1.31 to 2.44). Two separate analyses indicated that non-smokers who live with smokers eat a diet that places them at a 6% higher risk of ischaemic heart disease, so the direct effect of environmental tobacco smoke is to increase risk by 23% (14% to 33%), since 1.30/1.06 = 1.23. Platelet aggregation provides a plausible and quantitatively consistent mechanism for the low dose effect. The increase in platelet aggregation produced experimentally by exposure to environmental tobacco smoke would be expected to have acute effects increasing the risk of ischaemic heart disease by 34%. CONCLUSION: Breathing other people's smoke is an important and avoidable cause of ischaemic heart disease, increasing a person's risk by a quarter.
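The adjustment quoted in the Law abstract above (1.30/1.06 = 1.23) can be reproduced in a line or two. This is a minimal sketch, assuming the dietary effect acts multiplicatively on the relative risk, which is what the division in the abstract implies; the variable names are illustrative. Only the point estimate is recomputed here, since the abstract does not say how the interval of 14% to 33% was derived.

```python
# Figures quoted in the abstract.
rr_observed = 1.30   # relative risk for living with a smoker, at age 65
rr_diet = 1.06       # relative risk attributed to dietary differences

# Assumption: the two effects multiply, so dividing removes the dietary part.
rr_direct = rr_observed / rr_diet
excess_risk_pct = (rr_direct - 1) * 100

print(f"direct relative risk = {rr_direct:.2f}")
print(f"excess risk = {excess_risk_pct:.0f}%")
```

The result rounds to 1.23, or a 23% excess risk, which is the "increasing a person's risk by a quarter" of the conclusion.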
Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. Lazarou J, Pomeranz BH, Corey PN. JAMA 1998: 279(15); 1200-5. OBJECTIVE: To estimate the incidence of serious and fatal adverse drug reactions (ADR) in hospital patients. DATA SOURCES: Four electronic databases were searched from 1966 to 1996. STUDY SELECTION: Of 153, we selected 39 prospective studies from US hospitals. DATA EXTRACTION: Data extracted independently by 2 investigators were analyzed by a random-effects model. To obtain the overall incidence of ADRs in hospitalized patients, we combined the incidence of ADRs occurring while in the hospital plus the incidence of ADRs causing admission to hospital. We excluded errors in drug administration, noncompliance, overdose, drug abuse, therapeutic failures, and possible ADRs. Serious ADRs were defined as those that required hospitalization, were permanently disabling, or resulted in death. DATA SYNTHESIS: The overall incidence of serious ADRs was 6.7% (95% confidence interval [CI], 5.2%-8.2%) and of fatal ADRs was 0.32% (95% CI, 0.23%-0.41%) of hospitalized patients. We estimated that in 1994 overall 2216000 (1721000-2711000) hospitalized patients had serious ADRs and 106000 (76000-137000) had fatal ADRs, making these reactions between the fourth and sixth leading cause of death. CONCLUSIONS: The incidence of serious and fatal ADRs in US hospitals was found to be extremely high. While our results must be viewed with circumspection because of heterogeneity among studies and small biases in the samples, these data nevertheless suggest that ADRs represent an important clinical issue.
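A quick consistency check on the Lazarou figures above: a minimal sketch, assuming each national count was obtained by multiplying the pooled incidence by a single 1994 total of hospitalized patients. The abstract does not state that total, so the values computed here are back-calculations for illustration, not figures from the paper.

```python
# Pooled incidences and national estimates quoted in the abstract.
serious_rate, serious_count = 0.067, 2_216_000
fatal_rate, fatal_count = 0.0032, 106_000

# Assumption: count = rate * total, so total = count / rate.
implied_from_serious = serious_count / serious_rate
implied_from_fatal = fatal_count / fatal_rate

print(f"implied total (serious ADRs): {implied_from_serious:,.0f}")
print(f"implied total (fatal ADRs):   {implied_from_fatal:,.0f}")
```

Both back-calculations land close to 33 million, so the two national estimates appear to rest on the same denominator, up to rounding of the published percentages.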
Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials. Linde K, Clausius N, Ramirez G, Melchart D, Eitel F, Hedges LV, Jonas WB. Lancet 1997: 350(9081); 834-43. BACKGROUND: Homeopathy seems scientifically implausible, but has widespread use. We aimed to assess whether the clinical effect reported in randomised controlled trials of homeopathic remedies is equivalent to that reported for placebo. METHODS: We sought studies from computerised bibliographies and contacts with researchers, institutions, manufacturers, individual collectors, homeopathic conference proceedings, and books. We included all languages. Double-blind and/or randomised placebo-controlled trials of clinical conditions were considered. Our review of 185 trials identified 119 that met the inclusion criteria. 89 had adequate data for meta-analysis, and two sets of trials were used to assess reproducibility. Two reviewers assessed study quality with two scales and extracted data for information on clinical condition, homeopathy type, dilution, "remedy", population, and outcomes. FINDINGS: The combined odds ratio for the 89 studies entered into the main meta-analysis was 2.45 (95% CI 2.05, 2.93) in favour of homeopathy. The odds ratio for the 26 good-quality studies was 1.66 (1.33, 2.08), and that corrected for publication bias was 1.78 (1.03, 3.10). Four studies on the effects of a single remedy on seasonal allergies had a pooled odds ratio for ocular symptoms at 4 weeks of 2.03 (1.51, 2.74). Five studies on postoperative ileus had a pooled mean effect-size-difference of -0.22 standard deviations (95% CI -0.36, -0.09) for flatus, and -0.18 SDs (-0.33, -0.03) for stool (both p < 0.05). INTERPRETATION: The results of our meta-analysis are not compatible with the hypothesis that the clinical effects of homeopathy are completely due to placebo. 
However, we found insufficient evidence from these studies that homeopathy is clearly efficacious for any single clinical condition. Further research on homeopathy is warranted provided it is rigorous and systematic.
Systematic Review of Transpyloric Versus Gastric Tube Feeding for Preterm Infants. McGuire W, McEwan P. Arch Dis Child Neonatal Ed 2004: 89; 245-248.
Is evidence for homoeopathy reproducible? Reilly D, Taylor MA, Beattie NG, Campbell JH, McSharry C, Aitchison TC, Carter R, Stevenson RD. Lancet 1994: 344(8937); 1601-6. ABSTRACT: We tested, under independent conditions, the reproducibility of evidence from two previous trials that homoeopathy differs from placebo. The test model was again homoeopathic immunotherapy. 28 patients with allergic asthma, most of them sensitive to house-dust mite, were randomly allocated to receive either oral homoeopathic immunotherapy to their principal allergen or identical placebo. The test treatments were given as a complement to their unaltered conventional care. A daily visual analogue scale of overall symptom intensity was the outcome measure. A difference in visual analogue score in favour of homoeopathic immunotherapy appeared within one week of starting treatment and persisted for up to 8 weeks (p = 0.003). There were similar trends in respiratory function and bronchial reactivity tests. A meta-analysis of all three trials strengthened the evidence that homoeopathy does more than placebo (p = 0.0004). Is the reproducibility of evidence in favour of homoeopathy proof of its activity or proof of the clinical trial's capacity to produce false-positive results?
Systematic review of the effectiveness of stage based interventions to promote smoking cessation. Riemsma RP, Pattenden J, Bridle C, Sowden AJ, Mather L, Watt IS, Walker A. BMJ 2003: 326(7400); 1175-1177. [Abstract] [Full text] [PDF] Objective To evaluate the effectiveness of interventions using a stage based approach in bringing about positive changes in smoking behaviour. Design Systematic review. Data sources 35 electronic databases, catalogues, and internet resources (from inception to July 2002). Bibliographies of retrieved references were scanned for other relevant publications, and authors were contacted if necessary. Results 23 randomised controlled trials were reviewed; two reported details of an economic evaluation. Eight trials reported effects in favour of stage based interventions, three trials showed mixed results, and 12 trials found no statistically significant differences between a stage based intervention and a non-stage based intervention or no intervention. Eleven trials compared a stage based intervention with a non-stage based intervention, and one reported statistically significant effects in favour of the stage based intervention. Two studies reported mixed effects, and eight trials reported no statistically significant differences between groups. The methodological quality of the trials was mixed, and few reported any validation of the instrument used to assess participants' stage of change. Overall, the evidence suggests that stage based interventions are no more effective than non-stage based interventions or no intervention in changing smoking behaviour. Conclusions Limited evidence exists for the effectiveness of stage based interventions in changing smoking behaviour.
A Meta-analysis of the Effects of Ipratropium Bromide in Adults with Acute Asthma. Rodrigo G, Rodrigo C, Burschtin O. The American Journal of Medicine 1999: 107; 363-369. Abstract not available yet.
Systematic review of randomised controlled trials of over the counter cough medicines for acute cough in adults. Schroeder K, Fahey T. British Medical Journal 2002: 324(7333); 329-. [Abstract] [Full text] [PDF] Objectives: To determine whether over the counter cough medicines are effective for acute cough in adults. Design: Systematic review of randomised controlled trials. Data sources: Search of the Cochrane Acute Respiratory Infections Group specialised register, Cochrane Controlled Trials Register, Medline, Embase, and the UK Department of Health National Research Register in all languages. Included studies: All randomised controlled trials that compared oral over the counter cough preparations with placebo in adults with acute cough due to upper respiratory tract infection in ambulatory settings and that had cough symptoms as an outcome. Results: 15 trials involving 2166 participants met all the inclusion criteria. Antihistamines seemed to be no better than placebo. There was conflicting evidence on the effectiveness of antitussives, expectorants, antihistamine-decongestant combinations, and other drug combinations compared with placebo. Conclusion: Over the counter cough medicines for acute cough cannot be recommended because there is no good evidence for their effectiveness. Even when trials had significant results, the effect sizes were small and of doubtful clinical relevance. Because of the small number of trials in each category, the results have to be interpreted cautiously.
Meta-analysis of increased dose of inhaled steroid or addition of salmeterol in symptomatic asthma (MIASMA). Shrewsbury S, Pyke S, Britton M. BMJ 2000: 320; 1368-1373. Abstract not available yet.
Have sperm densities declined? A reanalysis of global trend data. Swan S, Elkin E, Fenster L. Environ Health Perspect 1997: 105(11); 1228-32. ABSTRACT: In 1992 a worldwide decline in sperm density was reported; this was quickly followed by numerous critiques and editorials. Because of the public health importance of this finding, a detailed reanalysis of data from 61 studies was warranted to resolve these issues. Multiple linear regression models (controlling for abstinence time, age, percent proven fertility, specimen collection method, study goal and location) were used to examine regional differences and the interaction between region (United States, Europe, and non-Western countries) and year. Nonlinear models and residual confounding were also examined in these data. Using a linear model (adjusted R² = 0.80), means and slopes differed significantly across regions (p = 0.02). Mean sperm densities were highest in Europe and lowest in non-Western countries. A decline in sperm density was seen in the United States (studies from 1938-1988; slope = -1.50; 95% confidence interval (CI), -1.90 to -1.10) and Europe (1971-1990; slope = -3.13; CI, -4.96 to -1.30), but not in non-Western countries (1978-1989; slope = 1.56; CI, -1.00 to 4.12). Results from nonlinear models (quadratic and spline) were similar. Thus, further analysis of these studies supports a significant decline in sperm density in the United States and Europe. Confounding and selection bias are unlikely to account for these results. However, some intraregional differences were as large as mean decline in sperm density between 1938 and 1990, and recent reports from Europe and the United States further support large interarea differences in sperm density. Identifying the cause(s) of these regional and temporal differences, whether environmental or other, is clearly warranted.
Systematic review of dietary intervention trials to lower blood total cholesterol in free-living subjects. Tang JL, Armitage JM, Lancaster T, Silagy CA, Fowler GH, Neil HAW. BMJ 1998: 316(7139); 1213-20. [Abstract] [Full text] OBJECTIVES: To estimate the efficacy of dietary advice to lower blood total cholesterol concentration in free-living subjects and to investigate the efficacy of different dietary recommendations. DESIGN: Systematic overview of 19 randomised controlled trials including 28 comparisons. SUBJECTS: Free-living subjects. INTERVENTIONS: Individualised dietary advice to modify fat intake. MAIN OUTCOME MEASURE: Percentage difference in blood total cholesterol concentration between the intervention and control groups. RESULTS: The percentage reduction in blood total cholesterol attributable to dietary advice after at least six months of intervention was 5.3% (95% confidence interval 4.7% to 5.9%). Including both short and long duration studies, the effect was 8.5% at 3 months and 5.5% at 12 months. Diets equivalent to the step 2 diet of the American Heart Association were of similar efficacy to diets that aimed to lower total fat intake or to raise the polyunsaturated to saturated fatty acid ratio. These diets were moderately more effective than the step 1 diet of the American Heart Association (6.1% v 3.0% reduction in blood total cholesterol concentration; P<0.0001). On the basis of reported food intake, the targets for dietary change were seldom achieved. The observed reductions in blood total cholesterol concentrations in the individual trials were consistent with those predicted from dietary intake on the basis of the Keys equation. CONCLUSIONS: Individualised dietary advice for reducing cholesterol concentration is modestly effective in free-living subjects. More intensive diets achieve a greater reduction in serum cholesterol concentration. Failure to comply fully with dietary recommendations is the likely explanation for this limited efficacy.
Meta-analysis of Postnatal Steroid Use Challenged. Taylor C, Prakeshkumar S. Shah, Michael S. Dunn, and Keith J. Barrington. Pediatrics Journal 2002 (April): 109(4); 716-717. Abstract not available.
Local opinion leaders: effects on professional practice and health care outcomes. Thomson OBMA, Oxman AD, Haynes RB, Davis DA, Freemantle N, Harvey EL. Cochrane 2000: (2); CD000125. BACKGROUND: Both the theory of diffusion of innovations and the social influences model of behaviour change suggest that using local opinion leaders to transmit norms and model appropriate behaviour may improve health professional practice. OBJECTIVES: To assess the effects of using local opinion leaders on the practice of health professionals or patient outcomes. SEARCH STRATEGY: We searched MEDLINE to May 1998, the Research and Development Resource Base in Continuing Medical Education, and reference lists of related systematic reviews and articles. SELECTION CRITERIA: Randomised trials of the use of local opinion leaders (defined as health professionals nominated by their colleagues as being educationally influential). The participants were health care professionals responsible for patient care. DATA COLLECTION AND ANALYSIS: Two reviewers independently extracted data and assessed study quality. MAIN RESULTS: Eight studies were included involving more than 296 health professionals. A variety of patient problems were targeted, including acute myocardial infarction, cancer pain, osteoarthritis, rheumatoid arthritis, chronic lung disease, vaginal birth after caesarean section, labour and delivery, and urinary catheter care. Six of seven trials that measured health professional practice demonstrated some improvement for at least one outcome variable, and in two trials, the results were statistically significant and clinically important. In three trials that measured patient outcomes, only one achieved an impact upon practice that was of practical importance: local opinion leaders were effective in improving the rate of vaginal birth after previous caesarean section. REVIEWER'S CONCLUSIONS: Using local opinion leaders results in mixed effects on professional practice. 
However, it is not always clear what local opinion leaders do and replicable descriptions are needed. Further research is required to determine if opinion leaders can be identified and in which circumstances they are likely to influence the practice of their peers.
Maintenance Treatment with Interferon in Multiple Myeloma: A Survival Meta-Analysis. Trippoli S. Clin. Drug Invest. 1997: 14(5); 392-399. Abstract not available yet.
Techniques for surgical retrieval of sperm prior to ICSI for azoospermia (Cochrane Review) [In Process Citation]. Van Peperstraten AM, Proctor ML, Phillipson G, Johnson NP. Cochrane Database Syst Rev 2001: 4; [Medline] BACKGROUND: Azoospermia, the absence of sperm in ejaculated semen, is the most severe form of male factor infertility and is present in approximately 5% of all investigated infertile couples. The condition is currently classified as "obstructive" or "non-obstructive", although it is important to also consider the specific aetiology of each individual case. Some cases of obstructive azoospermia are treatable using microsurgical reconstruction of the seminal tract (for example, vasectomy reversal). Unreconstructable obstructive azoospermia and non-obstructive azoospermia have historically been relatively untreatable conditions that required the use of donor spermatozoa for fertilisation. The advent of intra-cytoplasmic sperm injection (ICSI), however, has transformed treatment of this type of severe male factor infertility. Sperm can be retrieved for ICSI from either the epididymis or the testis depending on the type of azoospermia. OBJECTIVES: To evaluate the efficacy of the various surgical retrieval techniques for men with obstructive or non obstructive azoospermia prior to ICSI. SEARCH STRATEGY: Electronic searches of the Cochrane Menstrual Disorders and Subfertility Group specialised register of controlled trials, CCTR, MEDLINE, EMBASE, and Bio extracts were performed to identify relevant randomised controlled trials (RCTs). Attempts were also made to identify trials from the National Research Register, the Clinical Trial Register and the citation lists of review articles and included trials. The first or corresponding author of each included trial was also contacted for additional information. 
SELECTION CRITERIA: Trials were included if they were randomised and compared the effectiveness of sperm retrieval techniques in men with azoospermia prior to ICSI. Due to the lack of RCTs, non-randomised trials that used the participants as their own controls were also considered in the review but not included in the meta-analysis. Trials of surgically extracted sperm versus ejaculated sperm or of diagnostic biopsies with no sperm parameter information were excluded. DATA COLLECTION AND ANALYSIS: One RCT was included in this systematic review which compared micropuncture with nerve stimulation versus microsurgical epididymal sperm extraction. Pregnancy rate, sperm retrieval adequate for ICSI and fertilisation rate were primary outcomes. Another RCT, comparing microsurgical epididymal sperm extraction versus testicular sperm extraction, was excluded from the meta-analysis due to poor randomisation. Seven non-randomised comparative trials were also identified and included. Main outcomes were pregnancy rate, sperm retrieval adequate for ICSI, fertilisation rate and implantation rate. Quality assessment and data extraction were performed independently by two reviewers. Meta-analysis was performed using odds ratios for dichotomous outcomes and weighted mean differences for continuous outcomes. Data unsuitable for meta-analysis was reported as descriptive data and was also included for discussion. MAIN RESULTS: One small RCT comparing two epididymal techniques gave limited evidence that microsurgical epididymal sperm aspiration (MESA) achieved significantly lower pregnancy (OR 0.19, 95% CI 0.04-0.83) and fertilisation rates (OR 0.16, 95% CI 0.05-0.48) than the micropuncture with perivascular nerve stimulation technique. However, the small number of participants included and the questionable methodology of this RCT make it impossible to make a definitive statement about the relative merits of either treatment. 
REVIEWER'S CONCLUSIONS: There is insufficient evidence to recommend any specific sperm retrieval technique for azoospermic men undergoing ICSI. Further randomised trials are warranted, preferably multi-centred trials.
A meta-analysis of acupuncture techniques for smoking cessation. White AR, Resch K-L, Ernst E. Tobacco Control 1999: 8; 393-397. Abstract not available yet.
Risk of acquiring Creutzfeldt-Jakob disease from blood transfusions: systematic review of case-control studies. Wilson K. British Medical Journal 2000: 321; 17-19. Abstract not available yet.
Genetic polymorphisms of debrisoquine and S-mephenytoin oxidation metabolism in Chinese populations: a meta-analysis. Xie HG, Xu ZH, Luo X, Huang SL, Zeng FD, Zhou HH. Pharmacogenetics 1996: 6(3); 235-8. [Medline] Chinese data on the polymorphic metabolism of debrisoquine, metoprolol, codeine and mephenytoin were collected and re-analysed using a meta-analysis method. There were no significant differences in the incidences of poor metabolizer (PM) between the separate series of debrisoquine, metoprolol and codeine, which are the three probe drugs reflecting the same enzyme polymorphism. PMs were detected at low frequencies for debrisoquine (1.20%; 95% confidence interval, CI: 0.67-1.98%), metoprolol (0.72%; CI: 0.29-1.49%) and codeine (0.48%, CI: 0.01-2.68%). The overall estimate of PM was 0.95% (CI: 0.60-1.42%) based on the 2427 determinations of all three probe drugs. The overall mean of PM of mephenytoin was 14.32% (12.26-16.38%) in the 1117 subjects. In summary, the present meta-analysis determined the accurate incidences of the genetic deficiency of S-mephenytoin 4'-hydroxylase (cytochrome P450 2C19) and debrisoquine hydroxylase (cytochrome P450 2D6) in Chinese populations.
Pediatric cardiopulmonary resuscitation: a collective review. Young KD, Seidel JS. Ann Emerg Med 1999: 33(2); 195-205. Little information is available about the effects of CPR in children, although it is known that the outcomes are dismal. Examples of unanswered questions include which advanced life support (ALS) procedures should be performed out-of-hospital, whether high-dose epinephrine improves survival, and the true prevalence of ventricular fibrillation as a presenting rhythm. Children differ from adults as to the cause and pathophysiology of cardiopulmonary arrest, but prehospital EMS and hospital resuscitation teams were initially designed for the care of adults. Because pediatric cardiopulmonary arrest is rare, prospective data are difficult to gather, and there are few large published studies. The purpose of this collective review was to review the current body of knowledge regarding survival rates and outcomes in pediatric CPR and, based on this review, to outline a course for future research.
Are coffee and tea consumption associated with urinary tract cancer risk? A systematic review and meta-analysis. Zeegers MP, Tan FE, Goldbohm RA, van den Brandt PA. Int J Epidemiol 2001: 30(2); 353-62. BACKGROUND: Narrative reviews have concluded that there is a small association between coffee consumption and an increased risk of urinary tract cancer, possibly due to confounding by smoking. No association for tea consumption has been indicated. This systematic review attempts to summarize and quantify these associations both unadjusted and adjusted for age, smoking and sex. METHOD: Thirty-four case-control and three follow-up studies were included in this systematic review. Summary odds ratios (OR) were calculated by meta-regression analyses. RESULTS: The unadjusted summary OR indicated a small increased risk of urinary tract cancer for current coffee consumers versus non-drinkers. The adjusted summary OR were: 1.26 (95% CI: 1.09-1.46) for studies with only men, 1.08 (95% CI: 0.79-1.46) for studies with only women and 1.18 (95% CI: 1.01-1.38) for studies with men and women combined. Neither unadjusted nor adjusted summary OR provided evidence for a positive association between tea consumption and urinary tract cancer. Even though studies differed in methodology, the results were rather consistent. We did not perform dose-response analyses for coffee and tea consumption due to sparse data. CONCLUSIONS: In accordance with earlier reviews, we found that coffee consumption increases the risk of urinary tract cancer by approximately 20%. The consumption of tea seems not to be related to an increased risk of urinary tract cancer.
Randomised controlled trial of homeopathy versus placebo in perennial allergic rhinitis with overview of four trial series. Taylor MA, Reilly D, Llewellyn-Jones RH, McSharry C, Aitchison TC. BMJ 2000: 321; 471-476. [Medline] [Abstract] [Full text] [PDF] Objective: To test the hypothesis that homoeopathy is a placebo by examining its effect in patients with allergic rhinitis and so contest the evidence from three previous trials in this series. Design: Randomised, double blind, placebo controlled, parallel group, multicentre study. Setting: Four general practices and a hospital ear, nose, and throat outpatient department. Participants: 51 patients with perennial allergic rhinitis. Intervention: Random assignment to an oral 30c homoeopathic preparation of principal inhalant allergen or to placebo. Main outcome measures: Changes from baseline in nasal inspiratory peak flow and symptom visual analogue scale score over third and fourth weeks after randomisation. Results: Fifty patients completed the study. The homoeopathy group had a significant objective improvement in nasal airflow compared with the placebo group (mean difference 19.8 l/min, 95% confidence interval 10.4 to 29.1, P=0.0001). Both groups reported improvement in symptoms, with patients taking homoeopathy reporting more improvement in all but one of the centres, which had more patients with aggravations. On average no significant difference between the groups was seen on visual analogue scale scores. Initial aggravations of rhinitis symptoms were more common with homoeopathy than placebo (7 (30%) v 2 (7%), P=0.04). Addition of these results to those of three previous trials (n=253) showed a mean symptom reduction on visual analogue scores of 28% (10.9 mm) for homoeopathy compared with 3% (1.1 mm) for placebo (95% confidence interval 4.2 to 15.4, P=0.0007). Conclusion: The objective results reinforce earlier evidence that homoeopathic dilutions differ from placebo.
Re-analysis of previous meta-analysis of clinical trials of homeopathy. Ernst E, Pittler MH. J Clin Epidemiol 2000: 53(11); 1188. Abstract not available.
Review of randomized trials of homoeopathy. Hill C, Doyon F. Rev Epidemiol Sante Publique 1990: 38(2); 139-47. [Medline] The present review covers forty published randomized trials in which the results of a homoeopathic treatment were compared to those of a standard treatment, a placebo, or no treatment at all. These trials were identified after an extensive search through the literature. They cover a wide range of pathologies. Most were double-blind and used subjective and/or multiple endpoints. The median number of patients per group was 28. The analysis only included all the randomized patients in one third of the trials. In our opinion, the results do not provide acceptable evidence that homoeopathic treatments are effective.
Meta-analysis of studies of the CYP2D6 polymorphism in relation to lung cancer and Parkinson's disease. Rostami-Hodjegan A, Lennard MS, Woods HF, Tucker GT. Pharmacogenetics 1998: 8(3); 227-38. Studies of associations between the CYP2D6 polymorphism and susceptibility to specific diseases, particularly lung cancer and Parkinsonism, have produced conflicting results with respect to an under- or overrepresentation of poor metabolizers. Accordingly, we have re-evaluated this primary research (18 studies on lung cancer and 18 on Parkinsonism) using meta-analysis. For lung cancer, the median odds ratio (OR) was 0.69 (95% confidence interval (CI) 0.52-0.90), which differed significantly from unity (P < 0.007). A trial comprising 3000 patients and an equal number of control individuals would be required to demonstrate that this observation had arisen purely by chance (i.e. OR = 1). For Parkinson's disease, the analysis gave an OR of 1.32 (95% CI 0.98-1.78), which was of borderline statistical significance (P < 0.074). If the only individual study that was statistically significant was excluded, the P-value increased greatly to 0.489. A study of at least 500 patients and an equal number of control individuals giving the same value as the current mean OR of 1.32 would be required to make the overall analysis statistically significant. In summary, poor metabolizers with respect to CYP2D6 show a small decrease in susceptibility to lung cancer compared with extensive metabolizers and it is hard to justify further studies. The relationship between the CYP2D6 polymorphism and lung cancer, as a determinant of individual susceptibility, is not appreciable (OR = 0.69) compared with that between smoking and lung cancer (OR > 11). Nevertheless, the epidemiological impact on the number of poor metabolizers who are protected from lung cancer may be considerable.
With regard to Parkinson's disease, additional well designed studies may allow a definitive conclusion, although any risk for poor metabolizers is likely to be small and therefore of questionable clinical significance. An important lesson from the current review of studies is that much time, effort, expense and patient inconvenience might have been avoided if more attention had been paid to appropriate study design, particularly in the selection of control groups.
Multidrug resistance in breast cancer: a meta-analysis of MDR1/gp170 Expression and its possible functional significance. Trock B, Leonessa F, Clarke R. Journal of the National Cancer Institute 1997: 89(13); 917-31. ABSTRACT: BACKGROUND: P-glycoprotein (gp170; encoded by the MDR1 gene [also known as PGY1]) is a membrane protein capable of exporting a variety of anticancer drugs from cells. MDR1/gp170 expression has been studied in breast cancer, but the prevalence of this expression and its role in breast tumor drug resistance are unclear. PURPOSE: We conducted a critical review and meta-analysis of studies examining MDR1/gp170 expression in breast cancer to estimate the likely prevalence and clinical relevance of this expression. We also explored reasons for differences in the findings from individual studies. METHODS: Published papers on MDR1/gp170 expression in breast cancer were identified by searching several literature databases and reviewing the bibliographies of identified papers. Variability across the studies in the proportion of tumors expressing MDR1/gp170 was assessed by use of chi-squared tests of homogeneity, weighted means, and weighted linear regression. Pooled relative risks (RRs) for the association between the induction of MDR1/gp170 expression and prior chemotherapy and associations between MDR1/gp170 expression and several clinical outcomes were estimated by use of Mantel-Haenszel methods. Heterogeneity among the pooled RRs was explored by use of chi-squared tests. Reported P values are two-sided. RESULTS: Thirty-one studies were identified and evaluated. The proportion of breast tumors expressing MDR1/gp170 in all of the studies was 41.2%, but there was substantial heterogeneity in the values across individual studies (P<.0001). 
Regression analyses demonstrated that a considerable portion of the observed heterogeneity was a consequence of the change, over time, from RNA hybridization-based assays to immunohistochemistry-based assays of MDR1/gp170 expression. Measuring MDR1/gp170 expression before versus after chemotherapy and use of cytotoxic drugs that are not substrates for gp170 also contributed to the heterogeneity. Treatment with chemotherapeutic drugs or hormonal agents was associated with an increase in the proportion of tumors expressing MDR1/gp170 (RR = 1.77; 95% confidence interval [CI] = 1.46-2.15). Patients with tumors expressing MDR1/gp170 were three times more likely to fail to respond to chemotherapy than patients whose tumors were MDR1/gp170 negative (RR = 3.21; 95% CI = 2.28-4.51); this RR increased to 4.19 (95% CI = 2.71-6.47) when considering only patients whose tumor expression of MDR1/gp170 was measured after chemotherapy. MDR1/gp170 expression was not associated with lymph node metastases, estrogen receptor status, tumor size, tumor grade, or tumor histology. CONCLUSIONS AND IMPLICATIONS: MDR1/gp170 expression in breast tumors is associated with treatment and with a poor response to chemotherapy. The data are consistent with a contributory role for MDR1/gp170 in the multidrug resistance in some breast tumors.
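Note on the Mantel-Haenszel pooling mentioned in the abstract above: the pooled relative risk is a weighted ratio of event counts across study-level 2x2 tables. The sketch below is a minimal illustration of the standard Mantel-Haenszel risk ratio formula; the counts are hypothetical and are not taken from the Trock et al. paper.

```python
def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled relative risk.

    tables: iterable of (a, n1, c, n0) where
      a  = events in the exposed group,   n1 = size of the exposed group,
      c  = events in the unexposed group, n0 = size of the unexposed group.
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in tables:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator

# Hypothetical studies (made-up counts): non-response to chemotherapy
# among MDR1/gp170-positive versus MDR1/gp170-negative tumors.
studies = [
    (20, 40, 10, 40),
    (30, 100, 15, 100),
]
print(mantel_haenszel_rr(studies))
```

Because both hypothetical studies have a within-study relative risk of 2, the pooled value is also 2; when the study-level values differ, larger studies pull the pooled estimate toward their own result.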
A meta-analysis of 61 sperm count studies revisited. Becker S, Berhane K. Fertility and Sterility 1997: 67(6); 1103-8. [Medline] OBJECTIVE: To re-examine data on sperm counts over time from 61 studies from around the world. DESIGN: Parametric analyses and flexible nonlinear models of the relation between sperm counts and time. MAIN OUTCOME MEASURE(S): Mean sperm concentrations per milliliter and regression coefficients for possible trends of concentrations over time. RESULT(S): A significant decline was found only in U.S. studies. CONCLUSION(S): Studies from specific sites have found declines in sperm counts, but a world-wide decline has not been demonstrated. Rigorous assessment of statistical models should be done before conclusions are drawn. Flexible smoothing models are a useful addition to currently available analytic methods.
Sperm function assays and their predictive value for fertilization outcome in IVF therapy: a meta-analysis. Oehninger S, Franken DR, Sayed E, Barroso G, Kolm P. Hum Reprod Update 2000: 6(2); 160-8. [Medline] [PDF] The prevalence of male infertility and the availability of new, highly successful therapeutic options make the testing of sperm functional competence mandatory. An objective, outcome-based examination of the validity of the currently available assays was performed based upon the results obtained from 2906 subjects evaluated in 34 prospectively designed, controlled studies. The aim was carried out through a meta-analytical approach that examined the predictive value of four categories of sperm functional assays: computer-aided sperm motion analysis (CASA); induced-acrosome reaction testing; sperm penetration assay (SPA); and sperm-zona pellucida binding assays for IVF outcome. Results demonstrated a high predictive power of the sperm-zona pellucida binding and the induced-acrosome reaction assays for fertilization outcome. On the other hand, the findings indicated a poor clinical value of the SPA as predictor of fertilization and a real need for standardization and further investigation of the potential clinical utility of CASA systems. This analysis points to limitations of the current tests and the need for standardization of methodologies and provides objective evidence on which clinical management and future research can be based.
Cigarette smoking and sperm density: a meta-analysis. Vine MF, Margolin BH, Morrison HI, Hulka BS. Fertil Steril 1994: 61(1); 35-43. [Medline] OBJECTIVE: To quantify, through meta-analysis techniques, the association between cigarette smoking and sperm density. METHODS: The logarithm of the ratio of mean sperm density for smokers to that for nonsmokers for the studies included in this meta-analysis was regressed against a constant, an indicator of study population source (infertility clinic patients or normal men), minimum number of cigarettes smoked per day among smokers (< 10, ≥ 10), exclusion of azoospermic men (yes/no), number of semen specimens analyzed (one versus two), and blinding of laboratory personnel to the smoking status of the study participants (yes/no). Regression analyses were performed both unweighted and weighted inversely by study size. A qualitative and quantitative assessment of the relationship between the numbers of cigarettes smoked per day and sperm density was performed. RESULTS: Results of the meta-analysis indicate that smokers' sperm density is on average 13% to 17% (95% confidence interval = 8.0, 21.5) lower than that of nonsmokers. No other factors besides cigarette smoking were found to be independent predictors of sperm density. No clear dose-response relationships between the numbers of cigarettes smoked per day and sperm density emerged. Research conducted by the authors supports the findings of the meta-analysis. CONCLUSION: Cigarette smoking is associated with lowered sperm density. The inconsistency in the literature with regard to this conclusion appears to be the result of small sample sizes in most studies.
Environmental lead and children's intelligence: a systematic review of the epidemiological evidence. Pocock SJ, Smith M, Baghurst P. BMJ 1994: 309(6963); 1189-1197. [Medline] [Abstract] [Full text] Objective: To quantify the magnitude of the relation between full scale IQ in children aged 5 or more and their body burden of lead. Design: A systematic review of 26 epidemiological studies since 1979: prospective studies of birth cohorts, cross sectional studies of blood lead, and cross sectional studies of tooth lead. Setting: General populations of children ≥5 years. Main outcome measures: For each study, the regression coefficient of IQ on lead, after adjustment for confounders when possible, was used to derive the estimated change in IQ for a specific doubling of either blood or tooth lead. Results: The five prospective studies with over 1100 children showed no association of cord blood lead or antenatal maternal blood lead with subsequent IQ. Blood lead at around age 2 had a small and significant inverse association with IQ, somewhat greater than that for mean blood lead over the preschool years. The 14 cross sectional studies of blood lead with 3499 children showed a significant inverse association overall, but showed more variation in their results and their ability to allow for confounders. The seven cross sectional studies of tooth lead with 2095 children were more consistent in finding an inverse association, although the estimated magnitude was somewhat smaller. Overall synthesis of this evidence, including a meta-analysis, indicates that a typical doubling of body lead burden (from 10 to 20 µg/dl (0.48 to 0.97 µmol/l) blood lead or from 5 to 10 µg/g tooth lead) is associated with a mean deficit in full scale IQ of around 1-2 IQ points.
Conclusion: While low level lead exposure may cause a small IQ deficit, other explanations need considering: are the published studies representative; is there inadequate allowance for confounders; are there selection biases in recruiting and following children; and do children of lower IQ adopt behaviour which makes them more prone to lead uptake (reverse causality)? Even if moderate increases in body lead burden adversely affect IQ, a threshold below which there is negligible influence cannot currently be determined. Because of these uncertainties, the degree of public health priority that should be devoted to detecting and reducing moderate increases in children's blood lead, compared with other important social detriments that impede children's development, needs careful consideration. Public health implications: Early (neonatal) lead exposure seems not to affect child IQ in the general population. Blood lead and tooth lead measures during the first few years of life show a weak, but highly significant, inverse association with child IQ at ages 5 upwards. At face value, it seems that a typical doubling of body lead burden is linked to a loss of 1-2 IQ points. Given that these are observational studies, the extent to which lead actually causes an IQ deficit in the general population of children inevitably remains open to debate. This overall quantification of the lead-IQ association will help in determining public health policy in limiting children's exposure to environmental lead.
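Note on the "change in IQ per doubling of lead" used in the review above: how a regression coefficient converts to a per-doubling effect depends on whether lead entered the model on a log scale or a linear scale. The sketch below shows both conversions under that assumption; the abstract does not give the study-level models, and the coefficients used here are hypothetical.

```python
import math

def iq_change_per_doubling_log(beta, log_base=math.e):
    """IQ change per doubling of lead when IQ was regressed on log(lead).

    beta is the coefficient per one-unit increase in log(lead) in the
    given base; a doubling adds log_base(2) to log(lead).
    """
    return beta * math.log(2) / math.log(log_base)

def iq_change_linear(beta, lead_from=10.0, lead_to=20.0):
    """IQ change for a specific doubling when IQ was regressed on untransformed lead.

    beta is the coefficient per unit of lead (for example, per µg/dl).
    """
    return beta * (lead_to - lead_from)

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
print(iq_change_per_doubling_log(-5.0, log_base=10))    # model on log10(lead)
print(iq_change_linear(-0.15, lead_from=10, lead_to=20))  # model on lead in µg/dl
```

On a log scale the per-doubling effect is the same wherever the doubling starts, while on a linear scale it depends on the starting level, which is presumably why the review fixes "a specific doubling" such as 10 to 20 µg/dl.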
Are cannabinoids an effective and safe treatment option in the management of pain? A qualitative systematic review. Campbell FA, Tramer MR, Carroll D, Reynolds DJ, Moore RA, McQuay HJ. British Medical Journal 2001: 323(7303); 13-6. [Medline] [Abstract] [Full text] [PDF] OBJECTIVE: To establish whether cannabis is an effective and safe treatment option in the management of pain. DESIGN: Systematic review of randomised controlled trials. DATA SOURCES: Electronic databases Medline, Embase, Oxford Pain Database, and Cochrane Library; references from identified papers; hand searches. STUDY SELECTION: Trials of cannabis given by any route of administration (experimental intervention) with any analgesic or placebo (control intervention) in patients with acute, chronic non-malignant, or cancer pain. Outcomes examined were pain intensity scores, pain relief scores, and adverse effects. Validity of trials was assessed independently with the Oxford score. DATA EXTRACTION: Independent data extraction; discrepancies resolved by consensus. DATA SYNTHESIS: 20 randomised controlled trials were identified, 11 of which were excluded. Of the 9 included trials (222 patients), 5 trials related to cancer pain, 2 to chronic non-malignant pain, and 2 to acute postoperative pain. No randomised controlled trials evaluated cannabis; all tested active substances were cannabinoids. Oral delta-9-tetrahydrocannabinol (THC) 5-20 mg, an oral synthetic nitrogen analogue of THC 1 mg, and intramuscular levonantradol 1.5-3 mg were about as effective as codeine 50-120 mg, and oral benzopyranoperidine 2-4 mg was less effective than codeine 60-120 mg and no better than placebo. Adverse effects, most often psychotropic, were common. CONCLUSION: Cannabinoids are no more effective than codeine in controlling pain and have depressant effects on the central nervous system that limit their use. Their widespread introduction into clinical practice for pain management is therefore undesirable. 
In acute postoperative pain they should not be used. Before cannabinoids can be considered for treating spasticity and neuropathic pain, further valid randomised controlled studies are needed.
Detecting the effects of thromboprophylaxis: the case of the rogue reviews. Petticrew M, Kennedy SC. British Medical Journal 1997: 315(7109); 665-8. Abstract not available.
Meta-analysis/Shmeta-analysis. Shapiro S. Am J Epidemiol 1994: 140(9); 771-8. Abstract not available.
Questions for meta-analysis. Sohn D. Psychol Rep 1997: 81(1); 3-15. In spite of an abundance of data, the empirical evidence as yet does not make clear whether meta-analysis will bring about progress in psychological science. Therefore, it is still useful and desirable to engage in rational analysis of the methodology. Such analysis is done in the present essay by posing five questions that go to the logical and conceptual foundation of meta-analysis. The questions are (a) What are the grounds for believing that the review of the literature, even a quantitative one, will bring about scientific discovery? (b) Why is the individual study devalued when the history of successful science seems largely the story of the success of the individual study? (c) What is the rationale for believing that data analysis by itself can markedly improve the fortunes of psychological science? (d) Is there a basis for claims made on behalf of meta-analysis that it is more accurate than either the traditional literature review or the individual study? (e) Is there justification for the claim that de facto meta-analysis has been used effectively in physical science?
Acetylcysteine for prevention of contrast-induced nephropathy after intravascular angiography: a systematic review and meta-analysis. Bagshaw SM, Ghali WA. BMC Med 2004: 2(1); 38. [Medline] [Abstract] [Full text] [PDF] BACKGROUND: Contrast-induced nephropathy is an important cause of acute renal failure. We assess the efficacy of acetylcysteine for prevention of contrast-induced nephropathy among patients undergoing intravascular angiography. METHODS: We conducted a systematic review and meta-analysis of randomized controlled trials comparing prophylactic acetylcysteine plus hydration versus hydration alone in patients undergoing intravascular angiography. Studies were identified by searching MEDLINE, EMBASE, and CENTRAL databases. Our main outcome measures were the risk of contrast-induced nephropathy and the difference in serum creatinine between acetylcysteine and control groups at 48 h. RESULTS: Fourteen studies involving 1261 patients were identified and included for analysis, and findings were heterogeneous across studies. Acetylcysteine was associated with a significantly reduced incidence of contrast-induced nephropathy in five studies, and no difference in the other nine (with a trend toward a higher incidence in six of the latter studies). The pooled odds ratio for contrast-induced nephropathy with acetylcysteine relative to control was 0.54 (95% CI, 0.32-0.91, p = 0.02) and the pooled estimate of difference in 48-h serum creatinine for acetylcysteine relative to control was -7.2 µmol/L (95% CI -19.7 to 5.3, p = 0.26). These pooled values need to be interpreted cautiously because of the heterogeneity across studies, and due to evidence of publication bias. Meta-regression suggested that the heterogeneity might be partially explained by whether the angiography was performed electively or as emergency. CONCLUSION: These findings indicate that published studies of acetylcysteine for prevention of contrast-induced nephropathy yield inconsistent results.
The efficacy of acetylcysteine will remain uncertain unless a large well-designed multi-center trial is performed.
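Note on pooling heterogeneous studies, as in the acetylcysteine review above: a common approach is to compute an inverse-variance fixed-effect estimate, measure heterogeneity with Cochran's Q, and then widen the weights with a DerSimonian-Laird between-study variance. The sketch below illustrates that generic recipe; it is not claimed to be the method Bagshaw and Ghali used, and the input values are hypothetical.

```python
import math

def pool_log_odds_ratios(log_ors, ses):
    """Inverse-variance pooling of log odds ratios with a DerSimonian-Laird step.

    Returns a dict with the fixed-effect estimate, Cochran's Q, the
    between-study variance tau2, and the random-effects estimate with its SE.
    """
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1

    # DerSimonian-Laird moment estimator, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, log_ors)) / sum_re
    return {
        "fixed": fixed,
        "q": q,
        "df": df,
        "tau2": tau2,
        "random": random,
        "se_random": math.sqrt(1.0 / sum_re),
    }

# Hypothetical log odds ratios and standard errors from four trials.
result = pool_log_odds_ratios([-1.2, -0.7, 0.1, 0.3], [0.5, 0.4, 0.45, 0.6])
low = math.exp(result["random"] - 1.96 * result["se_random"])
high = math.exp(result["random"] + 1.96 * result["se_random"])
print(f"pooled OR = {math.exp(result['random']):.2f} (95% CI {low:.2f} to {high:.2f})")
```

When Q is much larger than its degrees of freedom, tau2 grows and the random-effects interval widens, which matches the caution in the abstract about interpreting a single pooled value from inconsistent trials.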
Bias in meta-analysis detected by a simple, graphical test. Egger M, Davey Smith G, Schneider M, Minder C. British Medical Journal 1997: 315(7109); 629-34. [Medline] [Abstract] [Full text] OBJECTIVE: Funnel plots (plots of effect estimates against sample size) may be useful to detect bias in meta-analyses that were later contradicted by large trials. We examined whether a simple test of asymmetry of funnel plots predicts discordance of results when meta-analyses are compared to large trials, and we assessed the prevalence of bias in published meta-analyses. DESIGN: Medline search to identify pairs consisting of a meta-analysis and a single large trial (concordance of results was assumed if effects were in the same direction and the meta-analytic estimate was within 30% of the trial); analysis of funnel plots from 37 meta-analyses identified from a hand search of four leading general medicine journals 1993-6 and 38 meta-analyses from the second 1996 issue of the Cochrane Database of Systematic Reviews. MAIN OUTCOME MEASURE: Degree of funnel plot asymmetry as measured by the intercept from regression of standard normal deviates against precision. RESULTS: In the eight pairs of meta-analysis and large trial that were identified (five from cardiovascular medicine, one from diabetic medicine, one from geriatric medicine, one from perinatal m