Children's Mercy Hospital

Miscellaneous resources for "Statistical Evidence."

This is a list of references that helped me in writing my book, "Statistical Evidence."

evidence >> apples >> adjustment (16)

Are sex and death related? Study failed to adjust for an important confounder [letter; comment]. David Batty. British Medical Journal 1998: 316(7145); 1671; discussion 1672. Abstract not available. [Full text]

Conditions for confounding of the risk ratio and of the odds ratio. J. F. Boivin, S. Wacholder. American Journal of Epidemiology 1985: 121(1); 152-8. There are disagreements in the literature about the criteria to be used to ascertain whether or not a measure of association is confounded. The authors postulate the general principle that a crude unconfounded measure of association is structured as a weighted average of the stratum-specific values of the measure. They examine the relationships between stratum-specific measures of association, crude overall measures, and weighted averages of stratum-specific measures, and indicate how these relationships may be used to define criteria for the assessment of confounding in cohort studies in which the exposure, disease, and stratification variables are classified dichotomously. The criteria presented differ for the risk ratio and for the disease-odds ratio. In other words, one can reach different conclusions about the confounding effect of a given extraneous variable, depending on which measure of association is chosen. This view differs from that of Miettinen and Cook (Confounding: essence and detection. Am J Epidemiol 1981;114:593-603) who postulated one set of criteria for the assessment of confounding, which was applicable to both measures of association. These different approaches may lead to different conclusions about the presence or absence of confounding. [Medline]

Maternal smoking and Down syndrome: the confounding effect of maternal age. C. L. Chen, T. J. Gilbert, J. R. Daling. Am J Epidemiol 1999: 149(5); 442-6. Inconsistent results have been reported from studies evaluating the association of maternal smoking with birth of a Down syndrome child. Control of known risk factors, particularly maternal age, has also varied across studies. By using a population-based case-control design (775 Down syndrome cases and 7,750 normal controls) and Washington State birth record data for 1984-1994, the authors examined this hypothesized association and found a crude odds ratio of 0.80 (95% confidence interval 0.65-0.98). Controlling for broad categories of maternal age (<35 years, > or =35 years), as described in prior studies, resulted in a negative association (odds ratio = 0.87, 95% confidence interval 0.71-1.07). However, controlling for exact year of maternal age in conjunction with race and parity resulted in no association (odds ratio = 1.00, 95% confidence interval 0.82-1.24). In this study, the prevalence of Down syndrome births increased with increasing maternal age, whereas among controls the reported prevalence of smoking during pregnancy decreased with increasing maternal age. There is a substantial potential for residual confounding by maternal age in studies of maternal smoking and Down syndrome. After adequately controlling for maternal age in this study, the authors found no clear relation between maternal smoking and the risk of Down syndrome.
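
The pattern described in this abstract, a crude odds ratio below 1 that moves to 1 once maternal age is controlled, can be illustrated with a small sketch. The counts below are hypothetical and chosen only to reproduce the direction of the effect; they are not the Washington State data. The Mantel-Haenszel estimator is used here as a convenient stand-in for the authors' adjustment, which the abstract does not describe in detail.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (smoking cases, non-smoking cases,
# smoking controls, non-smoking controls) by maternal age group.
# Younger mothers smoke more often; older mothers contribute relatively more cases.
strata = [
    (30, 70, 600, 1400),  # younger mothers: stratum odds ratio is 1
    (10, 90, 50, 450),    # older mothers: stratum odds ratio is 1
]

# Collapsing over age gives the crude table.
crude_table = tuple(sum(col) for col in zip(*strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"crude OR = {crude_or:.2f}, Mantel-Haenszel OR = {adjusted_or:.2f}")
# prints: crude OR = 0.71, Mantel-Haenszel OR = 1.00
```

In this made-up example smoking is unrelated to the outcome within each age group, yet the crude table suggests a protective effect because smoking is less common in the older, higher-risk group.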

Look before You Leap: Stratify before You Standardize. Bernard C.K. Choi. American Journal of Epidemiology 1999: 149(12); 1087-1095. ABSTRACT: This paper presents a mathematical model to show the conditions in which age standardization can be used to summarize age-specific rates for comparison purposes over calendar time. It shows that the conditions for valid comparison depend on the type of measure used for comparison, that is, difference, ratio, or percent change. If the measure for comparison is a difference of the standardized rates at two time points, then the age-specific rates need to maintain a constant rate difference over time for the comparison to be valid. If the measure for comparison is a ratio or percent change of the standardized rates at two time points, then the age-specific rates need to maintain a constant rate ratio over time for the comparison to be valid. Since in reality, as shown by our Canadian empirical data, age-specific rates do not always maintain a consistent pattern over time, it is recommended that one should always stratify the data to look at patterns of age-specific rates before applying age standardization.
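
The point about ratios of standardized rates can be sketched with a short example. The age-specific rates and standard populations below are hypothetical, not the Canadian data from the paper. When every age-specific rate changes by the same ratio, the ratio of standardized rates does not depend on the standard population; when the age-specific ratios differ, the choice of standard can change the answer and even its direction.

```python
def standardized_rate(age_specific_rates, standard_weights):
    """Directly standardized rate: weighted average of age-specific rates."""
    total = sum(standard_weights)
    return sum(r * w for r, w in zip(age_specific_rates, standard_weights)) / total


def standardized_ratio(rates_later, rates_earlier, standard_weights):
    """Ratio of two directly standardized rates using the same standard."""
    return (standardized_rate(rates_later, standard_weights)
            / standardized_rate(rates_earlier, standard_weights))


# Hypothetical rates per 100,000 for three age groups at two time points.
rates_t1 = [10.0, 50.0, 200.0]
rates_t2_constant = [r * 0.8 for r in rates_t1]  # same rate ratio (0.8) in every age group
rates_t2_mixed = [5.0, 50.0, 220.0]              # rate ratios 0.5, 1.0, 1.1

# Two hypothetical standard populations (age distributions).
young_standard = [0.6, 0.3, 0.1]
old_standard = [0.2, 0.3, 0.5]

for label, later in [("constant ratio", rates_t2_constant), ("mixed ratios", rates_t2_mixed)]:
    r_young = standardized_ratio(later, rates_t1, young_standard)
    r_old = standardized_ratio(later, rates_t1, old_standard)
    print(f"{label}: young standard {r_young:.3f}, old standard {r_old:.3f}")
```

With the mixed pattern, the young standard suggests a decline and the old standard suggests an increase, which is why looking at the age-specific rates first is the safer habit.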

Presenting statistical uncertainty in trends and dose-response relations. S Greenland, KB Michels, JM Robins, C Poole, WC Willett. AJE 1999: 149(12); 1077-86. ABSTRACT: When one estimates the effects of a polytomous exposure, it is common practice to express all effects relative to a baseline or reference level. Certain authors have challenged this practice and proposed alternatives, which we review here. One alternative, the "floating absolute risk" method, can supply useful statistics and trend graphs, but it does not yield valid confidence intervals for relative risks. All categorical methods have further shortcomings when the exposure is continuous, however. These shortcomings can be addressed by plotting or tabulating confidence limits for points on a flexible curve fitted to the uncategorized data.

Patient volume, staffing, and workload in relation to risk-adjusted outcomes in a random stratified sample of UK neonatal intensive care units: a prospective evaluation. The UK Neonatal Staffing Study Group. Lancet 2002: 359; 99-107. Background: UK recommendations suggest that large neonatal intensive-care units (NICUs) have better outcomes than small units, although this suggestion remains unproven. We assessed whether patient volume, staffing levels, and workload are associated with risk-adjusted outcomes, and with costs or staff wellbeing. Methods: 186 UK NICUs were stratified according to volume of patients, nursing provision, and neonatal consultant provision. Primary outcomes were hospital mortality, mortality or cerebral damage, and nosocomial bacteraemia. We studied 13 515 infants of all birthweights consecutively admitted to 54 randomly selected NICUs. Multiple logistic regression analyses were done with every primary outcome as the dependent variable. Staff wellbeing and stress were assessed by anonymous mental health index (MHI)-5 questionnaires. Findings: Data were available for 13 334 (99%) infants. High-volume NICUs treated the sickest infants and had highest crude mortality. Risk-adjusted mortality and mortality or cerebral damage were unrelated to patient volume or staffing provision; however, nosocomial bacteraemia was less frequent in NICUs with low neonatal consultant provision (odds ratio 0·65, 95% CI 0·43-0·98). Mortality was raised with increasing workload in all types of NICUs. Infants admitted at full capacity versus half capacity were about 50% more likely to die, but there was wide uncertainty around this estimate. Most staff had MHI-5 scores that suggested good mental health. Interpretation: The implications of this report for staffing policy, medicolegal risk management, and ethical practice remain to be tested. Centralisation of only the sickest infants could improve efficiency, provided that this does not create excessive workload for staff. Assessment of increased staffing levels that are closer to those in adult intensive care might be appropriate.

Causal Knowledge as a Prerequisite for Confounding Evaluation: An Application to Birth Defects Epidemiology. Miguel A. Hernán, Sonia Hernández-Díaz, Martha M. Werler and Allen A. Mitchell. Am. J of Epidemiology 2002: 155(2); 176-184. Common strategies to decide whether a variable is a confounder that should be adjusted for in the analysis rely mostly on statistical criteria. The authors present findings from the Slone Epidemiology Unit Birth Defects Study, 1992–1997, a case-control study on folic acid supplementation and risk of neural tube defects. When statistical strategies for confounding evaluation are used, the adjusted odds ratio is 0.80 (95% confidence interval: 0.62, 1.21). However, the consideration of a priori causal knowledge suggests that the crude odds ratio of 0.65 (95% confidence interval: 0.46, 0.94) should be used because the adjusted odds ratio is invalid. Causal diagrams are used to encode qualitative a priori subject matter knowledge.

Socioeconomic status and health in blacks and whites: the problem of residual confounding and the resiliency of race. J. S. Kaufman, R. S. Cooper, D. L. McGee. Epidemiology 1997: 8(6); 621-8. A large number of epidemiologic studies have focused on racial/ethnic differences, particularly between blacks and whites. Because health endpoints and racial categorizations are associated with socioeconomic status, investigators generally adjust for socioeconomic indicators. The intention is usually to control for confounding, thereby making groups comparable and excluding socioeconomic status as an alternative explanation to hypotheses of innate physiologic differences. A threat to the validity of these analyses is therefore the presence of residual confounding. We identify four potential sources of residual confounding in this analytical design: categorization of socioeconomic status variables, measurement error in socioeconomic indicators, use of aggregated socioeconomic status measures, and incommensurate socioeconomic indicators. Using simulations and examples from the literature, we demonstrate that the effect of residual confounding is to bias interpretation of data toward the conclusion of independent racial/ethnic group effects. Investigators often refer to possible "genetic" differences on the basis of models that control for socioeconomic status. We propose that such conclusions on the basis of this analytical strategy are generally unwarranted. Racial/ethnic differences in disease are a pressing public health concern, but the current approach does not often provide a basis for inference about putative biological factors in the etiology of this disparity.

Dose-specific Meta-Analysis and Sensitivity Analysis of the Relation between Alcohol Consumption and Lung Cancer Risk. Jeffrey E. Korte, Paul Brennan, S. Jane Henley, Paolo Boffetta. Am. J of Epidemiology 2002: 155(6); 496-506. Alcohol drinking increases the risk of several types of cancer, but studies of the relation between alcohol and lung cancer risk are complicated by smoking. The authors carried out meta-analyses for four study designs and conducted sensitivity analyses to assess the results. Pooled smoking-unadjusted relative risks (RRs) for brewery workers and alcoholics were 1.17 (95% confidence interval (CI): 0.99, 1.39) and 1.99 (95% CI: 1.66, 2.39), respectively, relative to population rates. For cohort and case-control studies, the authors conducted dose-specific meta-analyses for ethanol consumption of 1–499, 500–999, 1,000–1,999, and ≥2,000 g/month, relative to nondrinking. Smoking-adjusted RRs for ascending dose groups in cohort studies were 0.98 (95% CI: 0.79, 1.21), 0.92 (95% CI: 0.81, 1.04), 1.04 (95% CI: 0.88, 1.22), and 1.53 (95% CI: 1.04, 2.25), respectively. Smoking-adjusted odds ratios for ascending groups in case-control studies were 0.63 (95% CI: 0.51, 0.78), 1.30 (95% CI: 0.98, 1.70), 1.13 (95% CI: 0.46, 2.75), and 1.86 (95% CI: 1.39, 2.49), respectively. Elevated odds ratios were seen for hospital-based case-control studies but not for population-based case-control studies. Sensitivity analyses indicated that smoking explained the elevated RRs in studies of alcoholics and that strong misclassification of smoking status could produce an elevated smoking-adjusted RR in cohort and case-control studies. Overall, evidence for a smoking-adjusted association between alcohol and lung cancer risk is limited to very high consumption groups in cohort and hospital-based case-control studies. At lower levels, any associations observed appear to be explained by confounding.

How do risk factors work together? Mediators, moderators, and independent, overlapping, and proxy risk factors. H. C. Kraemer, E. Stice, A. Kazdin, D. Offord, D. Kupfer. Am J Psychiatry 2001: 158(6); 848-56. OBJECTIVE: The authors developed a methodological basis for investigating how risk factors work together. Better methods are needed for understanding the etiology of disorders, such as psychiatric syndromes, that presumably are the result of complex causal chains. METHOD: Approaches from psychology, epidemiology, clinical trials, and basic sciences were synthesized. RESULTS: The authors define conceptually and operationally five different clinically important ways in which two risk factors may work together to influence an outcome: as proxy, overlapping, and independent risk factors and as mediators and moderators. CONCLUSIONS: Classifying putative risk factors into these qualitatively different types can help identify high-risk individuals in need of preventive interventions and can help inform the content of such interventions. These methods may also help bridge the gaps between theory, the basic and clinical sciences, and clinical and policy applications and thus aid the search for early diagnoses and for highly effective preventive and treatment interventions.

Mediators and moderators of treatment effects in randomized clinical trials. H. C. Kraemer, G. T. Wilson, C. G. Fairburn, W. S. Agras. Arch Gen Psychiatry 2002: 59(10); 877-83. (Covariate adjustment is important, even in randomized trials and can identify important subgroups and mechanisms of action.) Randomized clinical trials (RCTs) not only are the gold standard for evaluating the efficacy and effectiveness of psychiatric treatments but also can be valuable in revealing moderators and mediators of therapeutic change. Conceptually, moderators identify on whom and under what circumstances treatments have different effects. Mediators identify why and how treatments have effects. We describe an analytic framework to identify and distinguish between moderators and mediators in RCTs when outcomes are measured dimensionally. Rapid progress in identifying the most effective treatments and understanding on whom treatments work and do not work and why treatments work or do not work depends on efforts to identify moderators and mediators of treatment outcome. We recommend that RCTs routinely include and report such analyses.

Baseline imbalance in randomised controlled trials. C Roberts, DJ Torgerson. British Medical Journal 1999: 319(7203); 185. Abstract not available yet. [Medline] [Full text] [PDF]

Sex and death: are they related? Findings from the Caerphilly cohort study. GD Smith, S Frankel, J Yarnell. British Medical Journal 1997: 315(7123); 1641-1644. ABSTRACT: OBJECTIVE: To examine the relation between frequency of orgasm and mortality. STUDY DESIGN: Cohort study with a 10 year follow up. SETTING: The town of Caerphilly, South Wales, and five adjacent villages. SUBJECTS: 918 men aged 45-59 at time of recruitment between 1979 and 1983. MAIN OUTCOME MEASURES: All deaths and deaths from coronary heart disease. RESULTS: Mortality risk was 50% lower in the group with high orgasmic frequency than in the group with low orgasmic frequency, with evidence of a dose-response relation across the groups. Age adjusted odds ratio for all cause mortality was 2.0 for the group with low frequency of orgasm (95% confidence interval 1.1 to 3.5, test for trend P = 0.02). With adjustment for risk factors this became 1.9 (1.0 to 3.4, test for trend P = 0.04). Death from coronary heart disease and from other causes showed similar associations with frequency of orgasm, although the gradient was most marked for deaths from coronary heart disease. Analysed in terms of actual frequency of orgasm, the odds ratio for total mortality associated with an increase in 100 orgasms per year was 0.64 (0.44 to 0.95). CONCLUSION: Sexual activity seems to have a protective effect on men's health. [Medline] [Abstract] [Full text]

Clinical trials in acute myocardial infarction: Should we adjust for baseline characteristics? Ewout W. Steyerberg, Patrick M.M. Bossuyt, Kerry L. Lee. American Heart Journal 2000: 139(5); 745-751. ABSTRACT: BACKGROUND: Clinical trials concerning acute myocardial infarction often evaluate short-term death. Several baseline characteristics are predictors of death, most notably age. Adjustment for one or more predictors in a multivariable analysis may be considered to correct the estimate of the treatment effect for any imbalance that by chance may have occurred between the randomized groups. Moreover, adjustment results in a stratified estimate of the effect of treatment. METHODS AND RESULTS: The effects of adjustment (correction for imbalance and stratification) were studied with logistic regression analysis in the Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO)-I trial. The primary end point was 30-day death, which occurred in 6.3% of 10,348 patients randomly assigned to tissue plasminogen activator and 7.3% of 20,162 patients randomly assigned to streptokinase thrombolytic therapy. This is equivalent to an unadjusted odds ratio of 0.853. No significant imbalance had occurred for any of 17 baseline characteristics considered, including well-known demographic, presenting, and history characteristics. Adjusted for age, the odds ratio was 0.829, which is an 18% increase in estimated effect on the logistic scale. When adjusted for 17 characteristics, the odds ratio was 0.820, an increase of 25%. The increase in effect estimate was largely explained by the stratification effect and only partly by imbalance of predictors. CONCLUSIONS: Adjustment for predictive baseline characteristics, even when largely balanced, may lead to clearly different estimates of the treatment effect on mortality rates. Adjustment for important predictors such as age is recommended in clinical trials studying patients with acute myocardial infarction.
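
The arithmetic behind the "18% increase" and "25% increase" in this abstract can be checked with a few lines. This is only a sketch of the calculation as I read it: the change is measured on the log odds ratio scale, and the function names are my own. The unadjusted odds ratio recomputed from the rounded percentages (6.3% and 7.3%) comes out close to, but not exactly, the published 0.853.

```python
import math


def odds_ratio(p_treated, p_control):
    """Odds ratio comparing two event proportions."""
    return (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))


def log_scale_increase(or_unadjusted, or_adjusted):
    """Relative change in the log odds ratio after adjustment."""
    return math.log(or_adjusted) / math.log(or_unadjusted) - 1


# 30-day death: 6.3% with tissue plasminogen activator, 7.3% with streptokinase.
crude = odds_ratio(0.063, 0.073)
print(f"unadjusted OR from rounded percentages: {crude:.2f}")

# Published odds ratios: 0.853 unadjusted, 0.829 age adjusted, 0.820 fully adjusted.
print(f"age adjusted: {100 * log_scale_increase(0.853, 0.829):.0f}% increase")
print(f"adjusted for 17 characteristics: {100 * log_scale_increase(0.853, 0.820):.0f}% increase")
```

The odds ratio moves only from 0.853 to 0.820, but because the log odds ratio is small to begin with, that shift is a sizeable fraction of the estimated effect.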

Research Methods: Why Covariance? A Rationale for Using Analysis of Covariance Procedures in Randomized Studies. Matthew J. Taylor. Journal of Early Intervention 1993: 17(4); 455-466. Abstract not available yet.

A comparison of direct adjustment and regression adjustment of epidemiologic measures. T. C. Wilcosky, L. E. Chambless. J Chronic Dis 1985: 38(10); 849-56. Although regression adjustment can provide a useful alternative to direct adjustment, especially when data are sparse, many researchers are unaware that adjusted summary measures can be easily derived from regression coefficients. In a non-technical discussion with examples, the direct adjustment procedure is compared with three methods of regression adjustment based on analysis of covariance models: the conditional prediction method, the stratified prediction method, and the marginal prediction method. Both the stratified prediction and direct adjustment methods yield summary measures that are weighted averages of stratum-specific measures, while adjusted measures from the conditional prediction method are similar to stratum-specific estimates. In contrast to the other adjustment procedures, which can use internal or external weights, the marginal prediction method always gives an internally adjusted measure. Under certain conditions, the three regression adjustment procedures produce identical results. Major advantages of direct adjustment include computational simplicity and relatively few statistical assumptions. Regression adjustment, however, is more convenient for statistical tests for interactions and group differences, and often precludes the need to categorize continuous variables, so that problems with empty strata are avoided.

evidence >> apples >> casecontrol (13)

Reye's syndrome in the United States from 1981 through 1997. E. D. Belay, J. S. Bresee, R. C. Holman, A. S. Khan, A. Shahriari, L. B. Schonberger. New England Journal of Medicine 1999: 340(18); 1377-82. BACKGROUND: Reye's syndrome is characterized by encephalopathy and fatty degeneration of the liver, usually after influenza or varicella. Beginning in 1980, warnings were issued about the use of salicylates in children with those viral infections because of the risk of Reye's syndrome. METHODS: To describe the pattern of Reye's syndrome in the United States, characteristics of the patients, and risk factors for poor outcomes, we analyzed national surveillance data collected from December 1980 through November 1997. The surveillance system is based on voluntary reporting with the use of a standard case-report form. RESULTS: From December 1980 through November 1997 (surveillance years 1981 through 1997), 1207 cases of Reye's syndrome were reported in patients less than 18 years of age. Among those for whom data on race and sex were available, 93 percent were white and 52 percent were girls. The number of reported cases of Reye's syndrome declined sharply after the association of Reye's syndrome with aspirin was reported. After a peak of 555 cases in children reported in 1980, there have been no more than 36 cases per year since 1987. Antecedent illnesses were reported in 93 percent of the children, and detectable blood salicylate levels in 82 percent. The overall case fatality rate was 31 percent. The case fatality rate was highest in children under five years of age (relative risk, 1.8; 95 percent confidence interval, 1.5 to 2.1) and in those with a serum ammonia level above 45 microg per deciliter (26 micromol per liter) (relative risk, 3.4; 95 percent confidence interval, 1.9 to 6.2). 
CONCLUSIONS: Since 1980, when the association between Reye's syndrome and the use of aspirin during varicella or influenza-like illness was first reported, there has been a sharp decline in the number of infants and children reported to have Reye's syndrome. Because Reye's syndrome is now very rare, any infant or child suspected of having this disorder should undergo extensive investigation to rule out the treatable inborn metabolic disorders that can mimic Reye's syndrome. [Abstract] [Full text] [PDF]

A case-control study of HIV seroconversion in health care workers after percutaneous exposure. Centers for Disease Control and Prevention Needlestick Surveillance Group. D. M. Cardo, D. H. Culver, C. A. Ciesielski, P. U. Srivastava, R. Marcus, D. Abiteboul, J. Heptonstall, G. Ippolito, F. Lot, P. S. McKibben, D. M. Bell. N Engl J Med 1997: 337(21); 1485-90. BACKGROUND: The average risk of human immunodeficiency virus (HIV) infection after percutaneous exposure to HIV-infected blood is 0.3 percent, but the factors that influence this risk are not well understood. METHODS: We conducted a case-control study of health care workers with occupational, percutaneous exposure to HIV-infected blood. The case patients were those who became seropositive after exposure to HIV, as reported by national surveillance systems in France, Italy, the United Kingdom, and the United States. The controls were health care workers in a prospective surveillance project who were exposed to HIV but did not seroconvert. RESULTS: Logistic-regression analysis based on 33 case patients and 665 controls showed that significant risk factors for seroconversion were deep injury (odds ratio= 15; 95 percent confidence interval, 6.0 to 41), injury with a device that was visibly contaminated with the source patient's blood (odds ratio= 6.2; 95 percent confidence interval, 2.2 to 21), a procedure involving a needle placed in the source patient's artery or vein (odds ratio=4.3; 95 percent confidence interval, 1.7 to 12), and exposure to a source patient who died of the acquired immunodeficiency syndrome within two months afterward (odds ratio=5.6; 95 percent confidence interval, 2.0 to 16). The case patients were significantly less likely than the controls to have taken zidovudine after the exposure (odds ratio=0.19; 95 percent confidence interval, 0.06 to 0.52). 
CONCLUSIONS: The risk of HIV infection after percutaneous exposure increases with a larger volume of blood and, probably, a higher titer of HIV in the source patient's blood. Postexposure prophylaxis with zidovudine appears to be protective. [Abstract] [Full text] [PDF]

Reye's syndrome. M. Casteels-Van Daele, C. Van Geet, C. Wouters, E. Eggermont. Lancet 2001: 358(9278); 334. Abstract not available yet.

Risk of testicular cancer in subfertile men: case-control study. H. Moller, N. E. Skakkebaek. British Medical Journal 1999: 318(7183); 559-62. OBJECTIVE: To evaluate the association between subfertility in men and the subsequent risk of testicular cancer. DESIGN: Population based case-control study. SETTING: The Danish population. PARTICIPANTS: Cases were identified in the Danish Cancer Registry; controls were randomly selected from the Danish population with the computerised Danish Central Population Register. Men were interviewed by telephone; 514 men with cancer and 720 controls participated. OUTCOME MEASURE: Occurrence of testicular cancer. RESULTS: A reduced risk of testicular cancer was associated with paternity (relative risk 0.63; 95% confidence interval 0.47 to 0.85). In men who before the diagnosis of testicular cancer had a lower number of children than expected on the basis of their age, the relative risk was 1.98 (1.43 to 2.75). There was no corresponding protective effect associated with a higher number of children than expected. The associations were similar for seminoma and non-seminoma and were not influenced by adjustment for potential confounding factors. CONCLUSION: These data are consistent with the hypothesis that male subfertility and testicular cancer share important aetiological factors.

Testicular cancer risk in relation to use of disposable nappies. H. Moller. Arch Dis Child 2002: 86(1); 28-9. Information on the use of disposable nappies in childhood was available for 296 testicular cancer cases and 287 population controls in Denmark. No association was found between disposable nappy use and the subsequent risk of testicular cancer in adulthood.

The disappearance of Reye's syndrome--a public health triumph. A. S. Monto. N Engl J Med 1999: 340(18); p1423-4. Abstract not available.

Hospital controls versus community controls: differences in inferences regarding risk factors for hip fracture. D. J. Moritz, J. L. Kelsey, J. A. Grisso. Am J Epidemiol 1997: 145(7); 653-60. In case-control studies using cases identified from persons admitted to hospitals, two types of controls are most often used: persons from the communities served by the hospitals and persons admitted to the same hospitals as those to which the cases were admitted. It is often unclear which is the more appropriate choice, and whether the use of one or the other type of control group will lead to biased conclusions. The purpose of the present analysis was to determine whether the choice of hospital controls versus community controls would influence conclusions regarding risk factors for hip fracture. Cases (n = 425), hospital controls (n = 312) and community controls (n = 454) were drawn from a case-control study of risk factors for hip fracture in women. Study participants were white and black women aged 45 years or older and living in New York City or Philadelphia, Pennsylvania, who were selected between September 1987 and July 1989. Using community controls but not hospital controls, investigators would have concluded that having a fall during the previous 6 months, current smoking, and moving during the previous year were associated with an increased risk of hip fracture. Associations of hip fracture risk with stroke and prior use of ambulatory aids were stronger using community controls, but associations with estrogen use and body mass index were not influenced by choice of control group. Community controls were quite similar to representative samples of community-dwelling elderly women, whereas hospital controls were somewhat sicker and more likely to be current smokers. The authors conclude that community controls comprise the more appropriate control group in case-control studies of hip fracture in the elderly.

Case-control studies: research in reverse. K. F. Schulz, D.A. Grimes. Lancet 2002: 359; 431-434. Epidemiologists benefit greatly from having case-control study designs in their research armamentarium. Case-control studies can yield important scientific findings with relatively little time, money, and effort compared with other study designs. This seemingly quick road to research results entices many newly trained epidemiologists. Indeed, investigators implement case-control studies more frequently than any other analytical epidemiological study. Unfortunately, case-control designs also tend to be more susceptible to biases than other comparative studies. Although easier to do, they are also easier to do wrong. Five main notions guide investigators who do, or readers who assess, case-control studies. First, investigators must explicitly define the criteria for diagnosis of a case and any eligibility criteria used for selection. Second, controls should come from the same population as the cases, and their selection should be independent of the exposures of interest. Third, investigators should blind the data gatherers to the case or control status of participants or, if impossible, at least blind them to the main hypothesis of the study. Fourth, data gatherers need to be thoroughly trained to elicit exposure in a similar manner from cases and controls; they should use memory aids to facilitate and balance recall between cases and controls. Finally, investigators should address confounding in case-control studies, either in the design stage or with analytical techniques. Devotion of meticulous attention to these points enhances the validity of the results and bolsters the reader's confidence in the findings.

Selection of controls in case-control studies. I. Principles. S. Wacholder, J. K. McLaughlin, D. T. Silverman, J. S. Mandel. Am J Epidemiol 1992: 135(9); p1019-28. A synthesis of classical and recent thinking on the issues involved in selecting controls for case-control studies is presented in this and two companion papers (S. Wacholder et al. Am J Epidemiol 1992; 135:1029-50). In this paper, a theoretical framework for selecting controls in case-control studies is developed. Three principles of comparability are described: 1) study base, that all comparisons be made within the study base; 2) deconfounding, that comparisons of the effects of the levels of exposure on disease risk not be distorted by the effects of other factors; and 3) comparable accuracy, that any errors in measurement of exposure be nondifferential between cases and controls. These principles, if adhered to in a study, can reduce selection, confounding, and information bias, respectively. The principles, however, are constrained by an additional efficiency principle regarding resources and time. Most problems and controversies in control selection reflect trade-offs among these four principles.

Selection of controls in case-control studies. II. Types of controls. S. Wacholder, D. T. Silverman, J. K. McLaughlin, J. S. Mandel. Am J Epidemiol 1992: 135(9); p1029-41. Types of control groups are evaluated using the principles described in paper 1 of the series, "Selection of Controls in Case-Control Studies" (S. Wacholder et al. Am J Epidemiol 1992; 135:1019-28). Advantages and disadvantages of population controls, neighborhood controls, hospital or registry controls, medical practice controls, friend controls, and relative controls are considered. Problems with the use of deceased controls and proxy respondents are discussed.

Selection of controls in case-control studies. III. Design options. S. Wacholder, D. T. Silverman, J. K. McLaughlin, J. S. Mandel. Am J Epidemiol 1992: 135(9); p1042-50. Several design options available in the planning stage of case-control studies are examined. Topics covered include matching, control/case ratio, choice of nested case-control or case-cohort design, two-stage sampling, and other methods that can be used for control selection. The effect of potential problems in obtaining comparable accuracy of exposure is also examined. A discussion of the difficulty in meeting the principles of study base, deconfounding, and comparable accuracy (S. Wacholder et al. Am J Epidemiol 1992; 135:1019-28) in a single study completes this series of papers.

Design issues in case-control studies. S. Wacholder. Stat Methods Med Res 1995: 4(4); p293-309. The most difficult and most important considerations in planning the protocol of a case-control study are ascertainment of cases, selection of controls and the quality of the exposure measurement. Plans to ensure careful field work are equally important; without attention to data collection, the protocol will be meaningless. In most case-control studies, the measurement problem is magnified because one cannot implement the collection of exposure information at the beginning of follow-up, and instead must rely on interviews, existing records or extrapolation into the past. Consideration of a case-control study as an efficient way to study a cohort helps to resolve some design issues.

Are risk factors for sudden infant death syndrome different at night? S. M. Williams, E. A. Mitchell, B. J. Taylor. Arch Dis Child 2002: 87(4); 274-8. AIMS: To determine whether the risk factors for SIDS occurring at night were different from those occurring during the day. METHODS: Large, nationwide case-control study, with data for 369 cases and 1558 controls in New Zealand. RESULTS: Two thirds of SIDS deaths occurred at night (between 10 pm and 7:30 am). The odds ratio (95% CI) for prone sleep position was 3.86 (2.67 to 5.59) for deaths occurring at night and 7.25 (4.52 to 11.63) for deaths occurring during the day; the difference was significant. The odds ratio for maternal smoking for deaths occurring at night was 2.28 (1.52 to 3.42) and that for the day 1.27 (0.79 to 2.03); that for the mother being single was 2.69 (1.29 to 3.99) for a night time death and 1.25 (0.76 to 2.04) for a daytime death. Both interactions were significant. The interactions between time of death and bed sharing, not sleeping in a cot or bassinet, Maori ethnicity, late timing of antenatal care, binge drinking, cannabis use, and illness in the baby were also significant, or almost so. All were more strongly associated with SIDS occurring at night. CONCLUSIONS: Prone sleep position was more strongly associated with SIDS occurring during the day, whereas night time deaths were more strongly associated with maternal smoking and measures of social deprivation.

evidence >> apples >> cluster (1)

Extending the CONSORT statement to cluster randomized trials: for discussion. D. R. Elbourne, M. K. Campbell. Stat Med 2001: 20(3); 489-96. The need for clear reporting of randomized controlled trials has been emphasized recently. The CONSORT Statement has made evidence-based suggestions for a checklist and a patient flow diagram. Adapting this for cluster randomized controlled trials presents particular challenges. Simple changes in the checklist and diagram for the completely randomized two level cluster randomized trials are suggested for discussion. An example taken from an unpublished trial demonstrates that these changes are less simple to implement, although extensions to electronic publications may be helpful. These suggestions should be formally evaluated. Further work is required to consider the cases of more levels and of stratified or pair-matched cluster randomized trials.

evidence >> apples >> cohort (1)

Cigarette smoking and diabetes mellitus: evidence of a positive association from a large prospective cohort study. J. C. Will, D. A. Galuska, E. S. Ford, A. Mokdad, E. E. Calle. Int J Epidemiol 2001: 30(3); p540-6. OBJECTIVE: Only a few prospective studies have examined the relationship between the frequency of cigarette smoking and the incidence of diabetes mellitus. The purpose of this study was to determine whether greater frequency of cigarette smoking accelerated the development of diabetes mellitus, and whether quitting reversed the effect. METHODS: Data were collected in the Cancer Prevention Study I, a prospective cohort study conducted from 1959 through 1972 by the American Cancer Society where volunteers recruited more than one million acquaintances in 25 US states. From these over one million original participants, 275,190 men and 434,637 women aged > or = 30 years were selected for the primary analysis using predetermined criteria. RESULTS: As smoking increased, the rate of diabetes increased for both men and women. Among those who smoked > or = 2 packs per day at baseline, men had a 45% higher diabetes rate than men who had never smoked; the comparable increase for women was 74%. Quitting smoking reduced the rate of diabetes to that of non-smokers after 5 years in women and after 10 years in men. CONCLUSIONS: A dose-response relationship seems likely between smoking and incidence of diabetes. Smokers who quit may derive substantial benefit from doing so. Confirmation of these observations is needed through additional epidemiological and biological research.

evidence >> apples >> concealed (5)

Bias in treatment assignment in controlled clinical trials. TC Chalmers, P Celano, HS Sacks, H Jr Smith. N Engl J Med 1983: 309(22); 1358-61. ABSTRACT: Controlled clinical trials of the treatment of acute myocardial infarction offer a unique opportunity for the study of the potential influence on outcome of bias in treatment assignment. A group of 145 papers was divided into those in which the randomization process was blinded (57 papers), those in which it may have been unblinded (45 papers), and those in which the controls were selected by a nonrandom process (43 papers). At least one prognostic variable was maldistributed (P less than 0.05) in 14.0 per cent of the blinded-randomization studies, in 26.7 per cent of the unblinded-randomization studies, and in 58.1 per cent of the nonrandomized studies. Differences in case-fatality rates between treatment and control groups (P less than 0.05) were found in 8.8 per cent of the blinded-randomization studies, 24.4 per cent of the unblinded-randomization studies, and 58.1 per cent of the nonrandomized studies. These data emphasize the importance of keeping those who recruit patients for clinical trials from suspecting which treatment will be assigned to the patient under consideration.

Randomised trials, human nature, and reporting guidelines. K. F. Schulz. Lancet 1996: 348(9027); 596-8. Abstract not available.

Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. KF Schulz, I Chalmers, RJ Hayes, DG Altman. JAMA 1995: 273(5); 408-12. ABSTRACT: OBJECTIVE--To determine if inadequate approaches to randomized controlled trial design and execution are associated with evidence of bias in estimating treatment effects. DESIGN--An observational study in which we assessed the methodological quality of 250 controlled trials from 33 meta-analyses and then analyzed, using multiple logistic regression models, the associations between those assessments and estimated treatment effects. DATA SOURCES--Meta-analyses from the Cochrane Pregnancy and Childbirth Database. MAIN OUTCOME MEASURES--The associations between estimates of treatment effects and inadequate allocation concealment, exclusions after randomization, and lack of double-blinding. RESULTS--Compared with trials in which authors reported adequately concealed treatment allocation, trials in which concealment was either inadequate or unclear (did not report or incompletely reported a concealment approach) yielded larger estimates of treatment effects (P < .001). Odds ratios were exaggerated by 41% for inadequately concealed trials and by 30% for unclearly concealed trials (adjusted for other aspects of quality). Trials in which participants had been excluded after randomization did not yield larger estimates of effects, but that lack of association may be due to incomplete reporting. Trials that were not double-blind also yielded larger estimates of effects (P = .01), with odds ratios being exaggerated by 17%. CONCLUSIONS--This study provides empirical evidence that inadequate methodological approaches in controlled trials, particularly those representing poor allocation concealment, are associated with bias. Readers of trial reports should be wary of these pitfalls, and investigators must improve their design, execution, and reporting of trials.

Allocation concealment in randomised trials: defending against deciphering. K. F. Schulz, D.A. Grimes. Lancet 2002: 359; 614-618. Proper randomisation rests on adequate allocation concealment. An allocation concealment process keeps clinicians and participants unaware of upcoming assignments. Without it, even properly developed random allocation sequences can be subverted. Within this concealment process, the crucial unbiased nature of randomised controlled trials collides with their most vexing implementation problems. Proper allocation concealment frequently frustrates clinical inclinations, which annoys those who do the trials. Randomised controlled trials are anathema to clinicians. Many involved with trials will be tempted to decipher assignments, which subverts randomisation. For some implementing a trial, deciphering the allocation scheme might frequently become too great an intellectual challenge to resist. Whether their motives indicate innocent or pernicious intents, such tampering undermines the validity of a trial. Indeed, inadequate allocation concealment leads to exaggerated estimates of treatment effect, on average, but with scope for bias in either direction. Trial investigators will be crafty in any potential efforts to decipher the allocation sequence, so trial designers must be just as clever in their design efforts to prevent deciphering. Investigators must effectively immunise trials against selection and confounding biases with proper allocation concealment. Furthermore, investigators should report baseline comparisons on important prognostic variables. Hypothesis tests of baseline characteristics, however, are superfluous and could be harmful if they lead investigators to suppress reporting any baseline imbalances.

Generation of allocation sequences in randomised trials: chance not choice. K. F. Schulz, D.A. Grimes. Lancet 2002: 359; 515-519. The randomised controlled trial sets the gold standard of clinical research. However, randomisation persists as perhaps the least-understood aspect of a trial. Moreover, anything short of proper randomisation courts selection and confounding biases. Researchers should spurn all systematic, non-random methods of allocation. Trial participants should be assigned to comparison groups based on a random process. Simple (unrestricted) randomisation, analogous to repeated fair coin-tossing, is the most basic of sequence generation approaches. Furthermore, no other approach, irrespective of its complexity and sophistication, surpasses simple randomisation for prevention of bias. Investigators should, therefore, use this method more often than they do, and readers should expect and accept disparities in group sizes. Several other complicated restricted randomisation procedures limit the likelihood of undesirable sample size imbalances in the intervention groups. The most frequently used restricted sequence generation procedure is blocked randomisation. If this method is used, investigators should randomly vary the block sizes and use larger block sizes, particularly in an unblinded trial. Other restricted procedures, such as urn randomisation, combine beneficial attributes of simple and restricted randomisation by preserving most of the unpredictability while achieving some balance. The effectiveness of stratified randomisation depends on use of a restricted randomisation approach to balance the allocation sequences for each stratum. Generation of a proper randomisation sequence takes little time and effort but affords big rewards in scientific accuracy and credibility. Investigators should devote appropriate resources to the generation of properly randomised trials and reporting their methods clearly.
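
The blocked randomisation with randomly varied block sizes described in the abstract above can be sketched in a few lines. This is only an illustration, not code from the paper: the function name blocked_sequence, the block sizes (4, 6, 8), the arm labels "A" and "B", and the seed are all assumptions chosen for the example.

```python
import random


def blocked_sequence(n_participants, block_sizes=(4, 6, 8), arms=("A", "B"), seed=None):
    """Build an allocation sequence from balanced blocks of randomly varying size.

    Hypothetical helper for illustration; block sizes and arm labels are assumptions.
    """
    if any(size % len(arms) != 0 for size in block_sizes):
        raise ValueError("each block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        # Pick the next block size at random so the end of a block is hard to predict.
        size = rng.choice(block_sizes)
        # Each block holds an equal number of assignments to every arm.
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    # The final block may be cut short, so a small imbalance can remain.
    return sequence[:n_participants]


sequence = blocked_sequence(20, seed=2002)
print("".join(sequence))
print("A:", sequence.count("A"), "B:", sequence.count("B"))
```

Because every complete block is balanced, the two arms can differ by at most half of the largest block size at the end of the sequence. The sketch covers sequence generation only; it does nothing about allocation concealment.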

evidence >> apples >> ecologic (5)

Modeling treatment effects on binary outcomes with grouped-treatment variables and individual covariates. S. C. Johnston, T. Henneman, C. E. McCulloch, M. van der Laan. Am J Epidemiol 2002: 156(8); 753-60. During evaluation of treatment effects in observational studies, confounding is a constant threat because it is always possible that patients with a better prognosis, not adequately characterized by measured covariates, are chosen for a specific therapy. Ecologic analyses may avoid confounding that would be present in analysis at the individual level because variations in regional or hospital practice may be unrelated to prognosis. The authors used simulated data with an excluded confounder to evaluate the reliability and limitations of the grouped-treatment approach, a method of incorporating an ecologic measure of treatment assignment into an individual-level multivariable model, similar to the instrumental variable approach. Estimates based on the grouped-treatment approach were closer to the true value than those of standard individual-level multivariable analysis in every simulation. Furthermore, confidence intervals based on the grouped-treatment approach achieved approximately their nominal coverage, whereas those based on individual-level analyses did not. The grouped-treatment approach appears to be more reliable than standard individual-level analysis in situations where the grouped-treatment variable is unassociated with the outcome except via the actual treatment assignment and measured covariates.

The Semi-individual Study in Air Pollution Epidemiology: A Valid Design as Compared to Ecologic Studies. Nino Kunzli, Ira B. Tager. Environmental Health Perspectives 1997: 105(10); 1078-1083. ABSTRACT: The assessment of long-term effects of air pollution in humans relies on epidemiologic studies. A widely used design consists of cross-sectional or cohort studies in which ecologic assignment of exposure, based on a fixed-site ambient monitor, is employed. Although health outcome and usually a large number of covariates are measured in individuals, these studies are often called ecological. We will introduce the term semi-individual design for these studies. We review the major properties and limitations with regard to causal inference of truly ecologic studies, in which outcome, exposure, and covariates are available on an aggregate level only. Misclassification problems and issues related to confounding and model specification in truly ecologic studies limit etiologic inference to individuals. In contrast, the semi-individual study shares its methodological and inferential properties with typical individual-level study designs. The major caveat relates to the case where too few study areas, e.g., two or three, are used, which render control of aggregate level confounding impossible. The issue of exposure misclassification is of general concern in epidemiology and not an exclusive problem of the semi-individual design. In a multicenter setting, the semi-individual study is a valuable tool to approach long-term effects of air pollution. Knowledge about the error structure of the ecologically assigned exposure allows consideration of the impact of ecologically assigned exposure on effect estimation. Semi-individual studies, i.e., individual level air pollution studies with ecologic exposure assignment, more readily permit valid inference to individuals and should not be labeled as ecologic studies.

Ecologic studies in epidemiology: concepts, principles, and methods. H. Morgenstern. Annu Rev Public Health 1995: 16; 61-81. An ecologic study focuses on the comparison of groups, rather than individuals; thus, individual-level data are missing on the joint distribution of variables within groups. Variables in an ecologic analysis may be aggregate measures, environmental measures, or global measures. The purpose of an ecologic analysis may be to make biologic inferences about effects on individual risks or to make ecologic inferences about effects on group rates. Ecologic study designs may be classified on two dimensions: (a) whether the primary group is measured (exploratory vs analytic study); and (b) whether subjects are grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). Despite several practical advantages of ecologic studies, there are many methodologic problems that severely limit causal inference, including ecologic and cross-level bias, problems of confounder control, within-group misclassification, lack of adequate data, temporal ambiguity, collinearity, and migration across groups.

Medicine and the Media: Did Monica really say that? Hugh Tunstall-Pedoe. British Medical Journal 1998: 317; 1023. Abstract not available yet. [Full text]

Ecological study for reasons for sharp decline in mortality from ischaemic heart disease in Poland since 1991. WA Zatonski, AJ McMichael, JW Powles. British Medical Journal 1998: 316(7137); 1047-1051. ABSTRACT: OBJECTIVE: To investigate the reasons for the decline in deaths attributed to ischaemic heart disease in Poland since 1991 after two decades of rising rates. DESIGN: Recent changes in mortality were measured as percentage deviations in 1994 from rates predicted by extrapolation of sex and age specific death rates for 1980-91 for diseases of the circulatory system and selected other categories. Available data on national and household food availability, alcohol consumption, cigarette smoking, socioeconomic indices, and medical services over time were reviewed. MAIN OUTCOME MEASURES: Age specific and age standardised rates of death attributed to ischaemic heart disease and related causes. RESULTS: The change in trend in mortality attributed to diseases of the circulatory system was similar in men and women and most marked (> 20%) in early middle age. For ages 45 to 64 the decrease was greatest for deaths attributed to ischaemic heart disease and atherosclerosis (around 25%) and less for stroke (< 10%). For most of the potentially explanatory variables considered, there were no corresponding changes in trend. However, between 1986-90 and 1994 there was a marked switch from animal fats (estimated availability down 23%) to vegetable fats (up 48%) and increased imports of fruit. CONCLUSION: Reporting biases are unlikely to have exaggerated the true fall in ischaemic heart disease; neither is it likely to be mainly due to changes in smoking, drinking, stress, or medical care. Changes in type of dietary fat and increased supplies of fresh fruit and vegetables seem to be the best candidates. [Medline] [Abstract] [PDF]

evidence >> apples >> example (8)

Influence of maternal age at delivery and birth order on risk of type 1 diabetes in childhood: prospective population based family study. Bart's-Oxford Family Study Group. P. J. Bingley, I. F. Douek, C. A. Rogers, E. A. Gale. British Medical Journal 2000: 321(7258); 420-4. OBJECTIVES: To examine the influence of parental age at delivery and birth order on subsequent risk of childhood diabetes. DESIGN: Prospective population based family study. SETTING: Area formerly administered by the Oxford Regional Health Authority. PARTICIPANTS: 1375 families in which one child or more had diabetes. Of 3221 offspring, 1431 had diabetes (median age at diagnosis 10.5 years, range 0.4-28.5) and 1790 remained non-diabetic at a median age of 16.1 years. MAIN OUTCOME MEASURES: Disease free survival and hazard ratios for the development of type 1 diabetes in all offspring, assessed by Cox proportional hazard regression. RESULTS: Maternal age at delivery was strongly related to risk of type 1 diabetes in the offspring; risk increased by 25% (95% confidence interval 17% to 34%) for each five year band of maternal age, so that maternal age at delivery of 45 years or more was associated with a relative risk of 3.11 (2.07 to 4.66) compared with a maternal age of less than 20 years. Paternal age was also associated with a 9% (3% to 16%) increase for each five year increase in paternal age. The relative risk of diabetes, adjusted for parental age at delivery and sex of offspring, decreased with increasing birth order; the overall effect was a 15% risk reduction (10% to 21%) per child born. CONCLUSIONS: A strong association was found between increasing maternal age at delivery and risk of diabetes in the child. Risk was highest in firstborn children and decreased progressively with higher birth order. The fetal environment seems to have a strong influence on risk of type 1 diabetes in the child. The increase in maternal age at delivery in the United Kingdom over the past two decades could partly account for the increase in incidence of childhood diabetes over this period. [Medline] [Abstract] [Full text] [PDF]

Statistical Inquiries into the Efficacy of Prayer. Sir Francis Galton. Fortnightly Review 1872: 12; 125-135. (This article was originally published in 1872 and is reproduced by the Pictures of Health Web Site.) An eminent authority has recently published a challenge to test the efficacy of prayer by actual experiment. I have been induced, through reading this, to prepare the following memoir for publication, nearly the whole of which I wrote and laid by many years ago, after completing a large collection of data, which I had undertaken for the satisfaction of my own conscience. [Full text] [PDF]

Lack of effect of long-term supplementation with beta carotene on the incidence of malignant neoplasms and cardiovascular disease. C. H. Hennekens, J. E. Buring, J. E. Manson, M. Stampfer, B. Rosner, N. R. Cook, C. Belanger, F. La Motte, J. M. Gaziano, P. M. Ridker, W. Willett, R. Peto. N Engl J Med 1996: 334(18); 1145-9. BACKGROUND. Observational studies suggest that people who consume more fruits and vegetables containing beta carotene have somewhat lower risks of cancer and cardiovascular disease, and earlier basic research suggested plausible mechanisms. Because large randomized trials of long duration were necessary to test this hypothesis directly, we conducted a trial of beta carotene supplementation. METHODS. In a randomized, double-blind, placebo-controlled trial of beta carotene (50 mg on alternate days), we enrolled 22,071 male physicians, 40 to 84 years of age, in the United States; 11 percent were current smokers and 39 percent were former smokers at the beginning of the study in 1982. By December 31, 1995, the scheduled end of the study, fewer than 1 percent had been lost to follow-up, and compliance was 78 percent in the group that received beta carotene. RESULTS. Among 11,036 physicians randomly assigned to receive beta carotene and 11,035 assigned to receive placebo, there were virtually no early or late differences in the overall incidence of malignant neoplasms or cardiovascular disease, or in overall mortality. In the beta carotene group, 1273 men had any malignant neoplasm (except nonmelanoma skin cancer), as compared with 1293 in the placebo group (relative risk, 0.98; 95 percent confidence interval, 0.91 to 1.06). There were also no significant differences in the number of cases of lung cancer (82 in the beta carotene group vs. 88 in the placebo group); the number of deaths from cancer (386 vs. 380), deaths from any cause (979 vs. 968), or deaths from cardiovascular disease (338 vs. 313); the number of men with myocardial infarction (468 vs. 489); the number with stroke (367 vs. 382); or the number with any one of the previous three end points (967 vs. 972). Among current and former smokers, there were also no significant early or late differences in any of these end points. CONCLUSIONS. In this trial among healthy men, 12 years of supplementation with beta carotene produced neither benefit nor harm in terms of the incidence of malignant neoplasms, cardiovascular disease, or death from all causes.
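
As a rough check on the malignant neoplasm result quoted in the abstract above (1273 of 11,036 men on beta carotene versus 1293 of 11,035 on placebo), a crude relative risk and an approximate 95% confidence interval can be computed from the counts alone. This is a sketch under stated assumptions: the function name crude_relative_risk is hypothetical, and the log-scale normal approximation used here is a standard textbook method, not necessarily the one the trial investigators used, so the interval may differ slightly from the published one.

```python
import math


def crude_relative_risk(events_treated, n_treated, events_control, n_control, z=1.96):
    """Crude relative risk with an approximate confidence interval.

    Hypothetical helper for illustration; uses the usual normal
    approximation on the log relative risk.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts taken from the abstract: any malignant neoplasm, beta carotene vs. placebo.
rr, lower, upper = crude_relative_risk(1273, 11036, 1293, 11035)
print(f"RR = {rr:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

The crude point estimate rounds to the published relative risk of 0.98. Any small difference in the interval limits would be expected if the published figures came from a time-to-event or adjusted analysis rather than from raw proportions.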

Dietary fat intake and the risk of coronary heart disease in women. F. B. Hu, M. J. Stampfer, J. E. Manson, E. Rimm, G. A. Colditz, B. A. Rosner, C. H. Hennekens, W. C. Willett. N Engl J Med 1997: 337(21); 1491-9. BACKGROUND: The relation between dietary intake of specific types of fat, particularly trans unsaturated fat, and the risk of coronary disease remains unclear. We therefore studied this relation in women enrolled in the Nurses' Health Study. METHODS: We prospectively studied 80,082 women who were 34 to 59 years of age and had no known coronary disease, stroke, cancer, hypercholesterolemia, or diabetes in 1980. Information on diet was obtained at base line and updated during follow-up by means of validated questionnaires. During 14 years of follow-up, we documented 939 cases of nonfatal myocardial infarction or death from coronary heart disease. Multivariate analyses included age, smoking status, total energy intake, dietary cholesterol intake, percentages of energy obtained from protein and specific types of fat, and other risk factors. RESULTS: Each increase of 5 percent of energy intake from saturated fat, as compared with equivalent energy intake from carbohydrates, was associated with a 17 percent increase in the risk of coronary disease (relative risk, 1.17; 95 percent confidence interval, 0.97 to 1.41; P=0.10). As compared with equivalent energy from carbohydrates, the relative risk for a 2 percent increment in energy intake from trans unsaturated fat was 1.93 (95 percent confidence interval, 1.43 to 2.61; P<0.001); that for a 5 percent increment in energy from monounsaturated fat was 0.81 (95 percent confidence interval, 0.65 to 1.00; P=0.05); and that for a 5 percent increment in energy from polyunsaturated fat was 0.62 (95 percent confidence interval, 0.46 to 0.85; P=0.003). Total fat intake was not significantly related to the risk of coronary disease (for a 5 percent increase in energy from fat, the relative risk was 1.02; 95 percent confidence interval, 0.97 to 1.07; P=0.55). We estimated that the replacement of 5 percent of energy from saturated fat with energy from unsaturated fats would reduce risk by 42 percent (95 percent confidence interval, 23 to 56; P<0.001) and that the replacement of 2 percent of energy from trans fat with energy from unhydrogenated, unsaturated fats would reduce risk by 53 percent (95 percent confidence interval, 34 to 67; P<0.001). CONCLUSIONS: Our findings suggest that replacing saturated and trans unsaturated fats with unhydrogenated monounsaturated and polyunsaturated fats is more effective in preventing coronary heart disease in women than reducing overall fat intake.

Risk factors for lung cancer and for intervention effects in CARET, the Beta-Carotene and Retinol Efficacy Trial. G. S. Omenn, G. E. Goodman, M. D. Thornquist, J. Balmes, M. R. Cullen, A. Glass, J. P. Keogh, F. L. Meyskens, Jr., B. Valanis, J. H. Williams, Jr., S. Barnhart, M. G. Cherniack, C. A. Brodkin, S. Hammar. Journal of the National Cancer Institute 1996: 88(21); 1550-9. BACKGROUND: Evidence has accumulated from observational studies that people eating more fruits and vegetables, which are rich in beta-carotene (a violet to yellow plant pigment that acts as an antioxidant and can be converted to vitamin A by enzymes in the intestinal wall and liver) and retinol (an alcohol chemical form of vitamin A), and people having higher serum beta-carotene concentrations had lower rates of lung cancer. The Beta-Carotene and Retinol Efficacy Trial (CARET) tested the combination of 30 mg beta-carotene and 25,000 IU retinyl palmitate (vitamin A) taken daily against placebo in 18314 men and women at high risk of developing lung cancer. The CARET intervention was stopped 21 months early because of clear evidence of no benefit and substantial evidence of possible harm; there were 28% more lung cancers and 17% more deaths in the active intervention group (active = the daily combination of 30 mg beta-carotene and 25,000 IU retinyl palmitate). Promptly after the January 18, 1996, announcement that the CARET active intervention had been stopped, we published preliminary findings from CARET regarding cancer, heart disease, and total mortality. PURPOSE: We present for the first time results based on the pre-specified analytic method, details about risk factors for lung cancer, and analyses of subgroups and of factors that possibly influence response to the intervention. METHODS: CARET was a randomized, double-blinded, placebo-controlled chemoprevention trial, initiated with a pilot phase and then expanded 10-fold at six study centers. 
Cigarette smoking history and status and alcohol intake were assessed through participant self-report. Serum was collected from the participants at base line and periodically after randomization and was analyzed for beta-carotene concentration. An Endpoints Review Committee evaluated endpoint reports, including pathologic review of tissue specimens. The primary analysis is a stratified logrank test for intervention arm differences in lung cancer incidence, with weighting linearly to hypothesized full effect at 24 months after randomization. Relative risks (RRs) were estimated by use of Cox regression models; tests were performed for quantitative and qualitative interactions between the intervention and smoking status or alcohol intake. O'Brien-Fleming boundaries were used for stopping criteria at interim analyses. Statistical significance was set at the .05 alpha value, and all P values were derived from two-sided statistical tests. RESULTS: According to CARET's pre-specified analysis, there was an RR of 1.36 (95% confidence interval [CI] = 1.07-1.73; P = .01) for weighted lung cancer incidence for the active intervention group compared with the placebo group, and RR = 1.59 (95% CI = 1.13-2.23; P = .01) for weighted lung cancer mortality. All subgroups, except former smokers, had a point estimate of RR of 1.10 or greater for lung cancer. There are suggestions of associations of the excess lung cancer incidence with the highest quartile of alcohol intake (RR = 1.99; 95% CI = 1.28-3.09; test for heterogeneity of RR among quartiles of alcohol intake has P = .01, unadjusted for multiple comparisons) and with large-cell histology (RR = 1.89; 95% CI = 1.09-3.26; test for heterogeneity among histologic categories has P = .35), but not with base-line serum beta-carotene concentrations. CONCLUSIONS: CARET participants receiving the combination of beta-carotene and vitamin A had no chemopreventive benefit and had excess lung cancer incidence and mortality. 
The results are highly consistent with those found for beta-carotene in the Alpha-Tocopherol Beta-Carotene Cancer Prevention Study in 29133 male smokers in Finland.

Observational Studies. PR Rosenbaum (1995) New York: Springer-Verlag.

The effect of vitamin E and beta carotene on the incidence of lung cancer and other cancers in male smokers. The Alpha-Tocopherol, Beta Carotene Cancer Prevention Study Group. NEJM 1994: 330(15); 1029-35. ABSTRACT: BACKGROUND. Epidemiologic evidence indicates that diets high in carotenoid-rich fruits and vegetables, as well as high serum levels of vitamin E (alpha-tocopherol) and beta carotene, are associated with a reduced risk of lung cancer. METHODS. We performed a randomized, double-blind, placebo-controlled primary-prevention trial to determine whether daily supplementation with alpha-tocopherol, beta carotene, or both would reduce the incidence of lung cancer and other cancers. A total of 29,133 male smokers 50 to 69 years of age from southwestern Finland were randomly assigned to one of four regimens: alpha-tocopherol (50 mg per day) alone, beta carotene (20 mg per day) alone, both alpha-tocopherol and beta carotene, or placebo. Follow-up continued for five to eight years. RESULTS. Among the 876 new cases of lung cancer diagnosed during the trial, no reduction in incidence was observed among the men who received alpha-tocopherol (change in incidence as compared with those who did not, -2 percent; 95 percent confidence interval, -14 to 12 percent). Unexpectedly, we observed a higher incidence of lung cancer among the men who received beta carotene than among those who did not (change in incidence, 18 percent; 95 percent confidence interval, 3 to 36 percent). We found no evidence of an interaction between alpha-tocopherol and beta carotene with respect to the incidence of lung cancer. Fewer cases of prostate cancer were diagnosed among those who received alpha-tocopherol than among those who did not. Beta carotene had little or no effect on the incidence of cancer other than lung cancer. Alpha-tocopherol had no apparent effect on total mortality, although more deaths from hemorrhagic stroke were observed among the men who received this supplement than among those who did not. Total mortality was 8 percent higher (95 percent confidence interval, 1 to 16 percent) among the participants who received beta carotene than among those who did not, primarily because there were more deaths from lung cancer and ischemic heart disease. CONCLUSIONS. We found no reduction in the incidence of lung cancer among male smokers after five to eight years of dietary supplementation with alpha-tocopherol or beta carotene. In fact, this trial raises the possibility that these supplements may actually have harmful as well as beneficial effects.

Comparison of maternal and infant outcomes between vacuum extraction and forceps deliveries. S. W. Wen, S. Liu, M. S. Kramer, S. Marcoux, A. Ohlsson, R. Sauve, R. Liston. Am J Epidemiol 2001: 153(2); 103-7. The authors conducted a population-based historical cohort study in the Canadian province of Quebec to assess the maternal and infant outcomes associated with vacuum extraction and forceps deliveries. The study database contains information on 305,391 mother-infant dyads (linked by a common institutional code and hospital chart number) for singleton live vaginal births with a nonbreech presentation at the gestational age of 37 or more completed weeks and a birth weight between 2,500 and 4,000 g during fiscal years 1991/1992 to 1995/1996. Of the births, 31,015 were delivered by vacuum extraction, and 18,727 were delivered by forceps. Compared with delivery by forceps, the adjusted risk ratios for third-/fourth-degree perineal laceration, intracranial hemorrhage, subdural or cerebral hemorrhage, intraventricular hemorrhage, subarachnoid hemorrhage, cephalhematoma, and neonatal in-hospital death were 0.48 (95% confidence interval: 0.45, 0.50), 1.28 (95% confidence interval: 0.73, 2.25), 0.97 (95% confidence interval: 0.49, 1.93), 0.99 (95% confidence interval: 0.16, 5.97), 5.44 (95% confidence interval: 1.26, 23.43), 2.02 (95% confidence interval: 1.89, 2.16), and 0.93 (95% confidence interval: 0.32, 2.70), respectively. The authors conclude that vacuum extraction causes less maternal trauma but may increase the risk of cephalhematoma and certain types of intracranial hemorrhage (e.g., subarachnoid hemorrhage).

evidence >> apples >> historical (3)

A Challenge for HD Researchers. Ken Pidock, Huntington's Disease Advocacy Center. Accessed on 2003-06-20. "To those of us who have watched Huntington's Disease for more than a generation, news about actual clinical trials of potential therapies is most welcome. However, such news also carries issues concerning how such therapies can best be evaluated." www.hdac.org/features/article.php?p_articleNumber=32

The way forward for clinical research. Sir Michael Rawlins, Pharmafocus. Accessed on 2003-06-20. "Historical controls can be very useful, particularly where one is investigating otherwise untreatable conditions where there is a biologically plausible basis for the treatment, and where the outcome untreated is homogenous and either very disabling or fatal." Published June 2, 2003. www.pharmafile.com/Pharmafocus/Features/feature.asp?fID=354

Randomized versus Historical Controls for Clinical Trials. H Sacks, TC Chalmers, H Smith Jr. The American Journal of Medicine 1982: 72(2); 233-240. ABSTRACT: To compare the use of randomized controls (RCTs) and historical controls (HCTs) for clinical trials, we searched the literature for therapies studied by both methods. We found six therapies for which 50 RCTs and 56 HCTs were reported. Forty-four of 56 HCTs (79 percent) found the therapy better than the control regimen, but only 10 of 50 RCTs (20 percent) agreed. For each therapy, the treated patients in RCTs and HCTs had similar outcomes. The difference between RCTs and HCTs of the same therapy was largely due to differences in outcome for the control groups, with HCT control patients generally doing worse than the RCT control groups. Adjustment of the outcomes of the HCTs for prognostic factors, when possible, did not appreciably change the results. The data suggest that biases in patient selection may irretrievably weight the outcome of HCTs in favor of new therapies. RCTs may miss clinically important benefits because of inadequate attention to sample size. The predictive value of each might be improved by reconsidering the use of p less than 0.05 as the significance level for all types of clinical trials, and by the use of confidence intervals around estimates of treatment effects.

evidence >> apples >> matching (4)

Hypothesis: Comparisons of inter- and intra-individual variations can substitute for twin studies in drug research. W. Kalow, B. K. Tang, L Endrenyi. Pharmacogenetics 1998: 8(4); 283-289. ABSTRACT: Twin studies are useful devices to determine the heritability of persistent but variable characteristics that tend to differ among individuals. Drug responses are not persistent affairs; they are temporary characteristics. One therefore may ask whether twin studies are necessary to assess the genetic element in pharmacological responsiveness. To measure the genetic component contributing to their variability, it seems logical to investigate the response variation by repeated drug administration to given individuals, and to compare the variability of the responses within and between individuals. We attempt here to describe a theoretical background of this venture, and to show some results of the exercise. Potential sources of error or uncertainty are discussed.

Removal of radiation dose response effects: an example of over-matching. J. L. Marsh, J. L. Hutton, K. Binks. BMJ 2002: 325(7359); 327-30. [Medline] [Full text] [PDF]

Paired versus Two-Sample Design for a Clinical Trial of Treatments with Dichotomous Outcome: Power Considerations. S Wacholder, CR Weinberg. Biometrics 1982: 38(3); 801-812. ABSTRACT: For the same number of observations in a small-sample clinical trial with dichotomous outcome, the statistical power associated with a two-sample design, analyzed by Fisher's exact test, is slightly greater than that associated with a matched design, analyzed by McNemar's test. The power of McNemar's test, and hence of the matched design, is monotone increasing in the within-pair correlation between the treatment responses. Power curves are presented which demonstrate that positive within-pair correlation, even when quite small, can result in a superiority in power for the matched design. Conversely, in the rare situations where there is a negative within-pair correlation, choice of a two-sample design can result in a substantial gain in power.
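
For readers unfamiliar with the matched-design analysis named in the abstract above, the following is a minimal Python sketch of the exact (binomial) form of McNemar's test. It is an illustration only, not code from Wacholder and Weinberg; the function name mcnemar_exact_p and the example counts are hypothetical. Only the discordant pairs, those in which the two treatments produced different outcomes, enter the test.

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b = pairs where only treatment A succeeded,
    c = pairs where only treatment B succeeded.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(b, c)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for the two-sided test.
    return min(1.0, 2.0 * tail)

# Hypothetical trial: 10 discordant pairs, 9 of them favoring treatment B.
p_value = mcnemar_exact_p(1, 9)
print(round(p_value, 4))
```

Because concordant pairs are ignored, the test gains power when positive within-pair correlation makes a lopsided discordant split more likely under a real treatment difference, which is in keeping with the abstract's point about correlation.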

Matching in epidemiology as a paradigm for twin research on the Etiology of Disease. C White. Acta Geneticae Medicae Et Gemellologiae 1981: 30(1); 77-86. Abstract not available.

evidence >> apples >> observational (14)

A comparison of observational studies and randomized, controlled trials. K. Benson, A. J. Hartz. New England Journal of Medicine 2000: 342(25); 1878-86. BACKGROUND: For many years it has been claimed that observational studies find stronger treatment effects than randomized, controlled trials. We compared the results of observational studies with those of randomized, controlled trials. METHODS: We searched the Abridged Index Medicus and Cochrane data bases to identify observational studies reported between 1985 and 1998 that compared two or more treatments or interventions for the same condition. We then searched the Medline and Cochrane data bases to identify all the randomized, controlled trials and observational studies comparing the same treatments for these conditions. For each treatment, the magnitudes of the effects in the various observational studies were combined by the Mantel-Haenszel or weighted analysis-of-variance procedure and then compared with the combined magnitude of the effects in the randomized, controlled trials that evaluated the same treatment. RESULTS: There were 136 reports about 19 diverse treatments, such as calcium-channel-blocker therapy for coronary artery disease, appendectomy, and interventions for subfertility. In most cases, the estimates of the treatment effects from observational studies and randomized, controlled trials were similar. In only 2 of the 19 analyses of treatment effects did the combined magnitude of the effect in observational studies lie outside the 95 percent confidence interval for the combined magnitude in the randomized, controlled trials. CONCLUSIONS: We found little evidence that estimates of treatment effects in observational studies reported after 1984 are either consistently larger than or qualitatively different from those obtained in randomized, controlled trials.
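
The abstract above mentions combining effects across studies with the Mantel-Haenszel procedure. As a rough illustration, here is a minimal Python sketch of the standard Mantel-Haenszel pooled odds ratio over a set of 2x2 tables. The function name and the counts are hypothetical, and the effect measure Benson and Hartz actually pooled for each treatment may differ.

```python
from typing import Iterable, Tuple

# Each study (stratum) is a 2x2 table (a, b, c, d) where
# a = treated with event, b = treated without event,
# c = control with event, d = control without event.
Table = Tuple[float, float, float, float]

def mantel_haenszel_or(tables: Iterable[Table]) -> float:
    """Mantel-Haenszel pooled odds ratio across 2x2 tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty table carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined (no b*c terms)")
    return numerator / denominator

# Hypothetical counts for two studies of the same treatment.
studies = [(10, 20, 5, 40), (15, 10, 10, 20)]
print(round(mantel_haenszel_or(studies), 3))
```

With a single table the estimator reduces to the ordinary odds ratio a*d / (b*c); with several tables each one contributes in proportion to its size.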

Invited commentary: Rare side effects of obstetric interventions: Are observational studies good enough? P. Buekens. Am J Epidemiol 2001: 153(2); 108-9.

Systematic reviews and lifelong diseases. H. E. Elphick, A. Tan, D. Ashby, R. L. Smyth. BMJ 2002: 325(7360); 381-4. Systematic reviews of randomised controlled trials provide an evidence base for treatment but too often fail to give adequate information on long term outcomes. Elphick and colleagues discuss the limitations of the systematic review of randomised controlled trials for patients with chronic or lifelong diseases and suggest that long term observational studies have a place in the evaluation of the benefits and risks of treatment. [Full text] [PDF]

Statistics in Action. M.H. Gail. Journal of the American Statistical Association 1996: 91(433); 1-13. Abstract not available.

Research Fables from the Sisters Grinn, No. 1. The Hunch-test of Notre Dame. Jeanne Grace, University of Rochester School of Nursing. Accessed on 2003-05-27. "Once upon a time in the land of Evidence, a sickly baby was born. His parents loved him and nursed him back to health and named him Quasi-experiment. As he grew, Quasi-experiment was unable to keep up with the other children. His physical challenges made him unable to compete in games of Manipulate the Independent Variable, and his strength was insufficient for random assignment tasks. While his schoolmates Randomized Clinical Trial and True Experiment received glowing praise for their accomplishments, Quasi-experiment received only disdain. The land of Evidence valued rigorous tests of causality above all else and had no tolerance for other investigative approaches. Saddened and isolated, Quasi-experiment withdrew from the company of others and came to live in the remote towers of the great cathedral of Evidence, Notre Dame." http://www.urmc.rochester.edu/SON/Fables/hunchbck.htm

How Good Is the Evidence Linking Breastfeeding and Intelligence? Anjali Jain, John Concato, John M. Leventhal. Pediatrics 2002 (April): 109(6); 1044-1053. Section of General Pediatrics, Department of Pediatrics, University of Chicago Children’s Hospital, Chicago, Illinois; Robert Wood Johnson Clinical Scholars Program, Yale University, New Haven, Connecticut; Section of General Pediatrics, Department of Pediatrics, Yale University, New Haven, Connecticut; Department of Medicine, Yale University, New Haven, Connecticut; Clinical Epidemiology Unit, West Haven Veterans Affairs Medical Center, West Haven, Connecticut. Background. We conducted a critical review of the many studies that have tried to determine whether breastfeeding has a beneficial effect on intellect. Design/Methods. By searching Medline and the references of selected articles, we identified publications that evaluated the association between breastfeeding and cognitive outcomes. We then appraised and described each study according to 8 principles of clinical epidemiology: 1) study design, 2) target population: whether full-term infants were studied, 3) sample size, 4) collection of feeding data: whether studies met 4 standards of quality (suitable definition and duration of breastfeeding, and appropriate timing and source of feeding data), 5) control of susceptibility bias: whether studies controlled for socioeconomic status and stimulation of the child, 6) blinding: whether observers of the outcome were blind to feeding status, 7) outcome: whether a standardized individual test of general intelligence at an age older than 2 years was used, and 8) format of results: whether studies reported an effect size or some other strategy to interpret the clinical impact of results. Results. We identified 40 pertinent publications from 1929 to February 2001. Twenty-seven (68%) concluded that breastfeeding promotes intelligence. Many studies, however, had methodological flaws. 
Only 2 papers studied full-term infants and met all 4 standards of high-quality feeding data, controlled for 2 critical confounders, reported blinding, used an appropriate test, and allowed the reader to interpret the clinical significance of the findings with an effect size. Of these 2, 1 study concluded that the effect of breastfeeding on intellect was significant, and the other did not. Conclusion. Although the majority of studies concluded that breastfeeding promotes intelligence, the evidence from higher quality studies is less persuasive.

Problems and approaches in investigating the role of micronutrients in the aetiology of cancer in humans. J. Little. Br Med Bull 1999: 55(3); 600-18. Observational studies have provided leads regarding a number of micronutrients which may account for the apparent protective effects of high intakes of vegetables and fruit against many types of cancer. In general, these leads have not been confirmed by randomised controlled trials. This apparent conflict raises issues about the timing and duration of a critical period or periods during which micronutrient intake may influence the development of cancer, the dose, possible interaction between high doses of micronutrients and exposures conferring a high risk of cancer and gene-micronutrient interactions. When gene-environmental interaction exists, failure to take both of these sets of factors into account leads to bias in the estimation of disease risk. As a result of recent advances, it is now possible to take measures of genetic susceptibility into account. Therefore, in future studies, the opportunity should be taken to obtain DNA samples to determine genotypes for polymorphisms potentially affecting micronutrient metabolism.

Interpreting the evidence: choosing between randomised and non-randomised studies. M McKee, A Britton, N Black, K McPherson, C Sanderson, C Bain. British Medical Journal 1999: 319(7205); 312-15. Abstract not available. [Medline] [Full text] [PDF]

The arrogance of preventive medicine. D. L. Sackett. CMAJ 2002: 167(4); 363-4.

Humility in observational studies. J. D. Shelton. Science 2002: 297(5590); 2208. Abstract not available yet.

Fat chance: diet and ischemic stroke [editorial; comment]. R. Sherwin, T. R. Price. JAMA 1997: 278(24); 2185-6. Abstract not available.

Smoking as "independent" risk factor for suicide: illustration of an artifact from observational epidemiology? G. D. Smith, A. N. Phillips, J. D. Neaton. Lancet 1992: 340(8821); 709-12. Two widely used criteria for determining whether an association between a risk factor and a disease is causal are dose response and independence from other factors. Data from a large US risk factor study (MRFIT) throw up a relation between cigarette smoking and suicide that meets these criteria, yet appears to be biologically implausible. It is likely that many more such associations, for other exposures and other diseases, are equally spurious, but are protected by their lack of obvious implausibility.

Epidemiology faces its limits. G. Taubes. Science 1995: 269(5221); 164-9. Abstract not available.

The Cochrane Lecture. The best and the enemy of the good: randomised controlled trials, uncertainty, and assessing the role of patient choice in medical decision making. K. McPherson. J. Epidemiol. Community Health 1994: 48(1); 6-15. This lecture aimed to create a bridge to span the conceptual and ideological gap between randomised controlled trials and systematic observational comparisons and to reduce unwanted and unproductive polarisation. The argument, simply put, is that since randomisation alone eliminates the selection effect of therapeutic decision making, anything short of randomisation to attribute cause to consequent outcome is a waste of time. If observational comparison does have any significant part in evaluating medical outcomes, there is a grave danger of "the best", to paraphrase Voltaire, becoming "the enemy of the good". The first section aims to emphasise the advantages of randomised controlled trials. Then the nature of an essential precondition--medical uncertainty--is discussed in terms of its extent and effect. Next, the role of patient choice in medical decision making is considered, both when outcomes can safely be attributed to treatment choice and when they cannot. There may be many important situations in which choice itself affects outcome and this could mean that random comparisons give biased estimates of true therapeutic effects. In the penultimate section, the implications of this possibility both for randomised controlled trials and for outcome research is pursued and lastly there are some simple recommendations for reliable outcome research. [Medline]

evidence >> apples >> overview (2)

What is a P-value? Ronald Thisted. Accessed on 2003-06-20. "Results favoring one treatment over another in a randomized clinical trial can be explained only if the favored treatment really is superior or the apparent advantage enjoyed by the treatment is due solely to the working of chance." www.stat.uchicago.edu/~thisted/Distribute/pvalue.pdf

Study designs in medical research. Ronald Thisted. Accessed on 2003-06-20. "Study design is the procedure under which a study is carried out." galton.uchicago.edu/~thisted/courses/315/lectures/0297.pdf

evidence >> apples >> randomization (46)

The mythology of randomization. U. Abel, A. Koch. Accessed on 2003-06-30. "In biostatistics and medicine one sometimes encounters an extremely negative view or even a categorical rejection of nonrandomized studies. This attitude may be comprehensible from a historical, pragmatic, or educational viewpoint but it is not well-founded on epistemological grounds. In addition, it is potentially harmful." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/abel.htm

Coronary artery surgery study (CASS): a randomized trial of coronary artery bypass surgery. Comparability of entry characteristics and survival in randomized patients and nonrandomized patients meeting randomization criteria. CASS Principal Investigators and Their Associates. Journal of the American College of Cardiology 1984: 3(1); 114-28. The Coronary Artery Surgery Study (CASS) includes a randomized trial of coronary artery bypass surgery and medical therapy in the management of patients with mild or moderate stable angina pectoris or free of angina but with a documented history of myocardial infarction. While 780 patients at 11 participating institutions entered the randomized trial, 1,315 patients at the same institutions met randomization criteria but declined participation in the randomized study; they constitute the "randomizable" patients. Half the randomized patients were assigned to surgery and half to the medical group. Of the 1,315 randomizable patients, 43% started with surgical therapy and 57% constitute the medical group. Follow-up periods average 64 months (range 46 to 92). The only entry characteristic in which the randomized and randomizable medical groups differ importantly is the extent of coronary artery disease, which is less extensive in the latter. The two surgical groups also differ in this respect, but with more extensive disease in the randomizable group. At 5 year follow-up, 24% of the medically-assigned randomized patients and 22% of the medically-started randomizable patients have had coronary bypass surgery. Survival in the medically-randomized and randomizable patient groups is similar in the aggregate (both 92% at 5 years) and also in all subgroups based on clinical classification, the number of diseased vessels, the presence of proximal left anterior descending coronary artery disease and ejection fraction. 
Survival for the surgically-assigned randomized patients and the surgically-started randomizable patients is also similar in the aggregate (95 and 94%, respectively) and in all subgroups. It is concluded that the randomized patients in CASS are not a special or atypical subset of those eligible for randomization. The data from the randomizable patients thus support and extend the inference of the generally very good survival of both the medically- and surgically-assigned patients of the randomized trial. [Medline]

The Paired Availability Design: An Update. S. G. Baker. Accessed on 2003-06-30. "Baker and Lindeman [3] introduced the paired availability design for strengthening inference when using historical controls. We review the design in the context of the following updates. First, we make the notation similar to that in the recent literature on all-or-none compliance in randomized trials. See the review in Baker [2] and Angrist et al. [1]. Second, in addition to excess risk, we consider the relative risk as a possible test statistic. Cuzick et al. [4] independently made similar calculations in the context of a randomized trial with all-or-none compliance. Third, we recommend using the inverse of the variance rather than the inverse of the standard error when weighting estimates from multiple pairs. This was also independently suggested by Cuzick et al. [4] in the context of randomized trials. Fourth, to improve the sample size calculation we suggest a method for using exogenous data to estimate the variation due to random time changes. Fifth, we propose an adjustment for one type of systematic change over time." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/baker.htm
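
Baker's third point, weighting by the inverse of the variance rather than the inverse of the standard error, can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not Baker's code: the function names and the two example estimates are hypothetical, and the estimates are assumed to be independent and unbiased.

```python
from math import sqrt
from typing import Sequence, Tuple

def pooled_estimate(estimates: Sequence[float],
                    variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean and the variance of that mean."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    return mean, 1.0 / total

def pooled_estimate_inverse_se(estimates: Sequence[float],
                               variances: Sequence[float]) -> float:
    """For contrast only: weights proportional to 1 / standard error."""
    weights = [1.0 / sqrt(v) for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# Hypothetical excess-risk estimates from two pairs, with their variances.
estimates = [1.0, 3.0]
variances = [1.0, 4.0]
mean, var = pooled_estimate(estimates, variances)
print(mean, var)
print(pooled_estimate_inverse_se(estimates, variances))
```

Among weighted averages of independent unbiased estimates, inverse-variance weights give the smallest variance; inverse-standard-error weights pull the pooled value toward the noisier estimate.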

Unconventional therapies and cancer. M. Begin, E. Kaegi. CMAJ 1999: 161(6); 686-7. Abstract not available yet.

Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments. I. Chalmers. Int J Epidemiol 2001: 30(5); 1156-64. Histories of clinical trials have recorded and analysed the development of quantification in therapeutic evaluation, the emergence of probabilistic thinking, the application of statistical methods and theory, and the sociology, ethics and politics of clinical trials; but it is surprising that they only rarely identify as a distinct theme the development of efforts to control biases. An exception is Kaptchuk's recent account of the history of blinding and placebos for reducing observer biases. In this complementary paper I introduce and discuss some milestones between 1662 and 1948 in the development of methods to control selection biases when assembling therapeutic comparison groups, to ensure, as far as possible, that 'like is compared with like'. In the paper I note (i) that treatment allocation based on strict alternation abolishes selection bias as effectively as treatment allocation based on strict random allocation; (ii) that use of schedules based on random numbers is more likely to prevent foreknowledge of allocation schedules, and thus the risk of introducing selection bias at the point of recruitment to trials; (iii) that a concern to conceal allocation schedules was the rationale for using schedules based on random numbers in the Medical Research Council trials of vaccination for whooping cough and streptomycin for pulmonary tuberculosis; and (iv) that the introduction of allocation concealment more than half a century ago remains the most recent substantive milestone in the history of efforts to control selection biases in therapeutic experiments.

Experimental Study versus Non-Experimental Study: The Non-Experimental (Non-Randomized) Study as a Methodological Compromise. K. Dannehl. Accessed on 2003-06-30. "Most methodologists agree that the experimental study is not only the best method for physics, chemistry, and biology, but also for medical research. However, often one has to be satisfied with non-experimental, i.e., less than optimal, designs for gaining knowledge. This is due to organisational and economic, as well as legal and ethical limits which we often meet when we conduct experiments in humans and which we can not, may not, or do not want to go beyond." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. Translated by Christoph Trautner. www.symposion.com/nrccs/dannehl.htm

Evidence-based conclusions on the efficacy of a treatment - What can be learned from risk assessment? L. Edler. Accessed on 2003-06-30. "A comparative clinical trial for which the randomization of the patients is either impossible, unwelcome, or inopportune, misses a basic justification for the establishment of a causal relationship between the treatment and the health outcome. Various designs of observational studies have been developed with the aim of identifying and defining treatments which may have a curative or palliative effect for patients despite the absence of this methodological requirement. The discussions of pros and cons of these approaches make obvious the need of new methodologies for clinical studies when treatments and effects are to be related in a non-randomized set-up. In this situation, it may be helpful to adopt an approach similar to that of toxicology where the exposure to hazardous substances is related to the possible noxious effects on human health. Usually randomized studies are unavailable for risk assessments so that toxicological epidemiology has to base its conclusions on best available evidence. In this contribution analogies, resemblances, and dissemblances between risk assessment and treatment evaluation using nonrandomized studies are shown, and, on the basis of partial concordance, a proposal for the achievement of Evidence-Based Therapy Assessment (EBTA) is derived for a causal relationship between treatment and its effects on the human disease relief. EBTA may be helpful for structuring, ordering and weighting medical evidence when consensus on treatment recommendations has to be found in the face of results from randomized as well as non-randomized studies, and other data." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/ledler.htm

Problems of Randomized Trials. A.R. Feinstein. Accessed on 2003-06-30. "Regardless of how wonderful randomized trials are - and I will yield to no one in acknowledging them as the gold-standard when they can be done - they have some major problems and difficulties (Table 1)." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/feinstein.htm

A Nonparametric Test for Evaluating Coherent Alternatives in Nonrandomised Studies. O. Gefeller, L. Pralle. Accessed on 2003-06-30. "When considering the effect of treatments or exposures on some outcome variable in a nonrandomised study, the presence of coherence provides supporting evidence that an observed relationship between the factors of interest might reflect a causal treatment or exposure effect. In our understanding, coherence means that we have a specific and detailed description of what an actual treatment or exposure effect would look like. The concept of coherence can then be used to formulate a 'coherent pattern' of expected results, indicative of a real effect of the treatment or exposure under study, that can be tested using the observed data. In the paper, we review a simple nonparametric rank test, developed by Rosenbaum, for testing the null hypothesis of no treatment/exposure effect against arbitrarily complicated coherent alternatives. In addition, we introduce a new measure of coherence to summarise quantitatively the coherence present in the data. Two empirical examples, one epidemiological investigation and one nonrandomised clinical trial, illustrate the application of the methodology." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/gefeller.htm

Randomized Controlled Trials: Evidence Biased Psychiatry. David Healy, Alliance for Human Research Protection. Accessed on 2002-"A new drug gets introduced to the market. It has been approved after stringent scrutiny by the FDA, which requires ever more convincing evidence that it works and that it's safe. The new treatment will always cost more than the old treatments, but even on the cost front, many would argue that we have entered an era where placebo controlled clinical trials demonstrate that new in contrast to older treatments actually do work, and if we just stick to treatments that really work costs should fall. Besides it always seems to happen these days that when new and costly antidepressants or antipsychotics are put through an economic model based on the figures from clinical trials and a range of assumptions provided by experts, the model demonstrates that these new drugs costing thousands of dollars a year are in fact cheaper than treatments costing $100 per year or less. So where could the problems lie? Why do we seem to be so slow in reaching the new medical utopia towards which companies and others assure us we are heading?" www.researchprotection.org/COI/healy0802.html

The Analysis of Intervention Effects Using Observational Data Bases. C. Heuer, U. Abel. Accessed on 2003-06-30. "If, in a clinical unit, a new treatment is introduced within a short time period the problem arises as to how to evaluate its immediate impact on the patients’ prognosis, i.e., the (possible) intervention effect. An exploratory tool is described which can be employed to examine this effect. The method is illustrated by means of a clinical example." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/heuer.htm

Proof versus plausibility: rules of engagement for the struggle to evaluate alternative cancer therapies. L. J. Hoffer. CMAJ 2001: 164(3); 351-3.

Methodological Contributions to Clinical Research: Random Sampling, Randomization, and Equivalence of Contrasted Groups in Psychotherapy Outcome Research. Louis M Hsu. Journal of Consulting and Clinical Psychology 1989: 57(1); 131-137.

Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. P. Ioannidis, A. B. Haidich, M. Pappa, N. Pantazis, S. I. Kokori, M. G. Tektonidou, D. G. Contopoulos-Ioannidis, J. Lau. JAMA 2001: 286(7); 821-30. CONTEXT: There is substantial debate about whether the results of nonrandomized studies are consistent with the results of randomized controlled trials on the same topic. OBJECTIVES: To compare results of randomized and nonrandomized studies that evaluated medical interventions and to examine characteristics that may explain discrepancies between randomized and nonrandomized studies. DATA SOURCES: MEDLINE (1966-March 2000), the Cochrane Library (Issue 3, 2000), and major journals were searched. STUDY SELECTION: Forty-five diverse topics were identified for which both randomized trials (n = 240) and nonrandomized studies (n = 168) had been performed and had been considered in meta-analyses of binary outcomes. DATA EXTRACTION: Data on events per patient in each study arm and design and characteristics of each study considered in each meta-analysis were extracted and synthesized separately for randomized and nonrandomized studies. DATA SYNTHESIS: Very good correlation was observed between the summary odds ratios of randomized and nonrandomized studies (r = 0.75; P<.001); however, nonrandomized studies tended to show larger treatment effects (28 vs 11; P =.009). Between-study heterogeneity was frequent among randomized trials alone (23%) and very frequent among nonrandomized studies alone (41%). The summary results of the 2 types of designs differed beyond chance in 7 cases (16%). Discrepancies beyond chance were less common when only prospective studies were considered (8%). Occasional differences in sample size and timing of publication were also noted between discrepant randomized and nonrandomized studies. 
In 28 cases (62%), the natural logarithm of the odds ratio differed by at least 50%, and in 15 cases (33%), the odds ratio varied at least 2-fold between nonrandomized studies and randomized trials. CONCLUSIONS: Despite good correlation between randomized trials and nonrandomized studies-in particular, prospective studies-discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common.

Amniotomy or oxytocin for induction of labor. Re-analysis of a randomized controlled trial. M. J. Keirse. Acta Obstet Gynecol Scand 1988: 67(8); p731-5. A recently reported "prospective, randomized study into amniotomy and oxytocin as induction methods in a total unselected population" was examined for selection bias and bias after entry into the study. The null hypothesis that clinical attitudes to amniotomy as a means for inducing labor had no influence on the decision to enter women into the trial and allocate them to either amniotomy or oxytocin was rejected at p less than 0.00025. Clinical attitudes were further found to statistically significantly influence the prescribed assessments 4 h after entry into the trial and the selection of the second intervention that was required in the absence of acceptable progress (p less than 0.0005). Bias at the time of this prescribed assessment was large enough to result in an inverse relationship between "acceptable progress within 4 hours" and "delivery within 24 hours" after induction. A subanalysis of the nulliparae entered into the trial further substantiated both bias at entry and bias in following the prescribed protocol. As hypothesized, these biases reached a greater statistical significance in nulliparous than in parous women. The likelihood that all of these observations would be encountered in a truly randomized study of this size can be estimated to be less than one in a billion (or p less than 0.000,000,000,000,1). The study, therefore, provides a classical example of the dangers of non-blind allocation to different treatment groups in clinical trials. It is further concluded that no randomized controlled studies between amniotomy and oxytocin in a "total unselected population" are available.

Discussion: Why Clinical Trials in the Evaluation of Life Style Evaluation? Genell L. Knatterud, PhD. Controlled Clinical Trials 1997: 18(6); 514-516. Abstract not available.

"The 60-Minutes-Myocardial Infarction Project": Comparison with a Registry and a Randomized Clinical Trial. A. Koch, A. Hörmann, H. Löwel, J. Senges. Accessed on 2003-06-30. "There is an ongoing debate about whether observational studies can produce reliable information on treatment comparisons. Randomized clinical trials are the accepted gold standard for this purpose. It is, however, impossible to investigate all important issues in randomized trials. Large observational studies are frequently performed and large clinical databases are available. Thus it is a relevant question, how reliable data from nonrandomized studies are. In this contribution, data of a large nonrandomized multicenter study on decision-making with respect to thrombolytic treatment in patients with acute myocardial infarction are compared with a randomized clinical trial and a population based registry. It is demonstrated that similar event rates are observed in those subgroups of the observational study that are comparable with the randomized study or the registry. Especially, there is no indication of underreporting of deaths in the observational study that would invalidate all investigations on treatment comparisons based on the observational study in the first step. Although such results can hardly be generalized to other situations, they might help balance the view on so-called horror-stories, where results of observational studies could not be verified in subsequent randomized clinical trials." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10 -11,1997. www.symposion.com/nrccs/koch.htm

Breastfeeding and infant growth: biology or bias? M. S. Kramer, T. Guo, R. W. Platt, S. Shapiro, J. P. Collet, B. Chalmers, E. Hodnett, Z. Sevkovskaya, I. Dzikovich, I. Vanilovich. Pediatrics 2002: 110(2 Pt 1); 343-7. BACKGROUND: Available evidence suggests that prolonged and exclusive breastfeeding is associated with lower infant weight and length by 6 to 12 months of age. This evidence, however, is based on observational studies, which are unable to separate the effects of feeding mode per se from selection bias, reverse causality, and the confounding effects of maternal attitudinal factors. DESIGN/METHODS: A cluster-randomized trial in the Republic of Belarus of a breastfeeding promotion intervention modeled on the World Health Organization (WHO)/UNICEF Baby-Friendly Hospital Initiative versus control (then current) infant feeding practices. Healthy, full-term, singleton breastfed infants (n = 17 046) weighing > or =2500 g were enrolled soon after birth and followed up at 1, 2, 3, 6, 9, and 12 months old for measurements of weight, length, and head circumference. Data were analyzed according to intention-to-treat, while accounting for within-cluster correlation. To assess the potential for bias in observational studies of breastfeeding, we also analyzed our data as if we had conducted an observational study by ignoring treatment, combining the 2 randomized groups, and comparing 1378 infants weaned in the first month and those breastfed for the full 12 months of follow-up with either > or =3 months (n = 1271) or > or =6 months (n = 251) of exclusive breastfeeding. RESULTS: Infants from the experimental sites were significantly more likely to be breastfed (to any degree) at 3, 6, 9, and 12 months and were far more likely to be exclusively breastfed at 3 months (43.3% vs 6.4%). Mean birth weight was nearly identical in the 2 groups (3448 g, experimental; 3446 g, control). 
Mean weight was significantly higher in the experimental group by 1 month of age (4341 vs 4280 g). The difference increased through 3 months (6153 g vs 6047 g), declined slowly thereafter, and disappeared by 12 months (10564 g vs 10571 g). Analysis by z scores confirmed that infants in both groups gained more weight than the WHO/Centers for Disease Control and Prevention reference, with no evidence of undernutrition in the control group. Length followed a similar pattern. In the observational analyses, infants weaned in the first month were slightly lighter and shorter at birth and their weight-for-age and length-for-age z scores declined by 1 month, but they caught up to both experimental and the other observational groups by 6 months and were heavier and longer by 12 months. Among infants in the 2 prolonged and exclusive breastfeeding groups, weight-for-age z scores fell slightly between 3 and 12 months; length-for-age fell below the reference by 6 months with catch-up to the reference by 12 months. Head circumference showed no significant differences at any age between the 2 trial groups or among the observational groups. CONCLUSIONS: Our data, the first in humans based on a randomized experiment, suggest that prolonged and exclusive breastfeeding may actually accelerate weight and length gain in the first few months, with no detectable deficit by 12 months old. These results add support to current WHO and UNICEF feeding recommendations. Our observational analysis showing faster weight and length gains with early weaning and slower gains with prolonged and exclusive breastfeeding may reflect unmeasured confounding differences or a true biological effect of formula feeding.

Problems of Randomized Controlled Trials (RCT) in Surgery. R. Lefering, E. Neugebauer. Accessed on 2003-06-30. "Randomized controlled trials (RCT) are widely accepted as the gold standard for comparing different therapeutic modalities. The random allocation of patients avoids a selection bias and cares for an equal distribution of conscious as well as unconscious prognostic factors among the study groups, provided the number of patients included is large enough. The credibility of study results is further enhanced by applying techniques like independent investigators, blinding techniques, or homogenisation of patients and therapy." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/lefering.htm

The Psychic Staring Effect: An Artifact of Pseudo Randomization. David F. Marks, John Colwell. Skeptical Inquirer 2000: 24(5); 41-44 and 49.

Evaluating complementary medicine: methodological challenges of randomised controlled trials. S. Mason, P. Tovey, A. F. Long. BMJ 2002: 325(7368); 832-4.

Reference - controlled observational studies - a new tool for post marketing studies and for evaluation of preventive measures. J. Michaelis. Accessed on 2003-06-30. "Several limitations of controlled clinical trials in phase-III drug research (e.g., highly selected patients, limited size of trials) make it mandatory to perform extensive research also in the post marketing phase. It is proposed to enhance the achievable evidence on therapeutic effects from large observational studies by designing small nested randomized trials. In contrast to the "comprehensive cohort studies" which have been discussed several years ago [25, 26], this combination of observational and experimental studies is planned in advance and does not result from patients’ compliance with the idea of randomization." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/michaeli.htm

Effects of a Combination of Beta Carotene and Vitamin A on Lung Cancer and Cardiovascular Disease. GS Omenn, GE Goodman, MD Thornquist, J Balmes, MR Cullen, A Glass, JP Keogh, FL Meyskens, B Valanis, JH Williams, S Barnhart, S Hammar. The New England Journal of Medicine 1996: 334(18); 1150-1155. ABSTRACT: BACKGROUND. Lung cancer and cardiovascular disease are major causes of death in the United States. It has been proposed that carotenoids and retinoids are agents that may prevent these disorders. METHODS. We conducted a multicenter, randomized, double-blind, placebo-controlled primary prevention trial -- the Beta Carotene and Retinol Efficacy Trial -- involving a total of 18,314 smokers, former smokers, and workers exposed to asbestos. The effects of a combination of 30 mg of beta carotene per day and 25,000 IU of retinol (vitamin A) in the form of retinyl palmitate per day on the primary end point, the incidence of lung cancer, were compared with those of placebo. RESULTS. A total of 388 new cases of lung cancer were diagnosed during the 73,135 person-years of follow-up (mean length of follow-up, 4.0 years). The active-treatment group had a relative risk of lung cancer of 1.28 (95 percent confidence interval, 1.04 to 1.57; P=0.02), as compared with the placebo group. There were no statistically significant differences in the risks of other types of cancer. In the active-treatment group, the relative risk of death from any cause was 1.17 (95 percent confidence interval, 1.03 to 1.33); of death from lung cancer, 1.46 (95 percent confidence interval, 1.07 to 2.00); and of death from cardiovascular disease, 1.26 (95 percent confidence interval, 0.99 to 1.61). On the basis of these findings, the randomized trial was stopped 21 months earlier than planned; follow-up will continue for another 5 years. CONCLUSIONS. After an average of four years of supplementation, the combination of beta carotene and vitamin A had no benefit and may have had an adverse effect on the incidence of lung cancer and on the risk of death from lung cancer, cardiovascular disease, and any cause in smokers and workers exposed to asbestos.

Issues to Consider When Designing RCTs for CAM Therapies. House of Lords United Kingdom Parliament. Accessed on 2002-12-23. www.parliament.the-stationery-office.co.uk/pa/ld199900/ldselect/ldsctech/123/12315.htm#a68

Difficulties of Randomised Controlled Trials. House of Lords United Kingdom Parliament. Accessed on 2002-12-31. "Concerns over RCTs distorting a therapy or disguising its efficacy are not the unique concerns of CAM practitioners. Vincent & Furnham suggest that as attempts to apply the RCT to a wider and wider range of treatments have occurred, more and more problems have been uncovered. They list 10 such problems." www.parliament.the-stationery-office.co.uk/pa/ld199900/ldselect/ldsctech/123/12323.htm

Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. S. J. Pocock, R. Simon. Biometrics 1975: 31(1); 103-15. In controlled clinical trials there are usually several prognostic factors known or thought to influence the patient's ability to respond to treatment. Therefore, the method of sequential treatment assignment needs to be designed so that treatment balance is simultaneously achieved across all such patient factors. Traditional methods of restricted randomization such as "permuted blocks within strata" prove inadequate once the number of strata, or combinations of factor levels, approaches the sample size. A new general procedure for treatment assignment is described which concentrates on minimizing imbalance in the distributions of treatment numbers within the levels of each individual prognostic factor. The improved treatment balance obtained by this approach is explored using simulation for a simple model of a clinical trial. Further discussion centers on the selection, predictability and practicability of such a procedure.
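The kind of procedure the Pocock and Simon abstract describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' exact algorithm: the class name Minimizer, the use of the range (largest minus smallest count) as the imbalance measure, the unweighted sum across factors, and the p_best probability of taking the best-balancing arm are all hypothetical choices made for this sketch.

```python
import random
from collections import defaultdict


class Minimizer:
    """Illustrative minimization-style allocator (names and defaults are assumptions)."""

    def __init__(self, treatments, factors, p_best=0.8, seed=None):
        self.treatments = list(treatments)
        self.factors = list(factors)
        self.p_best = p_best  # assumed chance of choosing the best-balancing arm
        self.rng = random.Random(seed)
        # counts[factor][level][treatment] = patients already assigned
        self.counts = {
            f: defaultdict(lambda: dict.fromkeys(self.treatments, 0))
            for f in self.factors
        }

    def imbalance_if(self, patient, candidate):
        """Total imbalance across the patient's factor levels if given `candidate`."""
        total = 0
        for f in self.factors:
            trial_counts = dict(self.counts[f][patient[f]])
            trial_counts[candidate] += 1
            # Assumed imbalance measure: range of treatment counts in this level.
            total += max(trial_counts.values()) - min(trial_counts.values())
        return total

    def assign(self, patient):
        """Assign one patient (a dict mapping factor name to level) and record it."""
        scores = {t: self.imbalance_if(patient, t) for t in self.treatments}
        best = min(scores.values())
        preferred = [t for t in self.treatments if scores[t] == best]
        if len(preferred) == len(self.treatments):
            # Every arm balances equally well: plain randomization.
            choice = self.rng.choice(self.treatments)
        elif self.rng.random() < self.p_best:
            choice = self.rng.choice(preferred)
        else:
            others = [t for t in self.treatments if t not in preferred]
            choice = self.rng.choice(others)
        for f in self.factors:
            self.counts[f][patient[f]][choice] += 1
        return choice


if __name__ == "__main__":
    allocator = Minimizer(["A", "B"], ["sex", "stage"], seed=42)
    print(allocator.assign({"sex": "F", "stage": "II"}))
```

Keeping p_best below 1 retains a random element, which bears on the predictability concern the abstract raises; setting it to 1 makes every non-tied assignment deterministic.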

Randomised block design is more powerful than minimisation [letter]. N. Ross. British Medical Journal 1999: 318(7178); 263-4. Abstract not available.

Estimation from nonrandomized treatment comparisons using subclassification on propensity scores. D. B. Rubin. Accessed on 2003-06-30. "The aim of many analyses of medical data sets is to draw causal inferences about the relative effects of treatments, such as different methods of treating cancer patients. The data available to compare many such treatments are not based on the results of carefully conducted randomized clinical trials, but rather are collected while observing systems as they operate in "normal" practice, without any interventions implemented by randomized assignment rules. Such data are relatively inexpensive to obtain, however, and often do represent the spectrum of medical practice better than the settings of randomized experiments. Consequently, it is sensible to try to estimate the effects of treatments from such data sets, even if only to help design a new randomized experiment or shed light on the generalizability of results from existing randomized experiments. Standard methods of analysis using routine statistical software (e.g., linear or logistic regressions), however, can be quite deceptive for these objectives because they provide no warnings about their propriety. Propensity score methods are more reliable tools for addressing such objectives because the assumptions needed to make their answers appropriate are more assessable and transparent to the investigator. Subclassification on propensity scores is a particularly straightforward technique and is the topic of this article." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/rubin.htm
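As a rough illustration of what subclassification on propensity scores involves, the sketch below assumes the propensity scores have already been estimated by some other means (for example a logistic regression of treatment on covariates). The function name subclassified_effect, the default of five equal-sized strata, and the weighting of strata by their size are assumptions of this sketch, not details taken from the abstract.

```python
from statistics import mean


def subclassified_effect(records, n_strata=5):
    """Estimate a treatment effect by stratifying on the propensity score.

    records: iterable of (propensity, treated, outcome) tuples, where
    `treated` is truthy for treated units. Strata are equal-sized groups
    of units ordered by propensity score (an assumed convention).
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    weighted_sum = 0.0
    used = 0
    for k in range(n_strata):
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        treated = [y for _, t, y in stratum if t]
        control = [y for _, t, y in stratum if not t]
        if not treated or not control:
            continue  # no within-stratum comparison is possible here
        # Weight each within-stratum difference by the stratum's size.
        weighted_sum += len(stratum) * (mean(treated) - mean(control))
        used += len(stratum)
    if used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / used
```

A stratum with no treated or no control units is skipped rather than imputed, so a caller would want to check how many units actually contributed before trusting the estimate.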

Why randomized controlled trials fail but needn't: 1. Failure to gain "coal-face" commitment and to use the uncertainty principle. DL Sackett, J Hoey. Canadian Medical Association Journal 2000: 162(9); 1311-1314. [Medline] [Full text] [PDF]

Patients' preferences and randomised trials. W. A. Silverman, D. G. Altman. Lancet 1996: 347(8995); 171-4.

Casting and Drawing Lots. W.A. Silverman, I. Chalmers. In: Controlled Trials from History. Edited by I. Chalmers, I. Milne, and U. Trohler. 2001.

Patient Heterogeneity in Clinical Trials. Richard Simon. Cancer Treatment Reports 1980: 64(2-3); 405-410. (Valuable comments on stratification and generalizability.) ABSTRACT: Interpretation of therapeutic results is complicated by variability in response among patients. This paper reviews fundamental statistical principles for the design of clinical trials. These methods seek to evaluate relative therapeutic efficacy in the presence of patient heterogeneity. Statistical science has more to offer therapeutics than significance tests among "comparable" treatment groups. The role of randomization and stratification is reviewed. The importance of study design, including patient eligibility and therapeutic standardization, to the generalization of conclusions is discussed.

Ginkgo for memory enhancement: a randomized controlled trial. P. R. Solomon, F. Adams, A. Silver, J. Zimmer, R. DeVeaux. Jama 2002: 288(7); 835-40. CONTEXT: Several over-the-counter treatments are marketed as having the ability to improve memory, attention, and related cognitive functions in as little as 4 weeks. These claims, however, are generally not supported by well-controlled clinical studies. OBJECTIVE: To evaluate whether ginkgo, an over-the-counter agent marketed as enhancing memory, improves memory in elderly adults as measured by objective neuropsychological tests and subjective ratings. DESIGN: Six-week randomized, double-blind, placebo-controlled, parallel-group trial. SETTING AND PARTICIPANTS: Community-dwelling volunteer men (n = 98) and women (n = 132) older than 60 years with Mini-Mental State Examination scores greater than 26 and in generally good health were recruited by a US academic center via newspaper advertisements and enrolled over a 26-month period from July 1996 to September 1998. INTERVENTION: Participants were randomly assigned to receive ginkgo, 40 mg 3 times per day (n = 115), or matching placebo (n = 115). MAIN OUTCOME MEASURES: Standardized neuropsychological tests of verbal and nonverbal learning and memory, attention and concentration, naming and expressive language, participant self-report on a memory questionnaire, and caregiver clinical global impression of change as completed by a companion. RESULTS: Two hundred three participants (88%) completed the protocol. Analysis of the modified intent-to-treat population (all 219 participants returning for evaluation) indicated that there were no significant differences between treatment groups on any outcome measure. Analysis of the fully evaluable population (the 203 who complied with treatment and returned for evaluation) also indicated no significant differences for any outcome measure. 
CONCLUSIONS: The results of this 6-week study indicate that ginkgo did not facilitate performance on standard neuropsychological tests of learning, memory, attention, and concentration or naming and verbal fluency in elderly adults without cognitive impairment. The ginkgo group also did not differ from the control group in terms of self-reported memory function or global rating by spouses, friends, and relatives. These data suggest that when taken following the manufacturer's instructions, ginkgo provides no measurable benefit in memory or related cognitive function to adults with healthy cognitive function.

Minimization: A new method of assigning patients to treatment and control groups. Donald R. Taves. Clinical Pharmacology and Therapeutics 1974: 15(5); 443-453. Abstract not available.

Use of unequal randomisation to aid the economic efficiency of clinical trials. David J Torgerson, Marion K Campbell. BMJ 2000: 321; 759. Abstract not available yet. [Full text] [PDF]

Minimisation: the platinum standard for trials? Randomisation doesn't guarantee similarity of groups; minimisation does [editorial] [see comments]. T Treasure, KD MacRae. BMJ 1998: 317(7155); 362-63. Abstract not available.

Minimisation is much better than the randomised block design in certain cases. Tom Treasure, KD MacRae. British Medical Journal 1999: 318(7195); 1420. Abstract not available.

Can postmarketing surveillance studies (Anwendungsbeobachtungen) give meaningful answers to important questions? A critical discussion of 5 examples. M. Wadepuhl. Accessed on 2003-06-30. "Five conventional postmarketing surveillance studies with 86 to 5702 patients are discussed regarding their potential to answer essential questions and the quality of their implementation." Published in the Proceedings of the International Conference on Nonrandomized Comparative Clinical Studies in Heidelberg, April 10-11, 1997. www.symposion.com/nrccs/wadepuhl.htm

Investigating Therapies of Potentially Great Benefit: ECMO. J.H. Ware. Statistical Science 1989: 4(4); 298-317.

Mammography and the politics of randomised controlled trials. J. Wells. BMJ 1998: 317(7167); 1224-9. [Full text] [PDF]

Randomised controlled trial of laparoscopic versus open mesh repair for inguinal hernia: outcome and cost. J. Wellwood, M. J. Sculpher, D. Stoker, G. J. Nicholls, C. Geddes, A. Whitehead, R. Singh, D. Spiegelhalter. Bmj 1998: 317(7151); 103-10. OBJECTIVE: To compare tension-free open mesh hernioplasty under local anaesthetic with transabdominal preperitoneal laparoscopic hernia repair under general anaesthetic. DESIGN: A randomised controlled trial of 403 patients with inguinal hernias. SETTING: Two acute general hospitals in London between May 1995 and December 1996. SUBJECTS: 400 patients with a diagnosis of groin hernia, 200 in each group. Main outcome measures: Time until discharge, postoperative pain, and complications; patients' perceived health (SF-36), duration of convalescence, and patients' satisfaction with surgery; and health service costs. RESULTS: More patients in the open group (96%) than in the laparoscopic group (89%) were discharged on the same day as the operation (chi2 = 6.7; 1 df; P=0.01). Although pain scores were lower in the open group while the effect of the local anaesthetic persisted (proportional odds ratio at 2 hours 3.5 (2.3 to 5.1)), scores after open repair were significantly higher for each day of the first week (0.5 (0.3 to 0.7) on day 7) and during the second week (0.7 (0.5 to 0.9)). At 1 month there was a greater improvement (or less deterioration) in mean SF-36 scores over baseline in the laparoscopic group compared with the open group on seven of eight dimensions, reaching significance on five. For every activity considered the median time until return to normal was significantly shorter for the laparoscopic group. Patients randomised to laparoscopic repair were more satisfied with surgery at 1 month and 3 months after surgery. The mean cost per patient of laparoscopic repair was 335 pounds (95% confidence interval 228 pounds to 441 pounds) more than the cost of open repair. 
CONCLUSION: This study confirms that laparoscopic hernia repair has considerable short term clinical advantages after discharge compared with open mesh hernioplasty, although it was more expensive.

The protective effect of auto-immune buccal urine therapy (AIBUT) against the Raynaud phenomenon. C. W. Wilson. Med Hypotheses 1984: 13(1); 99-107. The efficacy of Auto-Immune Buccal Urine Therapy (AIBUT) against allergic symptoms depends upon sublingual administration of the correct dose of urine as determined by bio-assay in individual patients. Succeeding effective turn-off doses occur at the troughs of a sinusoidal dose-response curve. Efficacy of the administered dose is confirmed by reduction in the severity and duration of Cold-water-induced Raynaud symptoms after administration of effective doses of unboiled urine in AIBUT. Boiled urine does not affect the Raynaud phenomenon.

Randomised controlled trials in primary care: case study. Sue Wilson. British Medical Journal 2000: 321; 24-27. Abstract not available yet. [Medline] [Full text] [PDF]

A new design for randomized clinical trials. M. Zelen. N Engl J Med 1979: 300(22); 1242-5. This paper proposes a new method for planning randomized clinical trials. This method is especially suited to comparison of a best standard or control treatment with an experimental treatment. Patients are allocated into two groups by a random or chance mechanism. Patients in the first group receive standard treatment; those in the second group are asked if they will accept the experimental therapy; if they decline, they receive the best standard treatment. In the analyses of results, all those in the second group, regardless of treatment, are compared with those in the first group. Any loss of statistical efficiency can be overcome by increased numbers. This experimental plan is indeed a randomized clinical trial and has the advantage that, before providing consent, a patient will know whether an experimental treatment is to be used.
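The remark in the Zelen abstract about loss of efficiency can be made concrete with a small sketch. It rests on simplifying assumptions that are not stated in the abstract: patients who decline the experimental therapy are assumed to have the same expected outcome as the standard-treatment group, and the required sample size is assumed to scale with one over the square of the detectable difference. The function names are hypothetical.

```python
def diluted_effect(true_effect, accept_fraction):
    """Expected group-level difference when only some patients accept.

    Assumes decliners respond exactly like the standard-treatment group,
    so the comparison of whole groups sees accept_fraction * true_effect.
    """
    if not 0 < accept_fraction <= 1:
        raise ValueError("accept_fraction must be in (0, 1]")
    return accept_fraction * true_effect


def sample_size_inflation(accept_fraction):
    """Factor by which the sample size grows to keep the same power.

    Assumes required sample size is proportional to 1 / effect**2.
    """
    if not 0 < accept_fraction <= 1:
        raise ValueError("accept_fraction must be in (0, 1]")
    return 1.0 / accept_fraction ** 2


if __name__ == "__main__":
    # If half of the second group accepts the experimental therapy,
    # the observable difference halves and the trial needs about
    # four times as many patients under these assumptions.
    print(diluted_effect(10.0, 0.5), sample_size_inflation(0.5))
```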

The randomization and stratification of patients to clinical trials. M. Zelen. Journal of Chronic Diseases 1974: 27(7-8); 365-75. Abstract not available yet.

evidence >> apples >> singlecase (1)

Single-case Research Designs for the Science and Practice of Neurotherapy. Neville Blampied, Arreed Barabasz, Marianne Barabasz. Journal of Neurotherapy 1996: 1(4); The dominant research tradition in psychology and psychiatry requires that numbers of subjects be randomly allocated to form treatment groups. Treatment effects typically are assessed by testing hypotheses about group mean differences. This paradigm seriously inhibits the implementation of the scientist-practitioner model embraced by practitioners of neurotherapy, stifles innovation and precludes the scientific investigation of the exceptional or novel case. Single-case research designs make it possible to draw scientifically valid conclusions from the investigation and treatment of individual cases. The key elements of these designs are outlined and particular designs of potential utility to neurotherapy are discussed.

evidence >> apples >> vitaminC (10)

Clinical Evaluation of Vitamin C and other Micronutrients in the Treatment of Cancer. Gerald Batist, MD. Journal of Orthomolecular Medicine 2000: 15(4); 189-192.

The orthomolecular treatment of cancer. II. Clinical trial of high-dose ascorbic acid supplements in advanced human cancer. E. Cameron, A. Campbell. Chem Biol Interact 1974: 9(4); 285-315. [Medline]

Failure of high-dose vitamin C (ascorbic acid) therapy to benefit patients with advanced cancer. A controlled trial. E. T. Creagan, C. G. Moertel, J. R. O'Fallon, A. J. Schutt, M. J. O'Connell, J. Rubin, S. Frytak. New England Journal of Medicine 1979: 301(13); 687-90. One hundred and fifty patients with advanced cancer participated in a controlled double-blind study to evaluate the effects of high-dose vitamin C on symptoms and survival. Patients were divided randomly into a group that received vitamin C (10 g per day) and one that received a comparably flavored lactose placebo. Sixty evaluable patients received vitamin C and 63 received a placebo. Both groups were similar in age, sex, site of primary tumor, performance score, tumor grade and previous chemotherapy. The two groups showed no appreciable difference in changes in symptoms, performance status, appetite or weight. The median survival for all patients was about seven weeks, and the survival curves essentially overlapped. In this selected group of patients, we were unable to show a therapeutic benefit of high-dose vitamin C treatment.

Antioxidant Nutrients and Cancer. Abram Hoffer, MD, PhD, FRCP(C). Journal of Orthomolecular Medicine 2000: 15(4); 193-200.

Vitamin C as Cancer Therapy: An Overview. L.J. Hoffer, MD, PhD, C. Tamayo, MD, M.A. Richardson, DrPH. Journal of Orthomolecular Medicine 2000: 15(4); 175-180.

The antioxidant vitamins and cardiovascular disease. A critical review of epidemiologic and clinical trial data. P. Jha, M. Flather, E. Lonn, M. Farkouh, S. Yusuf. Ann Intern Med 1995: 123(11); 860-72. PURPOSE: To review prospective epidemiologic studies and randomized trials regarding the role of antioxidant vitamins (vitamins E and C and beta-carotene) in the prevention of cardiovascular disease, with emphasis on differences in results obtained by these two types of studies. DATA SOURCES: Computerized and manual searches of the literature on antioxidant vitamins and cardiovascular disease. STUDY SELECTION: Prospective epidemiologic studies and randomized trials that included 100 or more participants and provided quantified estimates of antioxidant vitamin intake. DATA SYNTHESIS: Comparisons of relative risk reductions (RRR) across observational studies and randomized trials, including assessment of dose-response relations. RESULTS: All three large epidemiologic cohort studies of vitamin E noted that high-level vitamin E intake or supplementation was associated with a significant reduction in cardiovascular disease (RRR range, 31% to 65%), as measured by various fatal and nonfatal cardiovascular end points. To obtain these reductions, vitamin E supplementation must last at least 2 years. Less consistent reductions were seen in studies of beta-carotene (RRR range, -2% to 46%) and vitamin C (RRR range, -25% to 51%). Considerable biases in observational studies, such as different health behaviors of persons using antioxidants, may account for the observed benefit. By contrast, none of the completed randomized trials showed any clear reduction in cardiovascular disease with vitamin E, vitamin C, or beta-carotene supplementation. The trials were not specifically designed to assess cardiovascular disease, did not provide data on nonfatal cardiovascular end points, may have had insufficient treatment durations, and used suboptimal vitamin E doses. 
The completed trials were of adequate size to indicate that the true therapeutic benefit of vitamin E and other antioxidants in reducing fatal cardiovascular disease (a survival benefit as long as 5 years) is probably more modest than the epidemiologic data suggest. CONCLUSION: The epidemiologic data suggest that antioxidant vitamins reduce cardiovascular disease, with the clearest effect for vitamin E; however, completed randomized trials do not support this finding. Much of this controversy should be resolved by the ongoing large-scale and long-term randomized trials designed specifically to evaluate effects on cardiovascular disease.

High-dose vitamin C versus placebo in the treatment of patients with advanced cancer who have had no prior chemotherapy. A randomized double-blind comparison. C Moertel. New England Journal of Medicine 1985: 312(3); 137-141. ABSTRACT: It has been claimed that high-dose vitamin C is beneficial in the treatment of patients with advanced cancer, especially patients who have had no prior chemotherapy. In a double-blind study 100 patients with advanced colorectal cancer were randomly assigned to treatment with either high-dose vitamin C (10 g daily) or placebo. Overall, these patients were in very good general condition, with minimal symptoms. None had received any previous treatment with cytotoxic drugs. Vitamin C therapy showed no advantage over placebo therapy with regard to either the interval between the beginning of treatment and disease progression or patient survival. Among patients with measurable disease, none had objective improvement. On the basis of this and our previous randomized study, it can be concluded that high-dose vitamin C therapy is not effective against advanced malignant disease regardless of whether the patient has had any prior chemotherapy.

New insights into the physiology and pharmacology of vitamin C. S. J. Padayatty, M. Levine. CMAJ 2001: 164(3); 353-5.

Clinical and Experimental Experiences with Intravenous Vitamin C. Neil H. Riordan, PA-C, Hugh D. Riordan, MD, Joseph Casciari, PhD. Journal of Orthomolecular Medicine 2000: 15(4); 201-13.

Dietary factors and risk of breast cancer: combined analysis of 12 case-control studies. G. R. Howe, T. Hirohata, T. G. Hislop, J. M. Iscovich, J. M. Yuan, K. Katsouyanni, F. Lubin, E. Marubini, B. Modan, T. Rohan, et al. J Natl Cancer Inst 1990: 82(7); 561-9. We conducted a combined analysis of the original data to evaluate the consistency of 12 case-control studies of diet and breast cancer. Our analysis shows a consistent, statistically significant, positive association between breast cancer risk and saturated fat intake in postmenopausal women (relative risk for highest vs. lowest quintile, 1.46; P less than .0001). A consistent protective effect for a number of markers of fruit and vegetable intake was demonstrated; vitamin C intake had the most consistent and statistically significant inverse association with breast cancer risk (relative risk for highest vs. lowest quintile, 0.69; P less than .0001). If these dietary associations represent causality, the attributable risk (i.e., the percentage of breast cancers that might be prevented by dietary modification) in the North American population is estimated to be 24% for postmenopausal women and 16% for premenopausal women.

evidence >> blinding (1)

The difficulties of double blinding. J. Mercer. Science 2002: 297(5590); 2208. Abstract not available yet. [Medline]

evidence >> leftout >> attrition (9)

Article makes simple errors and could cause unnecessary deaths. C. Baigent, R. Collins, R. Peto. British Medical Journal 2002: 324(7330); 167. (An interesting critical review of a large randomized study and a meta-analysis.) "The worldwide meta-analysis of antiplatelet trials shows that low dose aspirin (or some other effective antiplatelet regimen) reduces non-fatal myocardial infarction, non-fatal stroke, and vascular death in a wide range of patients who are at high risk of occlusive vascular disease. A paper disputing this was published concurrently in the For Debate section of the journal, but the arguments in it (some of which the author also published on the same date in an editorial in the Lancet) depend strongly on quite simple mistakes about the randomised evidence and could cause unnecessary deaths." [Medline] [Full text] [PDF]

Statistical issues in randomized trials of cancer screening. S. G. Baker, B. S. Kramer, P. C. Prorok. BMC Med Res Methodol 2002: 2(1); 11. BACKGROUND: The evaluation of randomized trials for cancer screening involves special statistical considerations not found in therapeutic trials. Although some of these issues have been discussed previously, we present important recent and new methodologies. METHODS: Our emphasis is on simple approaches. RESULTS: We make the following recommendations: (1) Use death from cancer as the primary endpoint, but review death records carefully and report all causes of death. (2) Use a simple "causal" estimate to adjust for nonattendance and contamination occurring immediately after randomization. (3) Use a simple adaptive estimate to adjust for dilution in follow-up after the last screen. CONCLUSION: The proposed guidelines combine recent methodological work on screening endpoints and noncompliance/contamination with a new adaptive method to adjust for dilution in a study where follow-up continues after the last screen. These guidelines ensure good practice in the design and analysis of randomized trials of cancer screening. [Abstract] [Full text] [PDF]

Quantification of the completeness of follow-up. T. G. Clark, D. G. Altman, B. L. De Stavola. Lancet 2002: 359(9314); 1309-10. Completeness of follow-up is important, especially in clinical trials, since unequal follow-up in the treatment groups can bias the analysis of results. In survival studies, information on participants who do not complete the study is often omitted because their data can be included up to the time at which they were lost to follow-up. We propose a simple measure of completeness that is the ratio of the total observed person-time and the potential person-time of follow-up in a study. Our measure is easy to calculate, can be illustrated pictorially, and can be used to identify subgroups with especially poor follow-up. [Medline]
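The ratio described in this abstract is simple enough to sketch in a few lines. The function name and the follow-up times below are hypothetical and are not taken from the paper; the sketch also assumes that each participant's potential follow-up time has already been worked out.

```python
def completeness_index(observed, potential):
    """Ratio of total observed person-time to total potential person-time.

    observed  -- follow-up time actually recorded for each participant
    potential -- follow-up time each participant could have contributed
    Both lists are assumed to use the same time unit.
    """
    if len(observed) != len(potential):
        raise ValueError("observed and potential must have the same length")
    total_potential = sum(potential)
    if total_potential <= 0:
        raise ValueError("total potential person-time must be positive")
    return sum(observed) / total_potential


# Hypothetical follow-up times in months for five participants,
# each of whom could have been followed for 24 months.
observed = [24, 12, 24, 6, 18]
potential = [24, 24, 24, 24, 24]
print(f"Completeness of follow-up: {completeness_index(observed, potential):.2f}")
```

Computing the same ratio separately within each treatment group, or within other subgroups, is one way to spot the unequal follow-up that the abstract warns about.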

Hold the Lard! The Atkins Diet still doesn't work. Michael Fumento. Accessed on 2002-12-06. A careful analysis of the recent research on the Atkins diet shows that the dropout rate was much higher in the Atkins group, which could partially explain the promising results of this diet. www.reason.com/hod/mf120502.shtml

Attrition in prevention research. W. B. Hansen, L. M. Collins, C. K. Malotte, C. A. Johnson, J. E. Fielding. J Behav Med 1985: 8(3); 261-75. Selective attrition can detract from the internal and external validity of longitudinal research. Four tests of selective attrition applicable to longitudinal prevention research were conducted on data bases from two recent studies. These tests assessed (1) differences between dropouts and stayers in terms of pretest indices of primary outcome variables (substance use), (2) differences in change scores for dropouts and stayers, (3) differences in rates of attrition among experimental conditions, and (4) differences in pretest indices for dropouts among conditions. Results of these analyses indicate that cigarette smokers, alcohol drinkers, and marijuana users are more likely to drop out than nonusers, limiting the external validity of both studies. For one project, differential rates of attrition among conditions suggested a possible attrition artifact which will interfere with interpretation of outcome results, possibly masking true program effectiveness. Recommendations for standardizing reports of attrition and for avoiding attrition through second efforts are made.

Tracking and follow-up of 16,915 adolescents: minimizing attrition bias. T. C. Morrison, D. R. Wahlgren, M. F. Hovell, J. Zakarian, S. Burkham-Kreitner, C. R. Hofstetter, D. J. Slymen, K. Keating, S. Russos, J. A. Jones. Control Clin Trials 1997: 18(5); 383-96. This paper reports a multi-dimensional approach to minimize drop-outs from a two-year follow-up of a clinical trial designed to reduce initiation of tobacco use in 16,915 adolescent orthodontic patients. A hierarchical approach to data collection and tracking was employed. Seventy percent of participants were reached and interviewed at home by telephone. Strategies used to survey remaining participants included calling parents' work numbers and directory assistance, reviewing orthodontists' charts, sending surveys by mail, offering incentives, and using reverse telephone directories. More than 92% of the participants completed follow-up surveys. Multivariate analyses showed that baseline tobacco and alcohol use predicted loss to follow-up. Similarly, the number of procedures used to track each participant predicted presence of risk behaviors at post-test, demonstrating that an organized tracking hierarchy curtailed even greater compromises to internal and external validity. Evaluation and costs of individual strategies are discussed.

Tracking and attrition in longitudinal school-based smoking prevention research. P. L. Pirie, S. J. Thomson, S. L. Mann, A. V. Peterson, Jr., D. M. Murray, B. R. Flay, J. A. Best. Prev Med 1989: 18(2); 249-56. Research in the development of school-based smoking prevention programs has resulted in a set of approaches of known short-term efficacy. Further evaluation of these approaches now requires long-term follow-up of participants. To minimize the problems caused by attrition in these longitudinal studies, investigators have developed techniques for tracking study participants. Based primarily on the use of the telephone, mail, and public documents, these methods require good background information on both the study participants and their parents. This article summarizes the experience of three teams of researchers engaged in such follow-up studies. These investigators have identified the types of background information most useful in long-term follow-up of participants, have developed a set of strategies to obtain such background information, and have developed methods for successfully tracking participants after a lapse of several years.

Sample size slippages in randomised trials: exclusions and the lost and wayward. K. F. Schulz, D. A. Grimes. Lancet 2002: 359(9308); 781-5. Proper randomisation means little if investigators cannot include all randomised participants in the primary analysis. Participants might ignore follow-up, leave town, or take aspartame when instructed to take aspirin. Exclusions before randomisation do not bias the treatment comparison, but they can hurt generalisability. Eligibility criteria for a trial should be clear, specific, and applied before randomisation. Readers should assess whether any of the criteria make the trial sample atypical or unrepresentative of the people in which they are interested. In principle, assessment of exclusions after randomisation is simple: none are allowed. For the primary analysis, all participants enrolled should be included and analysed as part of the original group assigned (an intent-to-treat analysis). In reality, however, losses frequently occur. Investigators should, therefore, commit adequate resources to develop and implement procedures to maximise retention of participants. Moreover, researchers should provide clear, explicit information on the progress of all randomised participants through the trial by use of, for instance, a trial profile. Investigators can also do secondary analyses on, for instance, per-protocol or as-treated participants. Such analyses should be described as secondary and non-randomised comparisons. Mishandling of exclusions causes serious methodological difficulties. Unfortunately, some explanations for mishandling exclusions intuitively appeal to readers, disguising the seriousness of the issues. Creative mismanagement of exclusions can undermine trial validity. [Medline] [Abstract]
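The distinction this abstract draws between an intent-to-treat analysis and a per-protocol analysis can be illustrated with a small sketch. The records, arm names, and function name below are invented for illustration and do not come from the paper.

```python
from collections import defaultdict


def event_rates(records, per_protocol=False):
    """Proportion of participants with the outcome, by assigned arm.

    records      -- iterable of (assigned, received, outcome) tuples,
                    where outcome is 1 if the event occurred and 0 otherwise
    per_protocol -- if True, drop anyone who did not receive the assigned
                    treatment (a secondary, non-randomised comparison)
    """
    counts = defaultdict(lambda: [0, 0])  # arm -> [events, participants]
    for assigned, received, outcome in records:
        if per_protocol and assigned != received:
            continue
        counts[assigned][0] += outcome
        counts[assigned][1] += 1
    return {arm: events / n for arm, (events, n) in counts.items()}


# Hypothetical trial: (assigned arm, treatment actually received, improved?)
records = [
    ("aspirin", "aspirin", 1),
    ("aspirin", "aspirin", 1),
    ("aspirin", "none", 0),
    ("aspirin", "none", 0),
    ("placebo", "placebo", 1),
    ("placebo", "placebo", 0),
    ("placebo", "placebo", 0),
    ("placebo", "aspirin", 1),
]

print("Intent-to-treat:", event_rates(records))
print("Per-protocol:   ", event_rates(records, per_protocol=True))
```

In this made-up data the two arms look identical under intent-to-treat, while the per-protocol comparison favors aspirin only because the noncompliant participants were dropped.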

Intention to Treat Analysis in Clinical Trials When There are Missing Data. D. Streiner, J. Geddes. Evid Based Ment Health 2001: 4(3); 70-71.

evidence >> leftout >> compliance (2)

Randomised study of long term outcome after epidural versus non-epidural analgesia during labour. C. J. Howell, T. Dean, L. Lucking, K. Dziedzic, P. W. Jones, R. B. Johanson. Bmj 2002: 325(7360); 357. (This paper uses ITT analysis, but it may not be appropriate. See the rapid responses to this paper for details.) OBJECTIVE: To determine whether epidural analgesia during labour is associated with long term backache. DESIGN: Follow up after randomised controlled trial. Analysis by intention to treat. SETTING: Department of obstetrics and gynaecology at one NHS trust. PARTICIPANTS: 369 women: 184 randomised to epidural group (treatment as allocated received by 123) and 185 randomised to non-epidural group (treatment as allocated received by 133). In the follow up study 151 women were from the epidural group and 155 from the non-epidural group. MAIN OUTCOME MEASURES: Self reported low back pain, disability, and limitation of movement assessed through one to one interviews with physiotherapist, questionnaire on back pain and disability, physical measurements of spinal mobility. RESULTS: There were no significant differences between groups in demographic details or other key characteristics. The mean time interval from delivery to interview was 26 months. There were no significant differences in the onset or duration of low back pain, with nearly a third of women in each group reporting pain in the week before interview. There were no differences in self reported measures of disability in activities of daily living and no significant differences in measurements of spinal mobility. CONCLUSIONS: After childbirth there are no differences in the incidence of long term low back pain, disability, or movement restriction between women who receive epidural pain relief and women who receive other forms of pain relief. [Medline] [Abstract] [Full text] [PDF]

Intention-to-treat principle. V. M. Montori, G. H. Guyatt. CMAJ 2001: 165(10); 1339-41. [Medline] [Full text] [PDF]

evidence >> leftout >> exclusions (1)

Statistical Assumptions as Empirical Commitments. Richard A. Berk, David A. Freedman. Accessed on 2001-August. "Researchers who study punishment and social control, like those who study other social phenomena, typically seek to generalize their findings from the data they have to some larger context: in statistical jargon, they generalize from a sample to a population. Generalizations are one important product of empirical inquiry. Of course, the process by which the data are selected introduces uncertainty. Indeed, any given dataset is but one of many that could have been studied. If the dataset had been different, the statistical summaries would have been different, and so would the conclusions, at least by a little." stat-www.berkeley.edu/~census/berk2.pdf

evidence >> leftout >> exclusions (13)

A controlled trial of immunotherapy for asthma in allergic children. N. F. Adkinson, Jr., P. A. Eggleston, D. Eney, E. O. Goldstein, K. C. Schuberth, J. R. Bacon, R. G. Hamilton, M. E. Weiss, H. Arshad, C. L. Meinert, J. Tonascia, B. Wheeler. New England Journal of Medicine 1997: 336(5); 324-31. (Noncompliant patients were excluded prior to the start of the trial) BACKGROUND: Injections of allergens are widely prescribed for patients with asthma, but little is known about the effectiveness of immunotherapy. METHODS: We conducted a double-blind, placebo-controlled trial of multiple-allergen immunotherapy in 121 allergic children with moderate-to-severe, perennial asthma. The children, who required daily medication for their asthma, were randomly assigned to receive subcutaneous injections of either a mixture of up to seven aeroallergen extracts or a placebo. Maintenance injections were continued for 18 months or longer. Medications were adjusted every two to three weeks on the basis of peak flow rates and symptoms. The principal outcome was the daily medication score. Bronchial sensitivity to methacholine (the concentration provoking a 20 percent decrease in the forced expiratory volume in one second [PC20]) was measured twice yearly. RESULTS: The median medication score declined from 5.4 to 4.9 in the immunotherapy group (P<0.001) and from 5.2 to 5.0 in the placebo group (P<0.001), but there was no significant difference between the groups (P>0.6). The number of days on which oral corticosteroids were used was similar in the two groups. Partial or complete remission of asthma occurred in 31 percent of the immunotherapy group and in 28 percent of the placebo group (P>0.5). There was no difference between the groups in the use of medical care, symptoms, or peak flow rates. The median PC20 increased significantly in both groups, but again with no difference between the two groups. 
CONCLUSIONS: Immunotherapy with injections of allergens for over two years was of no discernible benefit in allergic children with perennial asthma who were receiving appropriate medical treatment. [Abstract] [Full text] [PDF]

Unjustified exclusion of elderly people from studies submitted to research ethics committee for approval: descriptive study. A. Bayer, W. Tadd. British Medical Journal 2000: 321(7267); 992-3. [Full text] [PDF]

Exclusion of elderly people from clinical research: a descriptive study of published reports. G. Bugeja, A. Kumar, A. K. Banerjee. British Medical Journal 1997: 315(7115); 1059. [Full text]

Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. D. Fergusson, S. D. Aaron, G. Guyatt, P. Hebert. BMJ 2002: 325(7365); 652-4. Abstract not available yet. [Full text] [PDF]

Participation in Research and Access to Experimental Treatments by HIV-Infected Patients. Allen L. Gifford, William E. Cunningham, Kevin C. Heslin, Ron M. Andersen, Terry Nakazono, Dale K. Lieu, Martin F. Shapiro, Samuel A. Bozzette, the HIV Cost and Services Utilization Study Consortium. N Engl J Med 2002: 346(18); 1373-1382. Background Although there is concern that minority groups and women are underrepresented in research involving patients with human immunodeficiency virus (HIV) infection, the available data are inconclusive. Methods We used nationally representative data from the HIV Cost and Services Utilization Study to determine the characteristics of the participants and nonparticipants in trials of medications for HIV infection and whether or not patients had access to experimental treatments. A probability sample of 2864 persons, representing all 231,400 adults with known HIV infection who are cared for in the contiguous United States, were interviewed on three occasions between 1996 and 1998. They were asked about participation in clinical research studies of medications and past receipt of experimental medications for HIV. Results We estimate that 14 percent of adults receiving care for HIV infection participated in a medication trial or study; 24 percent had received experimental medications; and 8 percent had tried and failed to obtain experimental treatments. According to multivariate models, non-Hispanic blacks and Hispanics were less likely to be participating in trials than non-Hispanic whites (odds ratio for participation among non-Hispanic blacks, 0.50 [95 percent confidence interval, 0.28 to 0.91]; odds ratio among Hispanics, 0.58 [95 percent confidence interval, 0.37 to 0.93]) and to have received experimental medications (odds ratios, 0.41 [95 percent confidence interval, 0.32 to 0.54] and 0.56 [95 percent confidence interval, 0.41 to 0.78], respectively). 
Patients who were cared for in private health maintenance organizations were less likely to participate in trials than those with fee-for-service insurance (odds ratio, 0.43 [95 percent confidence interval, 0.21 to 0.88]). Women were not underrepresented in research trials and had a similar likelihood of receiving experimental treatments. Conclusions Among patients with HIV infection, participation in research trials and access to experimental treatment is influenced by race or ethnic group and type of health insurance. [Abstract] [Full text] [PDF]

Research Fables from the Sisters Grinn, No. 11. Chicken Little. Jeanne Grace, University of Rochester School of Nursing. Accessed on 2003-05-27. "Chicken Little was an eager young hatchling on a farm near Scholarship Forest, the home of Little Red Research Student. Little Red was attending graduate school out of town, but by chance was home visiting her trans-species Little cousins when Chicken Little was hatched. Many families of Scholarship Forest held Little Red up as a role model to their children, but Chicken Little actually imprinted on her. As a result, although poultry rarely aspire to scholarly careers, Chicken Little wanted to become an evidence-based health care provider when she grew up. She tried to practice research utilization faithfully every day." http://www.urmc.rochester.edu/SON/Fables/clittle.html

The exclusion of the elderly and women from clinical trials in acute myocardial infarction. J. H. Gurwitz, N. F. Col, J. Avorn. Jama 1992: 268(11); 1417-22. OBJECTIVE--To determine the extent to which the elderly have been excluded from trials of drug therapies used in the treatment of acute myocardial infarction, to identify factors associated with such exclusions, and to explore the relationship between the exclusion of elderly and the representation of women. DATA SOURCES--We conducted a systematic search of the English-language literature from January 1960 through September 1991 to identify all relevant studies of specific pharmacotherapies employed in the treatment of acute myocardial infarction. To accomplish this, we searched MEDLINE, major cardiology textbooks, meta-analyses, reviews, editorials, and the bibliographies of all identified articles. STUDY SELECTION--Only trials in which patients were randomly allocated to receive a specific therapeutic regimen or a placebo or nonplacebo control regimen were included for review. DATA EXTRACTION--Studies were abstracted for year of publication, source of support, performance location, drug therapies to which patients were randomized, use of invasive diagnostic tests or therapeutic procedures, exclusion criteria, size and demographic characteristics of the randomized study population, and principal outcome measures. DATA SYNTHESIS--A total of 214 trials met inclusion criteria, involving 150,920 study subjects. Over 60% of trials excluded persons over the age of 75 years. Studies published after 1980 were more likely to have age-based exclusions compared with studies published before 1980 (adjusted odds ratio, 4.92; 95% confidence interval, 2.33 to 10.54). Trials of thrombolytic therapy involving an invasive procedure were more likely to exclude elderly patients compared with other studies (adjusted odds ratio, 2.45; 95% confidence interval, 1.10 to 5.47). 
Studies with age-based exclusions had a smaller percentage of women compared with those without such exclusions (18% vs 23%; P = .0002), with the mean age of the study population significantly associated with the proportion of women participants (P = .0001, R2 = .29). CONCLUSIONS--Age-based exclusions are frequently used in clinical trials of medications used in the treatment of acute myocardial infarction. Such exclusions limit the ability to generalize study findings to the patient population that experiences the most morbidity and mortality from acute myocardial infarction.

Spectrum bias in the evaluation of diagnostic tests: lessons from the rapid dipstick test for urinary tract infection. M. S. Lachs, I. Nachamkin, P. H. Edelstein, J. Goldman, A. R. Feinstein, J. S. Schwartz. Ann Intern Med 1992: 117(2); 135-40. (A diagnostic test will usually perform more effectively in patients who have clear signs of illness and will perform less effectively in borderline patients. This is reflected in values of sensitivity and specificity that change depending on who is evaluated. If the patients being evaluated are similar to the patients you see in your practice, this is not a problem. But often, patients with more extreme symptoms are preferentially recruited into research studies. This leads to spectrum bias, which can often overstate the effectiveness of a diagnostic test.) OBJECTIVE: To determine if the leukocyte esterase and bacterial nitrite rapid dipstick test for urinary tract infection (UTI) is susceptible to spectrum bias (when a diagnostic test has different sensitivities or specificities in patients with different clinical manifestations of the disease for which the test is intended). DESIGN: Cross-sectional study. PATIENTS: A total of 366 consecutive adult patients in whom clinicians performed urinalysis to diagnose or exclude UTI. SETTING: An urban emergency department and walk-in clinic. MEASUREMENTS: After the patient encounter, but before dipstick test or culture was done, clinicians recorded the signs and symptoms that were the basis for suspecting UTI and for performing a urinalysis and an estimate of the probability of UTI based on the clinical evaluation. For all patients who received urinalysis, dipstick tests and culture were done in the clinical microbiology laboratory by medical technologists blinded to clinical evaluation. 
Sensitivity for the dipstick was calculated using a positive result in either leukocyte esterase or bacterial nitrite, or both, as the criterion for a positive dipstick, and greater than 10(5) CFU/mL for a positive culture. RESULTS: In the 107 patients with a high (greater than 50%) prior probability of UTI, who had many characteristic UTI symptoms, the sensitivity of the test was excellent (0.92; 95% CI, 0.82 to 0.98). In the 259 patients with a low (less than or equal to 50%) prior probability of UTI, the sensitivity of the test was poor (0.56; CI, 0.03 to 0.79). CONCLUSIONS: The leukocyte esterase and bacterial nitrite dipstick test for UTI is susceptible to spectrum bias, which may be responsible for differences in the test's sensitivity reported in previous studies. As a more general principle, diagnostic tests may have different sensitivities or specificities in different parts of the clinical spectrum of the disease they purport to identify or exclude, but studies evaluating such tests rarely report sensitivity and specificity in subgroups defined by clinical symptoms. When diagnostic tests are evaluated, information about symptoms in the patients recruited for study should be included, and analyses should be done within appropriate clinical subgroups so that clinicians may decide if reported sensitivities and specificities are applicable to their patients. [Medline]
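A short sketch may help show why a pooled sensitivity can hide spectrum bias. The counts below are hypothetical. They were chosen only so that the stratum-specific sensitivities match the 0.92 and 0.56 quoted in the abstract, and they are not the counts from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of diseased patients that the test correctly flags."""
    diseased = true_positives + false_negatives
    if diseased == 0:
        raise ValueError("no diseased patients in this stratum")
    return true_positives / diseased


# Hypothetical culture-positive patients, split by prior probability of UTI.
# Each stratum is (dipstick positive, dipstick negative).
strata = {
    "high prior probability": (46, 4),
    "low prior probability": (14, 11),
}

for label, (tp, fn) in strata.items():
    print(f"{label}: sensitivity = {sensitivity(tp, fn):.2f}")

# Pooling the strata gives a single figure that describes neither group well.
pooled_tp = sum(tp for tp, _ in strata.values())
pooled_fn = sum(fn for _, fn in strata.values())
print(f"pooled: sensitivity = {sensitivity(pooled_tp, pooled_fn):.2f}")
```

The pooled value depends on how many patients come from each stratum, so a study that recruits mostly patients with clear symptoms will report a sensitivity close to the high-probability figure.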

Comorbidity of chronic diseases in general practice. F. G. Schellevis, J. van der Velden, E. van de Lisdonk, J. T. van Eijk, C. van Weel. J Clin Epidemiol 1993: 46(5); 469-73. With the increasing number of elderly people in The Netherlands the prevalence of chronic diseases will rise in the next decades. It is recognized in general practice that many older patients suffer from more than one chronic disease (comorbidity). The aim of this study is to describe the extent of comorbidity for the following diseases: hypertension, chronic ischemic heart disease, diabetes mellitus, chronic nonspecific lung disease, osteoarthritis. In a general practice population of 23,534 persons, 1989 patients have been identified with one or more chronic diseases. Only diseases in agreement with diagnostic criteria were included. In persons of 65 and older 23% suffer from one or more of the chronic diseases under study. Within this group 15% suffer from more than one of the chronic diseases. Osteoarthritis and diabetes mellitus are the diseases with the highest rate of comorbidity. Comorbidity restricts the external validity of results from single-disease intervention studies and complicates the organization of care.

Nicotine patch therapy in adolescent smokers. T. A. Smith, R. F. House, Jr., I. T. Croghan, T. R. Gauvin, R. C. Colligan, K. P. Offord, L. C. Gomez-Dahl, R. D. Hurt. Pediatrics 1996: 98(4 Pt 1); 659-67. OBJECTIVE: To evaluate the safety, tolerance, and efficacy of 24-hour nicotine patch therapy in adolescent smokers who were trying to stop smoking. DESIGN: Nonrandomized, open-label, 6-month clinical trial. SETTING: Five public high schools in the Rochester, MN, area. SUBJECTS: Twenty-two adolescent smokers, aged 13 through 17 years, with current smoking rate of 20 or more cigarettes per day (cpd). INTERVENTION: Daily nicotine patch therapy for 8 weeks (22 mg/d for 6 weeks followed by 11 mg/d for 2 weeks). Weekly individual behavioral counseling and group support continued for 8 weeks with follow up visits at 3 and 6 months and a mailed survey at 1 year. MAIN OUTCOME MEASURES: Self-reported smoking abstinence verified by expired air carbon monoxide of 8 ppm or less, nicotine withdrawal symptoms, adverse experiences, and blood cotinine levels. RESULTS: Subjects had a mean +/- SD smoking rate of 23.3 +/- 5.0 (range, 20 to 35) cpd at study entry and 2.6 +/- 1.6 years of smoking; the mean age was 15.9 +/- 1.2 (range 13 through 17) years, and 68% were girls. Of the 22 participants, 19 (86%) completed patch therapy, 3 (14%) had biochemically validated smoking cessation at week 8, and 1 continued to be smoke free at 3 and 6 months after patch initiation. There was a significant decrease from baseline in the mean nicotine withdrawal scores for days 4 and 7 of week 1 and the mean for weeks 2 through 8. Skin reactions were the most common adverse event. As the worst skin reactions, 55% had erythema only, 5% had erythema and edema, and 9% had erythema and vesicles, whereas 32% had no skin reactions. Other reported adverse events were headaches (41%), nausea and vomiting (41%), tiredness (41%), dizziness (27%), and arm pain (23%). 
None of these were considered serious, life threatening, or led to the discontinuation of patch therapy. In adults with comparable smoking rates, we found that the adolescents had lower blood cotinine levels. Those smoking 20 to 25 cpd had cotinine levels of 146 +/- 84 (adolescents) vs 260 +/- 98 (adults) ng/ml, and those smoking 26 to 35 cpd had levels of 169 +/- 73 vs 276 +/- 110 ng/ml, respectively. CONCLUSION: Nicotine patch therapy seems safe in adolescent smokers. Placebo-controlled trials are needed to establish the efficacy of nicotine patch therapy in adolescents.

The Effect of School Dropout Rates on Estimates of Adolescent Substance Use among Three Racial/Ethnic Groups. Randall C. Swaim, F Beauvais, EL Chavez, ER Oetting. American Journal of Public Health 1997: 87(1); 51-55. (A study of adolescent drug use based in a high school would leave out anyone not attending school. This could lead to a serious bias overall. Because there is an interaction with race, you might also have problems with any results that imply that one racial or ethnic group has greater or less drug use than another.) ABSTRACT: OBJECTIVES: This study examined, across three racial/ethnic groups, how the inclusion of data on drug use of dropouts can alter estimates of adolescent drug use rates. METHODS: Self-report rates of lifetime prevalence and use in the previous 30 days were obtained from Mexican American, White non-Hispanic, and Native American students (n = 738) and dropouts (n = 774). Rates for the age cohort (students and dropouts) were estimated with a weighted correction formula. RESULTS: Rates of use reported by dropouts were 1.2 to 6.4 times higher than those reported by students. Corrected rates resulted in changes in relative rates of use by different ethnic groups. CONCLUSIONS: When only in-school data are available, errors in estimating drug use among groups with high rates of school dropout can be substantial. Correction of student-based data to include drug use of dropouts leads to important changes in estimated levels of drug use and alters estimates of the relative rates of use for racial/ethnic minority groups with high dropout rates. [Medline]
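The abstract does not spell out its weighted correction formula. One plausible form, assumed here purely for illustration, weights the student rate and the dropout rate by each group's share of the age cohort. All of the rates and dropout proportions below are hypothetical.

```python
def cohort_rate(student_rate, dropout_rate, dropout_share):
    """Weighted average of student and dropout rates for the whole age cohort.

    dropout_share -- assumed proportion of the age cohort that has left school
    """
    if not 0.0 <= dropout_share <= 1.0:
        raise ValueError("dropout_share must be between 0 and 1")
    return (1.0 - dropout_share) * student_rate + dropout_share * dropout_rate


# Hypothetical groups: (rate among students, rate among dropouts, dropout share)
groups = {
    "group A": (0.25, 0.40, 0.05),
    "group B": (0.20, 0.50, 0.30),
}

for name, (students, dropouts, share) in groups.items():
    corrected = cohort_rate(students, dropouts, share)
    print(f"{name}: in-school rate = {students:.2f}, cohort rate = {corrected:.4f}")
```

With these made-up numbers, group A has the higher rate when only students are surveyed, but group B has the higher rate once dropouts are included, which is the kind of reversal the abstract describes.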

Physicians' reasons for not entering eligible patients in a randomized clinical trial of surgery for breast cancer. K. M. Taylor, R. G. Margolese, C. L. Soskolne. N Engl J Med 1984: 310(21); p1363-7. We studied the reasons surgical principal investigators chose not to enter patients in a large, multicenter trial sponsored by a cooperative group. In 1976 the National Surgical Adjuvant Project for Breast and Bowel Cancers (NSABP) initiated a clinical trial to compare segmental mastectomy and postoperative radiation, or segmental mastectomy alone, with total mastectomy. Because the low rates of accrual were threatening to close the trial prematurely, we mailed a questionnaire to the 94 NSABP principal investigators, asking why they were not entering eligible patients in the trial. A response rate of 97 per cent was achieved. Physicians who did not enter all eligible patients offered the following explanations: (1) concern that the doctor-patient relationship would be affected by a randomized clinical trial (73 per cent), (2) difficulty with informed consent (38 per cent), (3) dislike of open discussions involving uncertainty (22 per cent), (4) perceived conflict between the roles of scientist and clinician (18 per cent), (5) practical difficulties in following procedures (9 per cent), and (6) feelings of personal responsibility if the treatments were found to be unequal (8 per cent). Further investigation into the behavioral aspects of the investigator-patient relationship is particularly pressing, since fear of change in this relationship was the most common reason given for not entering eligible patients in the trial.

Representation of older patients in cancer treatment trials. EL Trimble, CL Carter, D Cain, B Freidlin, RS Ungerleider, MA Friedman. Cancer 1994: 74(7); 2208-14. ABSTRACT: In 1990, the five leading causes of cancer death in men aged 65 and older were carcinomas of the lung, prostate, colon and rectum, and pancreas, and leukemia. For women in this age group, the five leading causes of cancer death were carcinomas of the lung, breast, colon and rectum, pancreas, and ovary. To determine the representation of the elderly in clinical trials, the 1992 accrual of the National Cancer Institute (NCI)-sponsored Clinical Cooperative Group treatment trials (which included more than 8000 elderly patients) for the aforementioned sites was compared with the 1990 incidence data from the NCI's Surveillance, Epidemiology, and End Results program. Of the male patients enrolled in the trials, an average of 39% were older than 65 (47.3% lung, 79.5% prostate, 47.5% colorectal, 45.6% pancreas, and 9.6% leukemia); whereas 25.9% of all women enrolled in trials were 65 or older (43.6% lung, 17.3% breast, 46.2% colorectal, 59.6% pancreas, and 35.4% ovary). With respect to incidence, older patients generally are underrepresented in cancer treatment trials. With the exception of the data on prostate cancer, each of the comparisons using the Z statistic gave probability values of less than 0.01. The most significant discrepancies between incidence and participation in cancer treatment protocols were noted for leukemia in males and breast cancer in females. 
Possible explanations for these findings include (1) a research focus on aggressive therapy, which may be unacceptably toxic to the elderly; (2) presence of comorbidity in the elderly; (3) fewer trials available specifically aimed at older patients; (4) limited expectations for long term benefits on the part of physicians, relatives, and the patients themselves; and (5) a lack of financial, logistic, and social support for the participation of elderly patients in clinical trials. Recognizing this situation, NCI recently sponsored a number of trials that specifically target the elderly. This paper describes the status of all major Phase II and III clinical trials that recently were closed, still are active, or now are in review that address the clinical care of this important segment of the U.S. population.

evidence >> leftout >> nonresponse (10)

The Online Health Care Revolution: How the Web helps Americans take better care of themselves. Pew Internet & American Life Project. Accessed on 2003-06-12. "Fifty-two million American adults, or 55% of those with Internet access, have used the Web to get health or medical information. We call them “health seekers” and a majority of them go online at least once a month for health information. A great many health seekers say the resources they find on the Web have a direct effect on the decisions they make about their health care and on their interactions with doctors." www.pewinternet.org/reports/toc.asp?Report=26

Effect of UK national guidelines on services to treat patients with acute low back pain: follow up questionnaire survey. A. G. Barnett, M. R. Underwood, M. R. Vickers. British Medical Journal 1999: 318(7188); 919-20. (This study obtained survey response rates of 87% and 85%. Larger practices were overrepresented.) Abstract not available yet. [Full text] [PDF]

Imputing nonresponses to mail-back questionnaires. J. W. Drane. Am J Epidemiol 1991: 134(8); 908-12. Many mail-back questionnaires are expected at the outset to elicit poor response rates, perhaps as low as 15-30%. Corrections can be designed into such a survey by using either two or three mailouts of the questionnaire at regular intervals. Assuming a trend in responses as a function of the number of mailouts a person receives before filling out and mailing back the questionnaire, responses are imputed for those who do not mail back the questionnaire after the final mailout. Standard errors are derived, and an example is included. The imputation is easily programmed. A validation of this method is also included.

Non-response bias in a lifestyle survey. A. Hill, J. Roberts, P. Ewings, D. Gunnell. J Public Health Med 1997: 19(2); p203-7. BACKGROUND: Monitoring health targets is often undertaken using questionnaire surveys of lifestyle risk factors. Non-response bias is recognized but rarely quantified. METHODS: Following a questionnaire survey on a random sample of 6009 residents of Somerset with a response rate of 57.6 per cent, a telephone survey was undertaken on a random sample of 400 non-responders. A small number of the more important questions from the questionnaire were put to the non-responders over the phone. RESULTS: Fifty-nine per cent of the sample were contacted and agreed to participate. Statistically significant differences between responders and non-responders to the original questionnaire were detected for current smoking, hazardous alcohol consumption and lack of moderate or vigorous activity. CONCLUSIONS: Lifestyle questionnaire surveys need to include an assessment of the non-response bias.

A comparison of nonresponse in mail, telephone, and face-to-face surveys. J. J. Hox, D. De Leeuw. Quality and Quantity 1994: 28(4); 329-344.

Do safety practices differ between responders and non-responders to a safety questionnaire? D. Kendrick, R. Hapgood, P. Marsh. Injury Prevention 2001: 7(2); 100-3. OBJECTIVE: To compare reported safety practices between responders and non-responders to a safety survey. DESIGN: Cross sectional survey at baseline compared with safety practices reported at subsequent child health surveillance checks. SUBJECTS: Parents of children aged 3-12 months registered with practices participating in a controlled trial of injury prevention in primary care that did, and did not, respond to the baseline survey and who subsequently attended child health surveillance checks. RESULTS: No difference in safety practices was found between responders and non-responders to the survey at the 6-9 month check. Responders were more likely to report owning a stair gate (odds ratio (OR) 2.75, 95% confidence interval (CI) 1.82 to 4.16) and socket covers (OR 2.16, 95% CI 1.53 to 3.04) at the 12-15 month check, and owning socket covers (OR 2.19, 95% CI 1.34 to 3.61) at the 18-24 month check. Responders were more likely to report greater than the median number of safety practices at the 18 month check. CONCLUSIONS: Non-responders to a safety survey appear to be less likely to report owning several items of safety equipment than responders. Further work is needed to confirm these findings. Extrapolating the results of safety surveys to the population as a whole may lead to over estimation of safety equipment possession. [Medline] [Abstract] [Full text] [PDF]

Quality of response in different population groups in mail and telephone surveys. J. Siemiatycki, S. Campbell, L. Richardson, D. Aubert. Am J Epidemiol 1984: 120(2); p302-14. Mail and telephone survey methods, with follow-up by other methods, can provide high response rates. However, it is not clear whether different population groups provide responses of different quality, thus creating risk of biased comparisons. A closely related problem is whether proxy response adequately substitutes for self-response. This study addressed these issues in the context of parallel mail and telephone health surveys carried out in Montreal. In the telephone survey, proxy respondents provided lower estimates of morbidity and health care utilization than self-respondents; in the mail survey, there was no difference between proxy and self-response. Response validity was assessed by comparing reported physician visits with those recorded by the government-run universal health insurance plan. In general, mail responses were more valid than telephone responses. In both methods, there were suggestive but not persuasive differences in validity among sociodemographic subgroups. In both methods, those reporting illness or medication use had less underreporting of physician visits than those not reporting such things.

Nonresponse bias and early versus all responders in mail and telephone surveys. J. Siemiatycki, S. Campbell. Am J Epidemiol 1984: 120(2); p291-301. Mail and telephone survey methods, with or without follow-up by other methods, are cost-effective alternatives to the conventional home interview approach. However, it has long been thought that they are especially susceptible to nonresponse bias. The study addressed this issue in the context of parallel mail and telephone health surveys carried out in Montreal. The mail strategy among 1,555 adults achieved 68.5% response and follow-up by telephone and home interview increased response to 80.9%. Respondents were adequately representative of the entire sample with respect to socioeconomic status, number of adults in household, and ethnic distribution. The 68.5% initial stage respondents were similar to all respondents on the above variables as well as on age, sex, education and reported health status. Odds ratios of smoking and respiratory symptoms hardly differed between initial stage and all respondents. The telephone survey among 1,595 adults achieved 72.7% response and follow-up by mail and personal interview increased response to 88.2%. Comparisons between respondents and the entire sample and between initial stage respondents and all respondents gave similar results to those found in the mail strategy, although there was some change in a symptom-smoking odds ratio from the initial stage respondents to all respondents. In both survey strategies, there was no evidence of substantial nonresponse bias and estimates of morbidity and health care would not have differed much if the fieldwork had stopped at the initial mail or telephone stage.

What are the characteristics of general practitioners who routinely do not return postal questionnaires: a cross sectional study. N. Stocks, D. Gunnell. J Epidemiol Community Health 2000: 54(12); p940-1. Abstract not available.

Representativeness and response rates from the Domestic/International Gastroenterology Surveillance Study (DIGEST). J. G. Tijssen. Scand J Gastroenterol Suppl 1999: 231; 15-9. BACKGROUND: The Domestic/International Gastroenterology Surveillance Study (DIGEST) examined the prevalence of upper gastrointestinal symptoms among the general population in 10 countries, and the impact of these symptoms on healthcare usage and quality of life. This report discusses the validation of the DIGEST sample and reviews the response rates from the survey. METHODS: External validation of the DIGEST sample was conducted by comparing the age, age by gender and annual household incomes of the sample with census-derived data. A comparison was also made between Psychological General Well-Being Index (PGWBI) scores from study subjects in the Scandinavian countries and the USA and the total sample population norms. RESULTS: Under- and oversampling, defined as ≥5% difference from the population norms, was evident in eight out of 10 countries, but no systematic bias was evident. The final distribution of the sample by gender was 51% female and 49% male. Although differences in PGWBI scores were noted between DIGEST subjects and population norms, these differences were <0.30 standard deviations--markedly below the difference considered as relevant for the PGWBI. Response for the survey in individual countries ranged from 17% in the USA to 61% in Norway, with a survey-wide rate of 27%. The overall response rate, including primary non-respondents, was 13.4%. The majority of nonresponse (51.4%) was attributed to failure to establish contact with the subjects, with 41.7% of subjects declining to be interviewed and the remaining 6.9% of subjects not meeting the age and sex criteria used for the survey. CONCLUSIONS: The DIGEST sample exhibited good external validity, providing a foundation for comparison between data derived from individual countries in the survey.

evidence >> leftout >> outliers (2)

Some Remarks on Wild Observations. William H. Kruskal. Accessed on 2002-11-27. "The purpose of these remarks is to set down some non-technical thoughts on apparently wild or outlying observations. These thoughts are by no means novel, but do not seem to have been gathered in one convenient place." www.tufts.edu/~gdallal/out.htm

Ozone Depletion, History and politics. Brien Sparling. Accessed on 2002-11-27. "Ground based measurements of Ozone were first started in 1956, at Halley Bay, Antarctica. Satellite measurements of ozone started in the early 70's, but the first comprehensive worldwide measurements started in 1978 with the Nimbus-7 satellite. Nimbus-7 carried a TOMS (total ozone mapping spectrometer), and a SBUV (solar backscatter UV meter). The TOMS finally broke on May 7th, 1993, but today there are several different satellites measuring concentrations of ozone and other atmospheric gases. Gases in the troposphere and lower stratosphere are sampled by weather balloons or by airplanes such as the ER-2 managed by NASA." www.nas.nasa.gov/About/Education/Ozone/history.html

evidence >> leftout >> refusals (2)

Characteristics of non-responders and the impact of non-response on prevalence estimates of dementia. F. Boersma, J. A. Eefsting, W. van den Brink, W. van Tilburg. International Journal of Epidemiology 1997: 26(5); 1055-62. BACKGROUND: Differential distributions of sociodemographic characteristics and cognitive impairment in responders and non-responders may result in a biased prevalence estimate of dementia based on responders only. METHODS: Responders (n = 2191) to a cross-sectional, two-stage community study were compared with regard to sociodemographic characteristics and cognition with three subgroups of non-responders: (A) subjects who refused to participate (n = 369), (B) subjects who were too ill or who had died prior to the screening (n = 72) and (C) subjects who had moved out of the study region or were not traceable (n = 23). Prevalence estimates specific for age and housing situation in responders and physicians' ratings of cognitive impairment were used to estimate the prevalence of dementia among non-responders. RESULTS: Group A differed from responders in age and housing situation, group B in age, housing and cognition, and group C only in age. Separate prevalence estimates of dementia based on age, housing and cognition yielded figures for group A between 4.9% and 7.2%, for group B between 13.1% and 19.1%, and for group C between 2.6% and 4.2%. Joined with the prevalence rate among responders (6.5%) the best possible point estimate of the prevalence of dementia in the target population lies between 6.4% and 6.9%, i.e. within the 95% confidence interval (CI) of the prevalence among responders (5.4-7.5%). CONCLUSIONS: Although in this study non-response had no important influence on the overall prevalence, the findings among the distinct non-response subgroups point to the importance of describing non-response sociodemographically as well as in terms of the study objective. 
The authors recommend that non-responders are categorized into distinct groups based on the reason for non-response.
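
The pooled range quoted in this abstract can be reproduced with a size-weighted average. This is only a sketch: the abstract does not say how the group estimates were "joined", so weighting each group by its sample size is an assumption, and the variable names are made up for illustration. The group sizes and prevalence figures come from the abstract.

```python
# Group sizes and prevalence figures as reported in the abstract.
responders = (2191, 6.5)                   # (n, prevalence in %)
nonresponders = {
    "A refused":        (369, 4.9, 7.2),   # (n, low %, high %)
    "B ill or died":    (72, 13.1, 19.1),
    "C moved/untraced": (23, 2.6, 4.2),
}

def pooled_prevalence(bound):
    """Size-weighted mean prevalence in %; bound=1 uses low, bound=2 uses high estimates."""
    n_resp, p_resp = responders
    total_n = n_resp + sum(g[0] for g in nonresponders.values())
    weighted = n_resp * p_resp + sum(g[0] * g[bound] for g in nonresponders.values())
    return weighted / total_n

low, high = pooled_prevalence(1), pooled_prevalence(2)
print(f"pooled prevalence: {low:.1f}% to {high:.1f}%")
```

Under that assumption the result rounds to the 6.4% to 6.9% range given in the abstract. Refusers, by far the largest non-responder group, have estimates close to the responders' 6.5%, which is why the overall figure barely moves.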

Quality improvement report: Improving design and conduct of randomised trials by embedding them in qualitative research: ProtecT (prostate testing for cancer and treatment) study. Commentary: presenting unbiased information to patients can be difficult. J. Donovan, N. Mills, M. Smith, L. Brindle, A. Jacoby, T. Peters, S. Frankel, D. Neal, F. Hamdy. BMJ 2002: 325(7367); 766-70. PROBLEM: Recruitment to randomised trials is often difficult, and many important trials are not mounted because recruitment is thought to be "impossible." DESIGN: Controversial ProtecT (prostate testing for cancer and treatment) trial embedded within qualitative research. BACKGROUND AND SETTING: Screening for prostate cancer is hotly debated, and evidence from trials about the effectiveness of treatments (surgery, radiotherapy, and monitoring) is lacking. Mounting a treatment trial is controversial because of past failures and concerns that differences in complications of treatment but not survival make randomisation unacceptable to patients and clinicians, particularly for a trial including monitoring. STRATEGY FOR CHANGE: In-depth interviews explored interpretation of study information. Audiotape recordings of recruitment appointments enabled scrutiny of content and presentation of study information by recruiters. Initial qualitative findings showed that recruiters had difficulty discussing equipoise and presenting treatments equally; they unknowingly used terminology that was misinterpreted by participants. Findings were used to determine changes to content and presentation of information. EFFECTS OF CHANGE: Changes to the order of presenting treatments encouraged emphasis on equivalence, misinterpreted terms were avoided, the non-radical arm was redefined, and randomisation and clinical equipoise were presented more convincingly. The randomisation rate increased from 40% to 70%, all treatments became acceptable, and the three arm trial became the preferred design. 
LESSONS LEARNT: Changes to information and presentation resulted in efficient recruitment acceptable to patients and clinicians. Embedding this controversial trial within qualitative research improved recruitment. Such methods probably have wider applicability and may enable even the most difficult evaluative questions to be tackled.

evidence >> leftout >> volunteer (6)

A genetic bias in clinical trials? Cytochrome P450-2D6 (CYP2D6) genotype in general vs selected healthy subject populations [letter]. S. Chen, S. Kumar, W. H. Chou, J. S. Barrett, P. J. Wedlund. Br J Clin Pharmacol 1997: 44(3); 303-4. Abstract not available.

Selection bias in observational and experimental studies. J. H. Ellenberg. Stat Med 1994: 13(5-7); 557-67. There has been a heightened awareness of the dangers of selection bias over the past two decades. Certainly coverage in statistical and 'statistics for medicine', and epidemiology textbooks have allocated pages to warn investigators and readers of investigations to be aware of its presence. The scientific community has not, however, yet accepted the necessity for critical assessment of the method of sample selection in the planning and execution of studies as a fundamental underpinning of observational and experimental studies. To wit, we are faced with a plethora of research studies receiving funding, being published in peer-reviewed journals and influencing future studies, that may be reporting entirely spurious associations. It is the intent of this paper to present examples of selection bias in a variety of areas which have resulted in misleading or entirely incorrect results. We hope to help make such research scientifically 'politically incorrect' to the degree that the scientific community 'just says no' to such studies, either proposed or reported.

A comparison of cigarette smokers recruited through the Internet or by mail. J. F. Etter, T. V. Perneger. Int J Epidemiol 2001: 30(3); 521-5. OBJECTIVES: To compare smokers recruited by mail or through the Internet. METHODS: A questionnaire was mailed to 19,352 inhabitants of Switzerland in 1998, in an effort to enroll them in a smoking cessation trial. The same questionnaire was also available on the Internet. Furthermore, we mailed a survey to a representative sample (n = 1000) of the population of Geneva, Switzerland, in 1996. In this study, we compare three groups: 1027 smokers recruited through the Internet, 2961 volunteer trial participants recruited by mail (response rate 16%), and 211 smokers in the representative sample also recruited by mail (response rate 75%). RESULTS: Smokers self-recruited through the Internet were younger, more educated, more motivated to quit smoking and smoked more cigarettes per day than smokers in the other samples. Compared to trial participants, Internet participants had more negative attitudes towards smoking, higher self-efficacy scores, and were more addicted to tobacco. The strength of associations between smoking-related variables was similar in Internet and trial participants. CONCLUSION: As expected, the three groups of smokers differed on several characteristics. However, bias in distributions of variables did not imply bias in associations between variables. Thus, Internet recruitment is a potentially useful method for analytical studies that focus on associations between variables.

Uptake of screening and prevention in women at very high risk of breast cancer. D. Evans, F. Lalloo, A. Shenton, C. Boggis, A. Howell. Lancet 2001: 358(9285); 889-90. Management of women at high lifetime risk of familial breast cancer is hampered because of limited data concerning the appropriateness of treatment options. Over the past 8 years women at very high (>40%) lifetime risk of breast cancer have had the option of entering two chemoprevention treatment trials, a magnetic resonance imaging (MRI) breast screening study, or a risk-reducing mastectomy (RRM) study. Only 10% of eligible women have entered one of the chemoprevention trials with a similar proportion opting for RRM (>50% in mutation carriers) compared with 60% opting for MRI screening. Future chemoprevention trials will have to be designed to address this poor recruitment.

The healthy control subject in psychiatric research: impulsiveness and volunteer bias. J. P. Gustavsson, M. Asberg, D. Schalling. Acta Psychiatr Scand 1997: 96(5); 325-8. Exciting and demanding biomedical experiments may attract a specific subgroup of people as volunteers. In the present study of selection bias, subjects volunteering in a psychobiological study that included a potentially painful procedure (lumbar puncture) were compared with those who declined to participate, with regard to scores on personality scales administered during a previous investigation of the same subjects. Significant differences were found on the Eysenck Personality Questionnaire and Karolinska Scales of Personality Impulsiveness scale, suggesting an over-representation of impulsive individuals among the volunteers. If the specific subject of investigation has implications for the type of individual who will participate as a healthy volunteer in biomedical research, variation will be introduced, affecting the independent variable, and the conclusions that can be drawn from such research may be questionable.

Are Subjects in Pharmacological Treatment Trials of Depression Representative of Patients in Routine Clinical Practice? M. Zimmerman, J. I. Mattia, M. A. Posternak. American Journal of Psychiatry 2002: 159(3); 469-473. (A nice overview appears in the Lancet at http://www.thelancet.com/journal/vol359/iss9308/full/llan.359.9308.news.20219.2) OBJECTIVE: The methods used to evaluate the efficacy of antidepressants differ from treatment for depression in routine clinical practice. The rigorous inclusion/exclusion criteria used to select subjects for participation in efficacy studies potentially limit the generalizability of these trials' results. It is unknown how much impact these criteria have on the representativeness of subjects in efficacy trials. This study estimated the proportion of depressed patients treated in routine clinical practice who would meet standard inclusion/exclusion criteria for an efficacy trial. METHOD: A total of 803 individuals, aged 16-65 years, who were seen at intake at an outpatient practice underwent a thorough diagnostic evaluation, including the administration of semistructured diagnostic interviews; 346 patients had current major depression. Common inclusion/exclusion criteria used in efficacy studies of antidepressants were applied to the depressed patients to determine how many would have qualified for an efficacy trial. RESULTS: Approximately one-sixth of the 346 depressed patients would have been excluded from an efficacy trial because they had a bipolar or psychotic subtype of depression. The presence of a comorbid anxiety or substance use disorder, insufficient severity of depressive symptoms, or current suicidal ideation would have excluded 86.0% (N=252) of the remaining 293 outpatients with nonpsychotic unipolar major depressive disorder from an antidepressant efficacy trial. CONCLUSIONS: Subjects treated in antidepressant trials represent a minority of patients treated for major depression in routine clinical practice. 
These results show that antidepressant efficacy trials tend to evaluate a subset of depressed individuals with a specific clinical profile.
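
The exclusion figures in this abstract chain together, and a few lines of arithmetic make the size of the "minority" explicit. This is a sketch using only the counts reported above. The final share of eligible patients is derived here and is not stated in the abstract, and the variable names are made up for illustration.

```python
# Counts reported in the abstract.
depressed = 346               # patients with current major depression
unipolar_nonpsychotic = 293   # left after removing bipolar/psychotic subtypes
excluded_other = 252          # excluded for comorbidity, low severity, or suicidal ideation

# Derived quantities (not stated in the abstract).
excluded_subtype = depressed - unipolar_nonpsychotic
eligible = unipolar_nonpsychotic - excluded_other

print(f"excluded for subtype: {excluded_subtype} ({excluded_subtype / depressed:.1%})")
print(f"excluded among the rest: {excluded_other / unipolar_nonpsychotic:.1%}")
print(f"eligible overall: {eligible} of {depressed} ({eligible / depressed:.1%})")
```

The first two percentages agree with the abstract's "approximately one-sixth" and 86.0%. On these counts, roughly one depressed outpatient in eight would have qualified for a typical efficacy trial.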

evidence >> misc >> abstracts (3)

The accuracy of abstracts in psychology journals. A. H. Harris, S. Standard, J. L. Brunning, S. L. Casey, J. H. Goldberg, L. Oliver, K. Ito, J. M. Marshall. J Psychol 2002: 136(2); 141-8. This article provides an empirically supported reminder of the importance of accuracy in scientific communication. The authors identify common types of inaccuracies in research abstracts and offer suggestions to improve abstract-article agreement. Abstracts accompanying 13% of a random sample of 400 research articles published in 8 American Psychological Association journals during 1997 and 1998 contained data or claims inconsistent with or missing from the body of the article. Error rates ranged from 8% to 18%, although between-journal differences were not significant. Many errors (63%) were unlikely to cause substantive misinterpretations. Unfortunately, 37% of errors found could be seriously misleading with respect to the data or claims presented in the associated article. Although deficient abstracts may be less common in psychology journals than in major medical journals (R. M. Pitkin, M. A. Branagan, & L. F. Burmeister, 1999), there is still cause for concern and need for improvement.

Accuracy of data in abstracts of published research articles. R. M. Pitkin, M. A. Branagan, L. F. Burmeister. JAMA 1999: 281(12); 1110-1. CONTEXT: The section of a research article most likely to be read is the abstract, and therefore it is particularly important that the abstract reflect the article faithfully. OBJECTIVE: To assess abstracts accompanying research articles published in 6 medical journals with respect to whether data in the abstract could be verified in the article itself. DESIGN: Analysis of simple random samples of 44 articles and their accompanying abstracts published during 1 year (July 1, 1996-June 30, 1997) in each of 5 major general medical journals (Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine) and a consecutive sample of 44 articles published during 15 months (July 1, 1996-August 15, 1997) in the CMAJ. MAIN OUTCOME MEASURE: Abstracts were considered deficient if they contained data that were either inconsistent with corresponding data in the article's body (including tables and figures) or not found in the body at all. RESULTS: The proportion of deficient abstracts varied widely (18%-68%) and to a statistically significant degree (P<.001) among the 6 journals studied. CONCLUSIONS: Data in the abstract that are inconsistent with or absent from the article's body are common, even in large-circulation general medical journals.

Can the accuracy of abstracts be improved by providing specific instructions? A randomized controlled trial. R. M. Pitkin, M. A. Branagan. JAMA 1998: 280(3); 267-9. CONTEXT: The most-read section of a research article is the abstract, and therefore it is especially important that the abstract be accurate. OBJECTIVE: To test the hypothesis that providing authors with specific instructions about abstract accuracy will result in improved accuracy. DESIGN: Randomized controlled trial of an educational intervention specifying 3 types of common defects in abstracts of articles that had been reviewed and were being returned to the authors with an invitation to revise. MAIN OUTCOME MEASURE: Proportion of abstracts containing 1 or more of the following defects: inconsistency in data between abstract and body of manuscript (text, tables, and figures), data or other information given in abstract but not in body, and/or conclusions not justified by information in the abstract. RESULTS: Of 250 manuscripts randomized, 13 were never revised and 34 were lost to follow-up, leaving a final comparison between 89 in the intervention group and 114 in the control group. Abstracts were defective in 25 (28%) and 30 (26%) cases, respectively (P=.78). Among 55 defective abstracts, 28 (51%) had inconsistencies, 16 (29%) contained data not present in the body, 8 (15%) had both types of defects, and 3 (5%) contained unjustified conclusions. CONCLUSIONS: Defects in abstracts, particularly inconsistencies between abstract and body and the presentation of data in abstract but not in body, occur frequently. Specific instructions to authors who are revising their manuscripts are ineffective in lowering this rate. Journals should include in their editing processes specific and detailed attention to abstracts.
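
As a check on the comparison reported in this abstract (25 of 89 versus 30 of 114 defective abstracts), here is a sketch of a pooled two-proportion z-test using only the Python standard library. The abstract does not name the test the authors used, so the choice of test is an assumption, and the function name is made up for illustration.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the abstract: intervention group, then control group.
z, p_value = two_proportion_z_test(25, 89, 30, 114)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

Under this assumption the p-value rounds to the reported P=.78. The two groups differ by less than two percentage points, so the instructions made no detectable difference.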

evidence >> misc >> analysis (1)

Pitfalls of pharmacoepidemiology. D. C. Skegg. BMJ 2000: 321(7270); p1171-2. Abstract not available yet. [Full text] [PDF]

evidence >> misc >> conflict (15)

A Frank Statement to Cigarette Smokers. Tobacco Industry Research Committee. Accessed on 2003-06-18. "Recent reports on experiments with mice have given wide publicity to a theory that cigarette smoking is in some way linked with lung cancer in human beings." www.pmdocs.com/getimg.asp?pgno=0&start=0&bool=Frank%20Statement&docid=2015002376

Conflict of interest and the American Journal of Bioethics. K. A. Carroll, G. McGee. American Journal of Bioethics 2002: 2(3); 1-2. [Medline]

Unconventional cancer therapies: What we need is rigorous research, not closed minds. E. Ernst. Chest 2000: 117(2); 307-8. [Full text] [PDF]

Reference bias in reports of drug trials. P. C. Gotzsche. Br Med J (Clin Res Ed) 1987: 295(6599); 654-6. Articles published before 1985 describing double blind trials of two or more non-steroidal anti-inflammatory drugs in rheumatoid arthritis were examined to see whether there was any bias in the references they cited. Altogether 244 articles meeting the criteria were found through a Medline search and through examining the reference lists of the articles retrieved. The drugs compared in the studies were classified as new or as control drugs and the outcome of the trial as positive or not positive. The reference lists of all papers with references to other trials on the new drug were then examined for reference bias. Positive bias was judged to have occurred if the reference list contained a higher proportion of references with a positive outcome for that drug than among all the articles assumed to have been available to the authors (those published more than two years earlier than the index article). Altogether 133 of the 244 articles were excluded for various reasons--for example, 44 because of multiple publication and 19 because they had no references. Among the 111 articles analysed bias was not possible in the references of 35 (because all the references gave the same outcome); 10 had a neutral selection of references, 22 a negative selection, and 44 a positive selection--a significant positive bias. This bias was not caused by better scientific standing of the cited articles over the uncited ones. Thus retrieving literature by scanning reference lists may produce a biased sample of articles, and reference bias may also render the conclusions of an article less reliable.
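
One way to see why 44 positive against 22 negative selections counts as "a significant positive bias" is an exact sign test. This is a sketch only: the abstract does not say which test the author used, so the sign test and the decision to set aside the 10 neutral reference lists are assumptions, and the function name is made up for illustration.

```python
from math import comb

def sign_test_two_sided(k, n):
    """Exact two-sided binomial (sign) test of k successes in n trials against p = 0.5."""
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)

# Counts from the abstract; neutral selections are excluded from the test.
positive, negative = 44, 22
n = positive + negative
p_value = sign_test_two_sided(positive, n)
print(f"{positive} of {n} non-neutral reference lists leaned positive, p = {p_value:.4f}")
```

If positive and negative selections were equally likely, a two-to-one split among 66 reference lists would be improbable, which is consistent with the abstract's conclusion.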

Declaring financial competing interests: survey of five general medical journals. A. Hussain, R. Smith. British Medical Journal 2001: 323(7307); p263-4. Abstract not available. [Medline] [Full text] [PDF]

Association between competing interests and authors' conclusions: epidemiological study of randomised clinical trials published in the BMJ. L. L. Kjaergard, B. Als-Nielsen. British Medical Journal 2002: 325(7358); 249. Objective: To assess the association between competing interests and authors' conclusions in randomised clinical trials. Design: Epidemiological study of randomised clinical trials published in the BMJ from January 1997 to June 2001. Financial competing interests were defined as funding by for profit organisations and other competing interests as personal, academic, or political. Studies: 159 trials from 12 medical specialties. Main outcome measures: Authors' conclusions defined as interpretation of extent to which overall results favoured experimental intervention. Conclusions appraised on 6 point scale; higher scores favour experimental intervention. Results: Authors' conclusions were significantly more positive towards the experimental intervention in trials funded by for profit organisations alone compared with trials without competing interests (mean difference 0.48 (SE 0.13), P=0.014), trials funded by both for profit and non-profit organisations (0.30 (SE 0.10), P=0.003), and trials with other competing interests (0.45 (SE 0.13), P=0.006). Other competing interests and funding from both for profit and non-profit organisations were not significantly associated with authors' conclusions. The association between financial competing interests and authors' conclusions was not explained by methodological quality, statistical power, type of experimental intervention (pharmacological or non-pharmacological), type of control intervention (for example, placebo or active drug), or medical specialty. Conclusions: Authors' conclusions in randomised clinical trials significantly favoured experimental interventions if financial competing interests were declared. Other competing interests were not significantly associated with authors' conclusions.

When statistics provide unsatisfying answers: revisiting the breast self-examination controversy. B. H. Lerner. CMAJ 2002: 166(2); 199-201.

Nonfinancial conflicts of interest in research. N. G. Levinsky. N Engl J Med 2002: 347(10); 759-61. Abstract not available yet. [Medline] [Abstract]

Academic freedom in clinical research. D. G. Nathan, D. J. Weatherall. New England Journal of Medicine 2002: 347(17); 1368-71. [Medline] [Abstract]

Cholesterol lowering trials in coronary heart disease: frequency of citation and outcome. U. Ravnskov. British Medical Journal 1992: 305(6844); 15-19. ABSTRACT: OBJECTIVE--To see if the claim that lowering cholesterol values prevents coronary heart disease is true or if it is based on citation of supportive trials only. DESIGN--Comparison of frequency of citation with outcome of all controlled cholesterol lowering trials using coronary heart disease or death, or both, as end point. SUBJECTS--22 controlled cholesterol lowering trials. RESULTS--Trials considered by their directors as supportive of the contention were cited almost six times more often than others, according to Science Citation Index. Apart from trials discontinued because of alleged side effects of treatment, unsupportive trials were not cited after 1970, although their number almost equalled the number considered supportive. In three supportive reviews the outcome of the selected trials was more favourable than the outcome of the excluded and ignored trials. In the 22 controlled cholesterol lowering trials studied total and coronary heart disease mortality was not changed significantly either overall or in any subgroup. A statistically significant 0.32% reduction in non-fatal coronary heart disease seemed to be due to bias as event frequencies were unrelated to trial length and to mean net reduction in cholesterol value; individual changes in cholesterol values were unsystematically or not related to outcome; and after correction for a small but significant increase in non-medical deaths in the intervention groups total mortality remained unchanged (odds ratio 1.02). CONCLUSIONS--Lowering serum cholesterol concentrations does not reduce mortality and is unlikely to prevent coronary heart disease. Claims of the opposite are based on preferential citation of supportive trials.

Bias in analytic research. D. L. Sackett. J Chronic Dis 1979: 32(1-2); 51-63.

Beyond conflict of interest. Transparency is the key [editorial]. R. Smith. British Medical Journal 1998: 317(7154); 291-2. [Full text] [PDF]

Conflict of interest in the debate over calcium-channel antagonists. H. T. Stelfox, G. Chua, K. O'Rourke, A. S. Detsky. N Engl J Med 1998: 338(2); 101-6. BACKGROUND: Physicians' financial relationships with the pharmaceutical industry are controversial because such relationships may pose a conflict of interest. It is unknown to what extent industry support of medical education and research influences the opinions and behavior of clinicians and researchers. The recent debate over the safety of calcium-channel antagonists provided an opportunity to examine the effect of financial conflicts of interest. METHODS: We searched the English-language medical literature published from March 1995 through September 1996 for articles examining the controversy about the safety of calcium-channel antagonists. Articles were reviewed and classified as being supportive, neutral, or critical with respect to the use of calcium-channel antagonists. The authors of the articles were asked about their financial relationships with both manufacturers of calcium-channel antagonists and manufacturers of competing products (i.e., beta-blockers, angiotensin-converting-enzyme inhibitors, diuretics, and nitrates). We examined the authors' published positions on the safety of calcium-channel antagonists according to their financial relationships with pharmaceutical companies. RESULTS: Authors who supported the use of calcium-channel antagonists were significantly more likely than neutral or critical authors to have financial relationships with manufacturers of calcium-channel antagonists (96 percent, vs. 60 percent and 37 percent, respectively; P<0.001). Supportive authors were also more likely than neutral or critical authors to have financial relationships with any pharmaceutical manufacturer, irrespective of the product (100 percent, vs. 67 percent and 43 percent, respectively; P<0.001).
CONCLUSIONS: Our results demonstrate a strong association between authors' published positions on the safety of calcium-channel antagonists and their financial relationships with pharmaceutical manufacturers. The medical profession needs to develop a more effective policy on conflict of interest. We support complete disclosure of relationships with pharmaceutical manufacturers for clinicians and researchers who write articles examining pharmaceutical products.

The uncertainty principle and industry-sponsored research. B. Djulbegovic, M. Lacevic, A. Cantor, K. K. Fields, C. L. Bennett, J. R. Adams, N. M. Kuderer, G. H. Lyman. Lancet 2000: 356(9230); 635-8. BACKGROUND: Reporting of pharmaceutical-industry-sponsored randomised clinical trials often result in biased findings, either due to selective reporting of studies with non-equivalent arms or publication of low-quality papers, wherein unfavourable results are incompletely described. A randomised trial should be conducted only if there is substantial uncertainty about the relative value of one treatment versus another. Studies in which intervention and control are thought to be non-equivalent violates the uncertainty principle. METHODS: We examined the quality of 136 published randomised trials that focused on one disease category (multiple myeloma) and adherence to the uncertainty principle. To evaluate whether the uncertainty principle was upheld, we compared the number of studies favouring experimental treatments over standard ones. We analysed data according to the source of funding. FINDINGS: Trials funded solely or in part by 35 profit-making organisations had a trend toward higher quality scores (mean 2.94 [SD 1.3]; median 3) than randomised trials supported by 95 governmental or other non-profit organisations (2.4 [0.8]; 2; p=0.06). Overall, the uncertainty principle was upheld, with 44% of randomised trials favouring standard treatments and 56% innovative treatments (p=0.17); mean and median preference evaluation scores were 3.7 (1.0) and 4. However, when the analysis was done according to the source of funding, studies funded by non-profit organisations maintained equipoise favouring new therapies over standard ones (47% vs 53%; p=0.608) to a greater extent than randomised trials supported solely or in part by profit-making organisations (74% vs 26%; p=0.004). 
INTERPRETATION: The reported bias in research sponsored by the pharmaceutical industry may be a consequence of violations of the uncertainty principle. Sponsors of clinical trials should be encouraged to report all results and to choose appropriate comparative controls.

Why review articles on the health effects of passive smoking reach different conclusions. D. E. Barnes, L. A. Bero. JAMA 1998: 279(19); 1566-70. OBJECTIVE: To determine whether the conclusions of review articles on the health effects of passive smoking are associated with article quality, the affiliations of their authors, or other article characteristics. DATA SOURCES: Review articles published from 1980 to 1995 were identified through electronic searches of MEDLINE and EMBASE and from a database of symposium proceedings on passive smoking. ARTICLE SELECTION: An article was included if its stated or implied purpose was to review the scientific evidence that passive smoking is associated with 1 or more health outcomes. Articles were excluded if they did not focus specifically on the health effects of passive smoking or if they were not written in English. DATA EXTRACTION: Review article quality was evaluated by 2 independent assessors who were trained, followed a written protocol, had no disclosed conflicts of interest, and were blinded to all study hypotheses and identifying characteristics of articles. Article conclusions were categorized by the 2 assessors and by one of the authors. Author affiliation was classified as either tobacco industry affiliated or not, based on whether the authors were known to have received funding from or participated in activities sponsored by the tobacco industry. Other article characteristics were classified by one of the authors using predefined criteria. DATA SYNTHESIS: A total of 106 reviews were identified. Overall, 37% (39/106) of reviews concluded that passive smoking is not harmful to health; 74% (29/39) of these were written by authors with tobacco industry affiliations.
In multiple logistic regression analyses controlling for article quality, peer review status, article topic, and year of publication, the only factor associated with concluding that passive smoking is not harmful was whether an author was affiliated with the tobacco industry (odds ratio, 88.4; 95% confidence interval, 16.4-476.5; P<.001). CONCLUSIONS: The conclusions of review articles are strongly associated with the affiliations of their authors. Authors of review articles should disclose potential financial conflicts of interest, and readers of review articles should consider authors' affiliations when deciding how to judge an article's conclusions. [Medline]
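The figures quoted in the Barnes and Bero abstract above can be checked for internal consistency with a few lines of arithmetic. The sketch below is only an illustration, not part of the cited paper: it assumes the reported interval is a Wald-type interval computed on the log-odds scale with a critical value of 1.96 (the abstract does not say how the interval was obtained), and the function name `wald_summary` is made up for this note. Under that assumption the point estimate should sit at the geometric mean of the two limits, and the standard error of the log odds ratio can be backed out from the width of the interval.

```python
import math

def wald_summary(or_hat, lower, upper, z_crit=1.96):
    """Back out log-scale quantities from a reported odds ratio and 95% CI.

    Assumes (hypothetically) a Wald interval built on the log-odds scale.
    """
    # A log-scale interval is symmetric around log(OR), so the geometric
    # mean of the limits should reproduce the point estimate.
    geo_mean = math.sqrt(lower * upper)
    # The full width of the interval on the log scale is 2 * z_crit * SE.
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Wald z statistic for the null hypothesis OR = 1.
    z = math.log(or_hat) / se_log_or
    return geo_mean, se_log_or, z

# Values quoted in the abstract: odds ratio 88.4, 95% CI 16.4 to 476.5.
geo_mean, se_log_or, z = wald_summary(88.4, 16.4, 476.5)

# Proportions quoted in the abstract: 39/106 and 29/39.
pct_not_harmful = 100 * 39 / 106
pct_industry = 100 * 29 / 39

print(f"geometric mean of CI limits: {geo_mean:.1f}")
print(f"implied SE of log(OR): {se_log_or:.2f}, z = {z:.1f}")
print(f"{pct_not_harmful:.0f}% not harmful, {pct_industry:.0f}% industry affiliated")
```

If the assumption holds, the geometric mean lands on the reported odds ratio, and the very wide interval reflects a large standard error on the log scale rather than any inconsistency in the abstract.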

evidence >> misc >> exercise (4)

Postmarketing surveillance study of a non-chlorofluorocarbon inhaler according to the safety assessment of marketed medicines guidelines. J. G. Ayres, C. D. Frost, W. F. Holmes, D. R. Williams, S. M. Ward. British Medical Journal 1998: 317(7163); 926-30. OBJECTIVE: To evaluate the safety of a non-chlorofluorocarbon metered dose salbutamol inhaler. DESIGN: This was a postmarketing surveillance study, conducted under formal guidelines for company sponsored safety assessment of marketed medicines (SAMM). A non-randomised, non-interventional, observational design compared patients prescribed metered doses of salbutamol delivered by inhalers using either hydrofluoroalkane or chlorofluorocarbon as the propellant. Follow up was three months. SETTING: 646 general practices throughout the United Kingdom. SUBJECTS: 6614 patients with obstructive airways disease (1667 patient years of exposure). MAIN OUTCOME MEASURES: Proportions of patients who were: admitted to hospital for respiratory diseases, reported adverse side effects, or withdrew because of adverse effects. RESULTS: There were no significant differences between the hydrofluoroalkane (HFA 134a) and chlorofluorocarbon inhaler groups in relation to the proportions of patients admitted to hospital for respiratory diseases (odds ratio 0.75; 95% confidence interval 0.51 to 1.08) or the proportions who reported adverse events (1.01; 0.88 to 1.17). However, more patients using the hydrofluoroalkane inhaler than the chlorofluorocarbon inhaler withdrew because of adverse events (3.8% and 0.9% respectively). CONCLUSION: The hydrofluoroalkane inhaler was as safe as the chlorofluorocarbon inhaler when judged by hospital admissions and adverse effects. The study design successfully fulfilled the recommendations of the guidelines. Differences between postmarketing surveillance studies and randomised clinical trials in assessing safety were identified.
These may lead to difficulties in the design of postmarketing surveillance studies. [Medline] [Abstract] [Full text] [PDF]

Obstetric care and proneness of offspring to suicide as adults: case-control study. Bertil Jacobson, Marc Bygdeman. British Medical Journal 1998: 317(7169); 1346-1349. ABSTRACT: OBJECTIVE: To investigate any long term effects of traumatic birth and obstetric procedures in relation to suicide by violent means in offspring as adults. DESIGN: Prospective case-control study. SETTING: Stockholm, Sweden. SUBJECTS: 242 adults who committed suicide by violent means from 1978 to 1995, and who were born in one of seven hospitals in Stockholm during 1945-80, matched with 403 biological siblings born during the same period and at the same group of hospitals. MAIN OUTCOME MEASURES: Adverse and beneficial perinatal factors expressed as relative risks (odds ratios) and 95% confidence intervals, derived from logistic regression of cases matched with their siblings. RESULTS: For multiple birth trauma the estimated relative risks of offspring subsequently committing suicide by violent means were 4.9 (95% confidence interval 1.8 to 13) for men and 1.04 (0.2 to 4.6) for women. In mothers who received multiple opiate treatment during delivery, the estimated relative risk of offspring subsequently committing suicide was equal for both sexes (0.26, 0.09 to 0.69). CONCLUSION: Minimising pain and discomfort to the infant during birth seems to be of importance in reducing the risk of committing suicide by violent means as an adult. [Medline] [Abstract] [Full text] [PDF]

Midline episiotomy and anal incontinence: retrospective cohort study. Lisa B Signorello, Bernard L Harlow, Amy K Chekos, John T Repke. British Medical Journal 2000: 320(7227); 86-90. ABSTRACT: OBJECTIVE: To evaluate the relation between midline episiotomy and postpartum anal incontinence. DESIGN: Retrospective cohort study with three study arms and six months of follow up. SETTING: University teaching hospital. PARTICIPANTS: Primiparous women who vaginally delivered a live full term, singleton baby between 1 August 1996 and 8 February 1997: 209 who received an episiotomy; 206 who did not receive an episiotomy but experienced a second, third, or fourth degree spontaneous perineal laceration; and 211 who experienced either no laceration or a first degree perineal laceration. MAIN OUTCOME MEASURES: Self reported faecal and flatus incontinence at three and six months postpartum. RESULTS: Women who had episiotomies had a higher risk of faecal incontinence at three (odds ratio 5.5, 95% confidence interval 1.8 to 16.2) and six (3.7, 0.9 to 15.6) months postpartum compared with women with an intact perineum. Compared with women with a spontaneous laceration, episiotomy tripled the risk of faecal incontinence at three months (95% confidence interval 1.3 to 7.9) and six months (0.7 to 11.2) postpartum, and doubled the risk of flatus incontinence at three months (1.3 to 3.4) and six months (1.2 to 3.7) postpartum. A non-extending episiotomy (that is, second degree surgical incision) tripled the risk of faecal incontinence (1.1 to 9.0) and nearly doubled the risk of flatus incontinence (1.0 to 3.0) at three months postpartum compared with women who had a second degree spontaneous tear. The effect of episiotomy was independent of maternal age, infant birth weight, duration of second stage of labour, use of obstetric instrumentation during delivery, and complications of labour. 
CONCLUSIONS: Midline episiotomy is not effective in protecting the perineum and sphincters during childbirth and may impair anal continence. [Medline] [Abstract] [Full text] [PDF]

Use of ultramolecular potencies of allergen to treat asthmatic people allergic to house dust mite: double blind randomised controlled clinical trial. G T Lewith, A D Watkins, M E Hyland, S Shaw, J A Broomfield, G Dolan, S T Holgate. British Medical Journal 2002: 324(7336); 520-. Objective: To evaluate the efficacy of homoeopathic immunotherapy on lung function and respiratory symptoms in asthmatic people allergic to house dust mite. Design: Double blind randomised controlled trial. Setting: 38 general practices in Hampshire and Dorset. Participants: 242 people with asthma and positive results to skin prick test for house dust mite; 202 completed clinic based assessments, and 186 completed diary based assessments. Intervention: After a four week baseline assessment, participants were randomised to receive oral homoeopathic immunotherapy or placebo and then assessed over 16 weeks with three clinic visits and diary assessments every other week. Outcome measure: Clinic based assessments: forced expiratory volume in one second (FEV1), quality of life, and mood. Diary based assessments: morning and evening peak expiratory flow, visual analogue scale of severity of asthma, quality of life, and daily mood. Results: There was no difference in most outcomes between placebo and homoeopathic immunotherapy. There was a different pattern of change over the trial for three of the diary assessments: morning peak expiratory flow (P=0.025), visual analogue scale (P=0.017), and mood (P=0.035). At week three there was significant deterioration for visual analogue scale (P=0.047) and mood (P=0.013) in the homoeopathic immunotherapy group compared with the placebo group. Any improvement in participants' asthma was independent of belief in complementary medicine. Conclusion: Homoeopathic immunotherapy is not effective in the treatment of patients with asthma. The different patterns of change between homoeopathic immunotherapy and placebo over the course of the study are unexplained. 
[Abstract] [Full text] [PDF]

evidence >> misc >> guidelines (15)

Users' guides to the medical literature: II. How to use an article about therapy or prevention: A. Are the results of the study valid? GH Guyatt, DL Sackett, DJ Cook. JAMA 1993: 270(21); 2598-2601.

Users' guides to the medical literature: II. How to use an article about therapy or prevention: B. What are the results and will they help me in caring for my patients? GH Guyatt, DL Sackett, DJ Cook. JAMA 1994: 271(1); 59-63. Abstract not available.

Users' guides to the medical literature. IX. A method for grading health care recommendations. GH Guyatt, DL Sackett, JC Sinclair, RSA Hayward, DJ Cook, RJ Cook. JAMA 1995: 274(22); 1800-4. Abstract not available.

Transferring evidence from research into practice: 2. Getting the evidence straight. R. B. Haynes, D. L. Sackett, J. A. Gray, D. J. Cook, G. H. Guyatt. ACP Journal Club 1997: 126(1); A14-6. Abstract not available yet.

Transferring evidence from research into practice: 1. The role of clinical care research evidence in clinical decisions. R. B. Haynes, D. L. Sackett, J. A. Gray, D. J. Cook, G. H. Guyatt. ACP Journal Club 1996: 125(3); A14-6. Abstract not available yet.

Transferring evidence from research into practice: 4. Overcoming barriers to application. R. B. Haynes, D. L. Sackett, G. H. Guyatt, D. J. Cook, J. A. Gray. ACP Journal Club 1997: 126(3); A14-5. Abstract not available yet.

Users' guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? R Jaeschke, GH Guyatt, DL Sackett. JAMA 1994: 271(5); 389-91. Abstract not available.

Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? R Jaeschke, GH Guyatt, DL Sackett. JAMA 1994: 271(9); 703-7. Abstract not available.

Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. J. Little, L. Bradley, M. S. Bray, M. Clyne, J. Dorman, D. L. Ellsworth, J. Hanson, M. Khoury, J. Lau, T. R. O'Brien, N. Rothman, D. Stroup, E. Taioli, D. Thomas, H. Vainio, S. Wacholder, C. Weinberg. Am J Epidemiol 2002: 156(4); 300-10. The recent completion of the first draft of the human genome sequence and advances in technologies for genomic analysis are generating tremendous opportunities for epidemiologic studies to evaluate the role of genetic variants in human disease. Many methodological issues apply to the investigation of variation in the frequency of allelic variants of human genes, of the possibility that these influence disease risk, and of assessment of the magnitude of the associated risk. Based on a Human Genome Epidemiology workshop, a checklist for reporting and appraising studies of genotype prevalence and studies of gene-disease associations was developed. This focuses on selection of study subjects, analytic validity of genotyping, population stratification, and statistical issues. Use of the checklist should facilitate the integration of evidence from these studies. The relation between the checklist and grading schemes that have been proposed for the evaluation of observational studies is discussed. Although the limitations of grading schemes are recognized, a robust approach is proposed. Other issues in the synthesis of evidence that are particularly relevant to studies of genotype prevalence and gene-disease association are discussed, notably identification of studies, publication bias, criteria for causal inference, and the appropriateness of quantitative synthesis.

Users' Guides to the Medical Literature. Finlay A. McAlister, A Laupacis, G. A. Wells, D. L. Sackett. JAMA 1999: 282(14); 1371-1377. Abstract not available.

Users' guides to the medical literature XIX. Applying clinical trial results B. Guidelines for determining whether a drug is exerting (more than) a class effect. FA McAlister, A Laupacis, GA Wells, DL Sackett. JAMA 1999: 282(14); 1371-7. Abstract not available.

Transferring evidence from research into practice: 3. Developing evidence-based clinical policy. J. A. Muir Gray, R. B. Haynes, D. L. Sackett, D. J. Cook, G. H. Guyatt. ACP Journal Club 1997: 126(2); A14-6. Abstract not available yet.

Users' guides to the medical literature. I. How to get started. AD Oxman, DL Sackett, GH Guyatt. JAMA 1993: 270(17); 2093-5. Abstract not available.

Evidence-based Medicine: How to Practice and Teach EBM. David L. Sackett, W. Scott Richardson, William Rosenberg, R. Brian Haynes (1998) Edinburgh: Churchill Livingstone.

Why Sackett's analysis of randomized controlled trials fails, but needn't. Stanley H Shapiro, Kathleen Cranley Glass. CMAJ (Canadian Medical Association Journal) 2000: 163(7); 834-835. Abstract not available yet. [Full text] [PDF]

evidence >> misc >> overview (5)

Bias. Bandolier. Accessed on 2003-03-25. "Bandolier has been struck of late, 'many a time and oft', by the continuing and cavalier attitude towards bias in clinical trials. We know that the way that clinical trials are designed and conducted can influence their results. Yet people still ignore known sources of bias when making decisions about treatments at all levels." www.jr2.ox.ac.uk/bandolier/band80/b80-2.html

The Glossary of Mathematical Mistakes. Paul Cox. Accessed on 2003-06-10. "This is a list of mathematical mistakes made over and over by advertisers, the media, reporters, politicians, activists, and in general many non-math people. These come from many sources, which will appear in parenthesis. I will try to find an actual example of each for learning purposes." www.mathmistakes.com/

Where's the Evidence? Debates in Modern Medicine. William A. Silverman (1998) New York: Oxford University Press.

Design and analysis of prostate cancer trials. R. Sylvester. Acta Urologica Belgica 1994: 62(1); 23-29. ABSTRACT: This paper presents an overview of various statistical concepts related to the design and analysis of prostate cancer trials: the need for randomization, stratification for prognostic factors, sample size determination, trial objectives, the choice of a control group, patient entry criteria, the number of treatments to be compared, the choice of endpoints, analysis by the intent to treat principle, interim statistical analysis and early stopping rule, and subgroup analyses.

Content and quality of 2000 controlled trials in schizophrenia over 50 years. Ben Thornley, C Adams. British Medical Journal 1998: 317(7167); 1181-1184. ABSTRACT: OBJECTIVE: To provide a comprehensive survey of the content and quality of intervention studies relevant to the treatment of schizophrenia. DESIGN: Data were extracted from 2000 trials on the Cochrane Schizophrenia Group's register. MAIN OUTCOME MEASURES: Type and date of publication, country of origin, language, size of study, treatment setting, participant group, interventions, outcomes, and quality of study. RESULTS: Hospital based drug trials undertaken in the United States were dominant in the sample (54%). Generally, studies were short (54%<6 weeks), small (mean number of patients 65), and poorly reported (64% had a quality score of <=2 (maximum score 5)). Over 600 different interventions were studied in these trials, and 640 different rating scales were used to measure outcome. CONCLUSIONS: Half a century of studies of limited quality, duration, and clinical utility leave much scope for well planned, conducted, and reported trials. The drug regulatory authorities should stipulate that the results of both explanatory and pragmatic trials are necessary before a compound is given a licence for everyday use. [Abstract] [Full text] [PDF]

evidence >> misc >> quality (14)

Poor-quality medical research: what can journals do? D. G. Altman. JAMA 2002: 287(21); 2765-7. The aim of medical research is to advance scientific knowledge and hence--directly or indirectly--lead to improvements in the treatment and prevention of disease. Each research project should continue systematically from previous research and feed into future research. Each project should contribute beneficially to a slowly evolving body of research. A study should not mislead; otherwise it could adversely affect clinical practice and future research. In 1994 I observed that research papers commonly contain methodological errors, report results selectively, and draw unjustified conclusions. Here I revisit the topic and suggest how journal editors can help.

Improving the quality of reporting of randomized controlled trials. The CONSORT statement. C. Begg, M. Cho, S. Eastwood, R. Horton, D. Moher, I. Olkin, R. Pitkin, D. Rennie, K. F. Schulz, D. Simel, D. F. Stroup. JAMA 1996: 276(8); 637-9. Abstract not available yet.

Reviewing the reviewers: the quality of reporting in three secondary journals. P. J. Devereaux, B. J. Manns, W. A. Ghali, H. Quan, G. H. Guyatt. CMAJ 2001: 164(11); 1573-6. BACKGROUND: Secondary journals such as ACP Journal Club (ACP), Journal Watch (JW) and Internal Medicine Alert (IMA) have enormous potential to help clinicians remain up to date with medical knowledge. However, for clinicians to evaluate the validity and applicability of new findings, they need information on the study design, methodology and results. METHODS: Beginning with the first issue in March 1997, we selected 50 consecutive summaries of studies addressing therapy or prevention and internal medicine content from each of the ACP, JW and IMA. We evaluated the summaries for completeness of reporting key aspects of study design, methodology and results. RESULTS: All of the summaries in ACP reported study design, as compared with 72% of the summaries in JW and IMA (p < 0.001). In summaries of randomized controlled trials the 3 secondary journals were similar in reporting concealment of patient allocation (none reported this), blinding status of participants (ACP 62%, JW 70% and IMA 70% [p = 0.7]), blinding status of health care providers (ACP 12%, JW 4% and IMA 4% [p = 0.4]) and blinding status of judicial assessors of outcomes (ACP 4%, JW 4% and IMA 0% [p = 0.4]). ACP was the only one to report whether investigators conducted an intention-to-treat analysis (in 38% of summaries [p < 0.001]), and it was more likely than the other 2 journals to report the precision of the treatment effect (as a p value or 95% confidence interval) (ACP 100%, JW 0% and IMA 55% [p < 0.001]). INTERPRETATION: Although ACP provided more information on study design, methodology and results, all 3 secondary journals often omitted important information. More complete reporting is necessary for secondary journals to fulfill their potential to help clinicians evaluate the medical literature. [Abstract] [Full text] [PDF]

Four newspaper stories. Brad Efron, Susan Holmes. Accessed on 2003-06-20. "Figures 2-5 reproduce four newspaper stories, each of which uses statistical comparisons to address an important medical question: 1. Walking Women. Observational study of 72,488 nurses. Claims a 30 to 40% reduction in heart attacks for women who walk 3 miles per hour at least 3 hours per week. 2. Balding Men. Observational study of 22,000 male doctors. Claims a 36% increase in heart attacks among men balding at the crown of the head. 3. Magnesium Injections. Interventional study of 2316 emergency room patients. Claims a 25% reduction in near-term deaths among heart attack patients who received a magnesium injection. 4. Secretin and Autism. Randomized clinical trial of 56 autistic children. Claims no benefit from administration of the drug secretin." www.stanford.edu/class/stat30/rr2.pdf

Exposing Flawed Science. Rick Groleau, Nova. Accessed on 2003-04-30. "Science is a human endeavor, subject to human imperfections. At one end of a spectrum covering what would generally be considered poor science we have those who intentionally deceive. At the other end are those who have the best of intentions but, for some reason, produce flawed results. Somewhere in the middle are those who have some knowledge of the topic they are investigating, but not enough to produce results that will stand up to scrutiny." www.pbs.org/wgbh/nova/holocaust/pseudoscience.html

A quality assessment of randomized control trials of primary treatment of breast cancer. A. Liberati, H. N. Himel, T. C. Chalmers. J Clin Oncol 1986: 4(6); p942-51. The methodology of randomized control trials (RCTs) of the primary treatment of early breast cancer has been reviewed using a quantitative method. Sixty-three RCTs comparing various treatment modalities tested on over 34,000 patients and reported in 119 papers were evaluated according to a standardized scoring system. A percentage score was developed to assess the internal validity of a study (referring to the quality of its design and execution) and its external validity (referring to presentation of information required to determine its generalizability). An overall score was also calculated as the combination of the two. The mean overall score for the 63 RCTs was 50% (95% confidence interval [CI] = 46% to 54%) with small and nonstatistically significant differences between types of trial. The most common methodologic deficiencies encountered in these studies were related to the randomization process (only 27 of the 63 RCTs adopted a truly blinded procedure), the handling of withdrawals (only 26 RCTs included all patients in the analyses), the description of the follow-up schedule (only 12 RCTs reported adequately), the report of side effects (adequate information given in 33 RCTs), and the description of the patient population (satisfactory in 29 RCTs). Telephone calls to the principal investigators improved the quality scores by seven points on a scale of 100, indicating that some of the deficiencies lay in reporting rather than performance. There was evidence that quality has improved over time and that the increasing tendency of involving a biostatistician in the research team was positively associated with the improvement of the internal validity but not with the external.

The methodological quality of randomized controlled trials of homeopathy, herbal medicines and acupuncture. K. Linde, W. B. Jonas, D. Melchart, S. Willich. Int J Epidemiol 2001: 30(3); p526-31. BACKGROUND: To investigate the methodological quality of randomized controlled trials in three areas of complementary medicine. METHODS: The methodological quality of 207 randomized trials collected for five previously published systematic reviews on homeopathy, herbal medicine (Hypericum for depression, Echinacea for common cold), and acupuncture (for asthma and chronic headache) was assessed using a validated scale (the Jadad scale) and single quality items. RESULTS: While the methodological quality of the trials was highly variable, the majority had important shortcomings in reporting and/or methodology. Major problems in most trials were the description of allocation concealment and the reporting of drop-outs and withdrawals. There were relevant differences in single quality components between the different complementary therapies: For example, acupuncture trials reported adequate allocation concealment less often (6% versus 32% of homeopathy and 26% of herb trials), and trials on herbal extracts had better summary scores (mean score 3.12 versus 2.33 for homeopathy and 2.19 for acupuncture trials). Larger trials published more recently in journals listed in Medline and in English language scored significantly higher than trials not meeting these criteria. CONCLUSION: Trials of complementary therapies often have relevant methodological weaknesses. The type of weaknesses varies considerably across interventions.

The Assert Statement. H. Mann. Accessed on 2003-06-30. "The ASSERT statement is the articulation of A Standard for the Scientific and Ethical Review of Trials. It proposes a structured approach whereby research ethics committees review proposals for, and monitor the conduct of, randomized controlled clinical trials." www.assert-statement.org/

Assessing the quality of randomized controlled trials. Current issues and future directions. D. Moher, A. R. Jadad, P. Tugwell. Int J Technol Assess Health Care 1996: 12(2); 195-208. Assessing the quality of randomized controlled trials is a relatively new and important development. Three approaches have been developed: component, checklist, and scale assessment. Component approaches evaluate selected aspects of trials, such as masking. Checklists and scales involve lists of items thought to be integral to study quality. Scales, unlike the other methods, provide a summary numeric score of quality, which can be formally incorporated into a systematic review. Most scales to date have not been developed with sufficient rigor, however. Empirical evidence indicates that differences in scale development can lead to important differences in quality assessment. Several methods for including quality scores in systematic reviews have been proposed, but since little empirical evidence supports any given method, results must be interpreted cautiously. Future efforts may be best focused on gathering more empirical evidence to identify trial characteristics directly related to bias in the estimates of intervention effects and on improving the way in which trials are reported.

Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. D. Moher, A. R. Jadad, G. Nichol, M. Penman, P. Tugwell, S. Walsh. Control Clin Trials 1995: 16(1); p62-73. Assessing the quality of randomized controlled trials (RCTs) is important and relatively new. Quality gives us an estimate of the likelihood that the results are a valid estimate of the truth. We present an annotated bibliography of scales and checklists developed to assess quality. Twenty-five scales and nine checklists have been developed to assess quality. The checklists are most useful in providing investigators with guidelines as to what information should be included in reporting RCTs. The scales give readers a quantitative index of the likelihood that the reported methodology and results are free of bias. There are several shortcomings with these scales. Future scale development is likely to be most beneficial if questions common to all trials are assessed, if the scale is easy to use, and if it is developed with sufficient rigor.

Research into complementary and alternative medicine: problems and potential. R. L. Nahin, S. E. Straus. British Medical Journal 2001: 322(7279); 161-4. [Full text]

Clinical trials in general surgical journals: are methods better reported? L. P. Schumm, J. S. Fisher, R. A. Thisted, J. Olak. Surgery 1999: 125(1); 41-5. BACKGROUND: Reports of clinical trials often lack adequate descriptions of their design and analysis. Thus readers cannot properly assess the strength of the findings and are limited in their ability to draw their own conclusions. A review of 6 surgical journals in 1984 revealed that the frequency of reporting 11 basic elements of design and analysis in clinical trials was only 59%. This study attempted to identify areas that still need improvement. METHODS: Eligible studies published from July 1995 through June 1996 included all reports of comparative clinical trials on human subjects that were prospective and had at least 2 treatment arms. A total of 68 articles published in 6 general surgery journals were reviewed. The frequency that the previously identified 11 basic elements of design and analysis were reported was determined. RESULTS: Seventy-four percent of all items were reported accurately (a 15% increase from the previous study), 4% were reported ambiguously, and 23% were not reported; improvement was seen in every journal. The reporting of eligibility criteria and statistical power improved the most. For 3 items, reporting was still not adequate; 32% of reports provided information about statistical power, 40% about the method of randomization, and 49% about whether the person assessing outcomes was blind to the treatment assignment. CONCLUSIONS: Improvements have been made in reporting surgical clinical trials, but in general methodologic questions poorly answered in the 1980s continue to be answered poorly in the 1990s. Editors of surgical journals are urged to provide authors with guidelines on how to report clinical trial design and analysis.

Many reports of RCTs give insufficient data for Cochrane reviewers. EH Walters, JA Walters. BMJ 1999: 319(7204); 257. ("Editors of journals should insist that, rather than giving the general statement that the design was randomised and double blind, reports should give a short description of the randomisation method used." and "In our series we have been able to extract fully all the data on reported outcomes in only six of the 30 papers; 15 yielded none, because what was presented was derivative (such as the change from baseline) or merely the P value for some statistical comparison.") Abstract not available. [Full text]

Review of randomised controlled trials of traditional chinese medicine. Jin-Ling Tang, SY Zhan, E Ernst. BMJ 1999: 319(7203); 160-61. [Medline] [Full text]

evidence >> mountain >> absolute (2)

Evidence-based purchasing: understanding results of clinical trials and systematic reviews. T. Fahey, S. Griffiths, T. J. Peters. British Medical Journal 1995: 311(7012); 1056-9; discussion 1059-60. (Relative changes are interpreted differently than absolute changes.) OBJECTIVE--To assess whether the way in which the results of a randomised controlled trial and a systematic review are presented influences health policy decisions. DESIGN--A postal questionnaire to all members of a health authority within one regional health authority. SETTING--Anglia and Oxford regional health authorities. SUBJECTS--182 executive and non-executive members of 13 health authorities, family health services authorities, or health commissions. MAIN OUTCOME MEASURES--The average score from all health authority members in terms of their willingness to fund a mammography programme or cardiac rehabilitation programme according to four different ways of presenting the same results of research evidence--namely, as a relative risk reduction, absolute risk reduction, proportion of event free patients, or as the number of patients needed to be treated to prevent an adverse event. RESULTS--The willingness to fund either programme was significantly influenced by the way in which data were presented. Results of both programmes when expressed as relative risk reductions produced significantly higher scores when compared with other methods (P < 0.05). The difference was more extreme for mammography, for which the outcome condition is rarer. CONCLUSIONS--The method of reporting trial results has a considerable influence on the health policy decisions made by health authority members. [Medline] [Abstract] [Full text]
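
The four presentations named in this abstract are all computed from the same pair of event rates. The sketch below is only an illustration of that arithmetic: the function name and the example rates (0.4% and 0.3%) are hypothetical and are not taken from the paper.

```python
def risk_summaries(control_risk, treated_risk):
    """Express one trial result in the four ways listed in the abstract.

    Both arguments are event proportions (0 to 1); the treated risk is
    assumed to be lower than the control risk.
    """
    if not (0 < treated_risk < control_risk <= 1):
        raise ValueError("expects 0 < treated_risk < control_risk <= 1")
    arr = control_risk - treated_risk  # absolute risk reduction
    return {
        "relative_risk_reduction": arr / control_risk,
        "absolute_risk_reduction": arr,
        "event_free_control": 1 - control_risk,
        "event_free_treated": 1 - treated_risk,
        "number_needed_to_treat": 1 / arr,
    }


# Hypothetical rare outcome: 0.4% of controls and 0.3% of treated patients.
for name, value in risk_summaries(0.004, 0.003).items():
    print(f"{name}: {value:.4g}")
```

With these made-up rates, a 25% relative risk reduction is the same result as an absolute reduction of 0.1 percentage points, or about 1000 patients treated to prevent one event. The rarer the outcome, the wider the gap between the relative and absolute figures.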

Who benefits from medical interventions? G. D. Smith, M. Egger. BMJ 1994: 308(6921); 72-4. Abstract not available. [Medline] [Full text]

evidence >> mountain >> blinding (16)

A comparison of active and simulated chiropractic manipulation as adjunctive treatment for childhood asthma. J. Balon, P. D. Aker, E. R. Crowther, C. Danielson, P. G. Cox, D. O'Shaughnessy, C. Walker, C. H. Goldsmith, E. Duku, M. R. Sears. New England Journal of Medicine 1998: 339(15); 1013-20. BACKGROUND: Chiropractic spinal manipulation has been reported to be of benefit in nonmusculoskeletal conditions, including asthma. METHODS: We conducted a randomized, controlled trial of chiropractic spinal manipulation for children with mild or moderate asthma. After a three-week base-line evaluation period, 91 children who had continuing symptoms of asthma despite usual medical therapy were randomly assigned to receive either active or simulated chiropractic manipulation for four months. None had previously received chiropractic care. Each subject was treated by 1 of 11 participating chiropractors, selected by the family according to location. The primary outcome measure was the change from base line in the peak expiratory flow, measured in the morning, before the use of a bronchodilator, at two and four months. Except for the treating chiropractor and one investigator (who was not involved in assessing outcomes), all participants remained fully blinded to treatment assignment throughout the study. RESULTS: Eighty children (38 in the active-treatment group and 42 in the simulated-treatment group) had outcome data that could be evaluated. There were small increases (7 to 12 liters per minute) in peak expiratory flow in the morning and the evening in both treatment groups, with no significant differences between the groups in the degree of change from base line (morning peak expiratory flow, P=0.49 at two months and P=0.82 at four months). Symptoms of asthma and use of β-agonists decreased and the quality of life increased in both groups, with no significant differences between the groups. There were no significant changes in spirometric measurements or airway responsiveness. CONCLUSIONS: In children with mild or moderate asthma, the addition of chiropractic spinal manipulation to usual medical care provided no benefit. [Medline] [Abstract] [Full text] [PDF]

Why Bogus Therapies Seem to Work. Barry L. Beyerstein. Skeptical Inquirer 1997: 21(5). At least ten kinds of errors and biases can convince intelligent, honest people that cures have been achieved when they have not. [Full text]

Controlled trial of acupuncture for severe recidivist alcoholism. M. L. Bullock, P. D. Culliton, R. T. Olander. Lancet 1989: 1(8652); 1435-9. In a placebo-controlled study, 80 severe recidivist alcoholics received acupuncture either at points specific for the treatment of substance abuse (treatment group) or at nonspecific points (control group). 21 of 40 patients in the treatment group completed the programme compared with 1 of 40 controls. Significant treatment effects persisted at the end of the six-month follow-up: by comparison with treatment patients more control patients expressed a moderate to strong need for alcohol, and had more than twice the number of both drinking episodes and admissions to a detoxification centre.

How study design affects outcomes in comparisons of therapy. I: Medical. GA Colditz, JN Miller, F. Mosteller. Stat Med 1989: 8(4); 441-454. ABSTRACT: We analysed 113 reports published in 1980 in a sample of medical journals to relate features of study design to the magnitude of gains attributed to new therapies over old. Overall we rated 87 per cent of new therapies as improvements over standard therapies. The mean gain (measured by the Mann-Whitney statistic) was relatively constant across study designs, except for non-randomized controlled trials with sequential assignment to therapy, which showed a significantly higher likelihood that a patient would do better on the innovation than on standard therapy (p = 0.004). Randomized controlled trials that did not use a double-blind design had a higher likelihood of showing a gain for the innovation than did double-blind trials (p = 0.02). Any evaluation of an innovation may include both bias and the true efficacy of the new therapy, therefore we may consider making adjustments for the average bias associated with a study design. When interpreting an evaluation of a new therapy, readers should consider the impact of the following average adjustments to the Mann-Whitney statistic: for trials with non-random sequential assignment a decrease of 0.15, for non-double-blind randomized controlled trials a decrease of 0.11.

Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. P. J. Devereaux, B. J. Manns, W. A. Ghali, H. Quan, C. Lacchetti, V. M. Montori, M. Bhandari, G. H. Guyatt. JAMA 2001: 285(15); 2000-3. CONTEXT: When clinicians assess the validity of randomized controlled trials (RCTs), they commonly evaluate the blinding status of individuals in the RCT. The terminology authors often use to convey blinding status (single, double, and triple blinding) may be open to various interpretations. OBJECTIVE: To determine physician interpretations and textbook definitions of RCT blinding terms. DESIGN AND SETTING: Observational study undertaken at 3 Canadian university tertiary care centers between February and May 1999. PARTICIPANTS: Ninety-one internal medicine physicians who responded to a survey. MAIN OUTCOME MEASURES: Respondents identified which of the following groups they thought were blinded in single-, double-, and triple-blinded RCTs: participants, health care providers, data collectors, judicial assessors of outcomes, data analysts, and personnel who write the article. Definitions from 25 systematically identified textbooks published since 1990 providing definitions for single, double, or triple blinding. RESULTS: Physician respondents identified 10, 17, and 15 unique interpretations of single, double, and triple blinding, respectively, and textbooks provided 5, 9, and 7 different definitions of each. The frequencies of the most common physician interpretation and textbook definition were 75% (95% confidence interval [CI], 65%-83%) and 74% (95% CI, 52%-90%) for single blinding, 38% (95% CI, 28%-49%) and 43% (95% CI, 24%-63%) for double blinding, and 18% (95% CI, 10%-28%) and 14% (95% CI, 0%-58%) for triple blinding, respectively. CONCLUSIONS: Our study suggests that both physicians and textbooks vary greatly in their interpretations and definitions of single, double, and triple blinding. Explicit statements about the blinding status of specific groups involved in RCTs should replace the current ambiguous terminology. [Medline] [Abstract] [Full text] [PDF]

"Double blind, you are the weakest link- good-bye!" P.J. Devereaux, M. Bhandari, V. M. Montori, B.J. Manns, W.A. Ghali, G. H. Guyatt. ACP Journal Club 2002: 136; A11-A12. Abstract not available.

Removing bias in surgical trials. A. G. Johnson, J. M. Dixon. British Medical Journal 1997: 314(7085); 916-7. Abstract not available. [Full text]

Empirical evidence of design-related bias in studies of diagnostic tests. JG Lijmer, BW Mol, S Heisterkamp, GJ Bonsel, MH Prins, JH van der Meulen, PM Bossuyt. JAMA 1999: 282(11); 1061-1066. ABSTRACT: CONTEXT: The literature contains a large number of potential biases in the evaluation of diagnostic tests. Strict application of appropriate methodological criteria would invalidate the clinical application of most study results. OBJECTIVE: To empirically determine the quantitative effect of study design shortcomings on estimates of diagnostic accuracy. DESIGN AND SETTING: Observational study of the methodological features of 184 original studies evaluating 218 diagnostic tests. Meta-analyses on diagnostic tests were identified through a systematic search of the literature using MEDLINE, EMBASE, and DARE databases and the Cochrane Library (1996-1997). Associations between study characteristics and estimates of diagnostic accuracy were evaluated with a regression model. MAIN OUTCOME MEASURES: Relative diagnostic odds ratio (RDOR), which compared the diagnostic odds ratios of studies of a given test that lacked a particular methodological feature with those without the corresponding shortcomings in design. RESULTS: Fifteen (6.8%) of 218 evaluations met all 8 criteria; 64 (30%) met 6 or more. Studies evaluating tests in a diseased population and a separate control group overestimated the diagnostic performance compared with studies that used a clinical population (RDOR, 3.0; 95% confidence interval [CI], 2.0-4.5). Studies in which different reference tests were used for positive and negative results of the test under study overestimated the diagnostic performance compared with studies using a single reference test for all patients (RDOR, 2.2; 95% CI, 1.5-3.3). Diagnostic performance was also overestimated when the reference test was interpreted with knowledge of the test result (RDOR, 1.3; 95% CI, 1.0-1.9), when no criteria for the test were described (RDOR, 1.7; 95% CI, 1.1-2.5), and when no description of the population under study was provided (RDOR, 1.4; 95% CI, 1.1-1.7). CONCLUSION: These data provide empirical evidence that diagnostic studies with methodological shortcomings may overestimate the accuracy of a diagnostic test, particularly those including nonrepresentative patients or applying different reference standards.

An addition to the controversy on sunlight exposure and melanoma risk: a meta-analytical approach. P. J. Nelemans, F. H. Rampen, D. J. Ruiter, A. L. Verbeek. J Clin Epidemiol 1995: 48(11); 1331-42. Case control studies on the association between sunlight exposure and melanoma risk show considerable differences in design; this could be responsible for the variation in study results. In an attempt to resolve the controversy between study results, the results of 25 publications on case control studies were evaluated using meta-analytical techniques. Comparison of odds ratios between subgroups of studies revealed that the range of odds ratios was far greater for hospital-based studies than for population-based studies. For the latter type of studies, the odds ratios were homogeneous and the pooled odds ratios were 1.57 (95% confidence interval [CI], 1.29-1.91) for intermittent sunlight exposure and 0.73 (95% CI, 0.60-0.89) for chronic exposure. However, among other problems, the lack of standardized measures for sunlight exposure warrants cautious interpretation of these results. It is concluded that evidence to support the intermittent sunlight theory is still far from complete.

The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. J. H. Noseworthy, G. C. Ebers, M. K. Vandervoort, R. E. Farquhar, E. Yetisir, R. Roberts. Neurology 1994: 44(1); p16-20. In the randomized, placebo-controlled, physician-blinded Canadian cooperative trial of cyclophosphamide and plasma exchange, neither active treatment regimens (group I: i.v. cyclophosphamide and prednisone; group II: weekly plasma exchange, oral cyclophosphamide, and prednisone) were superior to placebo (group III: sham plasma exchange and placebo medications) using the blinded, evaluating neurologists' assessments of disease course (primary analysis). All patients were examined by both a blinded and an unblinded neurologist at each assessment in this trial. We compared the blinded and unblinded neurologists' judgment of treatment response and analyzed the clinical behavior of patients who correctly guessed their treatment. The unblinded (but not the blinded) neurologists' scores demonstrated an apparent treatment benefit at 6, 12, and 24 months for the group II patients (not group I or placebo; p < 0.05, two-tailed). There were no significant differences in the time to treatment failure or in the proportions of patients improved, stable, or worse between the group II and group III patients who correctly guessed their treatment assignments and those who did not. Physician blinding prevented an erroneous conclusion about treatment efficacy (false positive, type 1 error).

Inconsistencies and Errors in Alternative Medicine Research. W Sampson. Skeptical Inquirer 1997: 21(5); 35-38.

The Landscape and Lexicon of Blinding in Randomized Trials. K.F. Schulz, I. Chalmers, D.G. Altman. Annals of Internal Medicine 2002: 136(3); 254-259. Abstract not available.

Blinding in randomised trials: hiding who got what. K. F. Schulz, D.A. Grimes. Lancet 2002: 359(9307); 696-700. Blinding embodies a rich history spanning over two centuries. Most researchers worldwide understand blinding terminology, but confusion lurks beyond a general comprehension. Terms such as single blind, double blind, and triple blind mean different things to different people. Moreover, many medical researchers confuse blinding with allocation concealment. Such confusion indicates misunderstandings of both. The term blinding refers to keeping trial participants, investigators (usually health-care providers), or assessors (those collecting outcome data) unaware of the assigned intervention, so that they will not be influenced by that knowledge. Blinding usually reduces differential assessment of outcomes (information bias), but can also improve compliance and retention of trial participants while reducing biased supplemental care or treatment (sometimes called co-intervention). Many investigators and readers naively consider a randomised trial as high quality simply because it is double blind, as if double-blinding is the sine qua non of a randomised controlled trial. Although double blinding (blinding investigators, participants, and outcome assessors) indicates a strong design, trials that are not double blinded should not automatically be deemed inferior. Rather than solely relying on terminology like double blinding, researchers should explicitly state who was blinded, and how. We recommend placing greater credence in results when investigators at least blind outcome assessments, except with objective outcomes, such as death, which leave little room for bias. If investigators properly report their blinding efforts, readers can judge them. Unfortunately, many articles do not contain proper reporting. If an article claims blinding without any accompanying clarification, readers should remain sceptical about its effect on bias reduction.

Assessing Allocation Concealment and Blinding in Randomised Controlled Trials: Why bother? KF Schulz. Evid Based Nurs 2001: 4; 4-6. Abstract not available.

Understanding controlled trials randomisation methods: concealment. DJ Torgerson, C Roberts. BMJ 1999: 319(7206); 375-76. Abstract not available. [Full text] [PDF]

Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. A. Hrobjartsson, P. C. Gotzsche. N Engl J Med 2001: 344(21); 1594-602. BACKGROUND: Placebo treatments have been reported to help patients with many diseases, but the quality of the evidence supporting this finding has not been rigorously evaluated. METHODS: We conducted a systematic review of clinical trials in which patients were randomly assigned to either placebo or no treatment. A placebo could be pharmacologic (e.g., a tablet), physical (e.g., a manipulation), or psychological (e.g., a conversation). RESULTS: We identified 130 trials that met our inclusion criteria. After the exclusion of 16 trials without relevant data on outcomes, there were 32 with binary outcomes (involving 3795 patients, with a median of 51 patients per trial) and 82 with continuous outcomes (involving 4730 patients, with a median of 27 patients per trial). As compared with no treatment, placebo had no significant effect on binary outcomes (pooled relative risk of an unwanted outcome with placebo, 0.95; 95 percent confidence interval, 0.88 to 1.02), regardless of whether these outcomes were subjective or objective. For the trials with continuous outcomes, placebo had a beneficial effect (pooled standardized mean difference in the value for an unwanted outcome between the placebo and untreated groups, -0.28; 95 percent confidence interval, -0.38 to -0.19), but the effect decreased with increasing sample size, indicating a possible bias related to the effects of small trials. The pooled standardized mean difference was significant for the trials with subjective outcomes (-0.36; 95 percent confidence interval, -0.47 to -0.25) but not for those with objective outcomes. In 27 trials involving the treatment of pain, placebo had a beneficial effect (-0.27; 95 percent confidence interval, -0.40 to -0.15). This corresponded to a reduction in the intensity of pain of 6.5 mm on a 100-mm visual-analogue scale. CONCLUSIONS: We found little evidence in general that placebos had powerful clinical effects. Although placebos had no significant effects on objective or binary outcomes, they had possible small benefits in studies with continuous subjective outcomes and for the treatment of pain. Outside the setting of clinical trials, there is no justification for the use of placebos. [Abstract] [Full text] [PDF]

evidence >> mountain >> causation (1)

From Association to Causation: Some Remarks on the History of Statistics. D Freedman. Statistical Science 1999: 14(3); 243-258.

evidence >> mountain >> clinical (5)

The association of nonsteroidal anti-inflammatory drugs with upper gastrointestinal tract bleeding. J. L. Carson, B. L. Strom, K. A. Soper, S. L. West, M. L. Morse. Arch Intern Med 1987: 147(1); 85-8. To evaluate the risk of developing upper gastrointestinal (UGI) bleeding from nonsteroidal anti-inflammatory drugs (NSAIDs), a retrospective (historical) cohort study was performed, using a computerized data base including 1980 billing data from all Medicaid patients in the states of Michigan and Minnesota. Comparing 47,136 exposed patients to 44,634 unexposed patients, the unadjusted relative risk for developing UGI bleeding 30 days after exposure to a NSAID was 1.5 (95% confidence interval 1.2 to 2.0). Univariate analyses demonstrated associations between UGI bleeding and age, sex, state, alcohol-related diagnoses, preexisting abdominal conditions, and use of anticoagulants. This association between NSAIDs and UGI bleeding was unchanged after adjusting for these potential confounding variables using logistic regression. A linear dose-response relationship and a quadratic duration-response relationship were demonstrated. Non-steroidal anti-inflammatory drugs are associated with UGI bleeding, although the magnitude of the increased risk is reassuringly small.

An assessment of clinically useful measures of the consequences of treatment. A Laupacis, DL Sackett, RS Roberts. New England Journal of Medicine 1988: 318(26); 1728-1733. Abstract not available.

Effect of homoeopathy on pain and other events after acute trauma: placebo controlled trial with bilateral oral surgery. P. Lokken, P. A. Straumsheim, D. Tveiten, P. Skjelbred, C. F. Borchgrevink. British Medical Journal 1995: 310(6992); 1439-42. OBJECTIVE--To examine whether homoeopathy has any effect on pain and other inflammatory events after surgery. DESIGN--Randomised double blind, placebo controlled crossover trial with "identical" oral surgical procedures performed on two separate occasions in 24 patients. INTERVENTIONS--Treatment started 3 hours after surgery with either homoeopathy or placebo. MAIN OUTCOME MEASURES--Postoperative pain and preference for postoperative course assessed by patients on visual analogue scales. Measurements of postoperative swelling and reduction in ability to open mouth. Assessment of bleeding after surgery. RESULTS--Pain after surgery was essentially the same whether treated with homoeopathy or placebo. Postoperative swelling was not significantly affected by homoeopathy, but treatment tended to give less reduction in ability to open mouth. No noticeable difference was seen in postoperative bleeding, side effects, or complaints. Thirteen of the 24 patients preferred the postoperative course with placebo. CONCLUSIONS--No positive evidence was found for efficacy of homoeopathic treatment on pain and other inflammatory events after an acute soft tissue and bone injury inflicted by a surgical intervention. Differences in the order of 30% to 40% would have been needed to show significant effects. [Medline] [Abstract] [Full text]

Interventions for promoting smoking cessation during pregnancy. J. Lumley, S. Oliver, E. Waters. Cochrane Database Syst Rev 2000: (2); CD001055. BACKGROUND: Smoking remains one of the few potentially preventable factors associated with low birthweight, very preterm birth and perinatal death. OBJECTIVES: The objective of this review was to assess the effects of smoking cessation programs implemented during pregnancy on the health of the fetus and infant, on the mother and on the family. SEARCH STRATEGY: We searched the Cochrane Pregnancy and Childbirth Group trials register and the Cochrane Tobacco Addiction Group trials register. SELECTION CRITERIA: Randomised and quasi-randomised trials of smoking cessation programs implemented during pregnancy. DATA COLLECTION AND ANALYSIS: Trial quality was assessed and data were extracted independently by two reviewers. MAIN RESULTS: Forty-four trials were identified: 37 trials including 16,916 women provided data on smoking cessation and/or perinatal outcomes, as did one cluster-randomised trial including 3000 women. Over 800 women were included in trials of smoking relapse prevention. There was substantial variation in the intensity of the intervention and the extent of reminders and reinforcement through pregnancy. Based on 34 trials there was a significant reduction in smoking in the intervention groups (odds ratio 0.53, 95% confidence interval 0.47 to 0.60), an absolute difference of 6.4% women continuing to smoke. The eight trials with validated smoking cessation, a high intensity intervention and a high quality score had an odds ratio of 0.53, 95% confidence interval 0.44 to 0.63 and an absolute difference in continued smoking of 8.1%. The subset of trials with information on fetal outcome revealed a reduction in low birthweight (odds ratio 0.80, 95% confidence interval 0.67 to 0.95), a reduction in preterm birth (odds ratio 0.83, 95% confidence interval 0.69 to 0.99) and an increase in mean birthweight of 28g (95% confidence interval 9 to 49). There were no differences in very low birthweight or perinatal mortality. Five trials of smoking relapse prevention showed no significant difference. The single large cluster-randomised trial showed no evidence of a decrease in continued smoking or adjusted mean birthweight. REVIEWER'S CONCLUSIONS: Smoking cessation programs in pregnancy appear to reduce smoking, low birthweight and preterm birth, but no effect was detected for very low birthweight or perinatal mortality.

How well is the clinical importance of study results reported? An assessment of randomized controlled trials. K. B. Chan, M. Man-Son-Hing, F. J. Molnar, A. Laupacis. CMAJ 2001: 165(9); 1197-202. BACKGROUND: The interpretation of the results of randomized controlled trials (RCTs) has traditionally emphasized statistical significance rather than clinical importance. Our aim was to assess the quality of reporting of factors related to clinical importance in a sample of published RCTs. METHODS: A random sample of 27 (of a total of 266) RCTs published in 5 major medical journals over a 1-year period were reviewed by 4 independent reviewers for factors considered important in the interpretation of the clinical importance of study results: identification of a clearly defined primary outcome, reporting of the expected difference between groups used in the calculation of sample size (the delta value) and whether it was based on the minimal clinically important difference of the intervention, the statistical significance of the results, presentation of pertinent confidence intervals, and the authors' interpretation of the clinical importance of the results. RESULTS: Twenty-two of 27 (81%) articles explicitly reported a single primary outcome. Of the 20 articles that included a sample size calculation, 18 (90%) reported a delta value. Two of the 18 (11%) articles explicitly stated that the delta value was chosen to reflect the minimal clinically important difference of the intervention. For the primary outcomes, confidence intervals surrounding the point estimates of the efficacy of the interventions were reported in 11 of 27 (41%) studies. The study results were interpreted from the perspective of clinical importance in 20 of 27 (74%) of the articles. Of these 20 reports, 5 (25%) provided justification for their clinical interpretation of the results. INTERPRETATION: Authors of RCTs published in major general medical and internal medicine journals do not consistently provide their own interpretation of the clinical importance of their results, and they often do not provide sufficient information to allow readers to make their own interpretation. [Abstract] [Full text] [PDF]

evidence >> mountain >> fishing (6)

The Method of Multiple Working Hypotheses. TC Chamberlin. The Scientific Monthly 1944 reprint (1931 reprint - Journal of Geology) (original 1897): 59; 357-62. N/A

Do multiple outcome measures require p-value adjustment? R. J. Feise. BMC Med Res Methodol 2002: 2(1); 8. BACKGROUND: Readers may question the interpretation of findings in clinical trials when multiple outcome measures are used without adjustment of the p-value. This question arises because of the increased risk of Type I errors (findings of false "significance") when multiple simultaneous hypotheses are tested at set p-values. The primary aim of this study was to estimate the need to make appropriate p-value adjustments in clinical trials to compensate for a possible increased risk in committing Type I errors when multiple outcome measures are used. DISCUSSION: The classicists believe that the chance of finding at least one test statistically significant due to chance and incorrectly declaring a difference increases as the number of comparisons increases. The rationalists have the following objections to that theory: 1) P-value adjustments are calculated based on how many tests are to be considered, and that number has been defined arbitrarily and variably; 2) P-value adjustments reduce the chance of making type I errors, but they increase the chance of making type II errors or needing to increase the sample size. SUMMARY: Readers should balance a study's statistical significance with the magnitude of effect, the quality of the study and with findings from other studies. Researchers facing multiple outcome measures might want to either select a primary outcome measure or use a global assessment measure, rather than adjusting the p-value.
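
The "classicist" argument summarized above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the tests are independent and each is run at the same significance level; the function names are illustrative and do not come from the article. It shows how the chance of at least one false positive grows with the number of outcome measures, and the per-test threshold that a Bonferroni adjustment (one common form of p-value adjustment) would impose.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical numbers of outcome measures, chosen only for illustration.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3), bonferroni_threshold(k))
```

The stricter per-test threshold is what drives the "rationalist" objection: each individual test becomes harder to pass, so either power falls or the sample size has to grow.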

Assessing cause and effect from trials: a cautionary note. D. Howel, R. Bhopal. Control Clin Trials 1994: 15(5); 331-4.

Quantitative Evaluation of Multiplicity in Epidemiology and Public Health Research. Kenneth J. Ottenbacher. American Journal of Epidemiology 1998: 147(7); 615-619. ABSTRACT: Epidemiologic and public health researchers frequently include several dependent variables, repeated assessments, or subgroup analyses in their investigations. These factors result in multiple tests of statistical significance and may produce type 1 experimental errors. This study examined the type 1 error rate in a sample of public health and epidemiologic research. A total of 173 articles chosen at random from 1996 issues of the American Journal of Public Health and the American Journal of Epidemiology were examined to determine the incidence of type 1 errors. Three different methods of computing type 1 error rates were used: experiment-wise error rate, error rate per experiment, and percent error rate. The results indicate a type 1 error rate substantially higher than the traditionally assumed level of 5% (p < 0.05). No practical or statistically significant difference was found between type 1 error rates across the two journals. Methods to determine and correct type 1 errors should be reported in epidemiologic and public health research investigations that include multiple statistical tests.

Cured and broiled meat consumption in relation to childhood cancer: Denver, Colorado (United States). S. Sarasua, D. A. Savitz. Cancer Causes Control 1994: 5(2); 141-8. The association between cured and broiled meat consumption by the mother during pregnancy and by the child was examined in relation to childhood cancer. Five meat groups (ham, bacon, or sausage; hot dogs; hamburgers; bologna, pastrami, corned beef, salami, or lunch meat; charcoal broiled foods) were assessed. Exposures among 234 cancer cases (including 56 acute lymphocytic leukemia [ALL], 45 brain tumor) and 206 controls selected by random-digit dialing in the Denver, Colorado (United States) standard metropolitan statistical area were compared, with adjustment for confounders. Maternal hot-dog consumption of one or more times per week was associated with childhood brain tumors (odds ratio [OR] = 2.3, 95 percent confidence interval [CI] = 1.0-5.4). Among children, eating hamburgers one or more times per week was associated with risk of ALL (OR = 2.0, CI = 0.9-4.6) and eating hot dogs one or more times per week was associated with brain tumors (OR = 2.1, CI = 0.7-6.1). Among children, the combination of no vitamins and eating meats was associated more strongly with both ALL and brain cancer than either no vitamins or meat consumption alone, producing ORs of two to seven. The results linking hot dogs and brain tumors (replicating an earlier study) and the apparent synergism between no vitamins and meat consumption suggest a possible adverse effect of dietary nitrites and nitrosamines.

Invited Commentary: Re: "Multiple Comparisons and Related Issues in the Interpretation of Epidemiologic Data". John R. Thompson. American Journal of Epidemiology 1998: 147(9); 801-811. Abstract not available.

evidence >> mountain >> particularizing (3)

Applying the results of trials and systematic reviews to individual patients. P. Glasziou, G. H. Guyatt, A. L. Dans, L. F. Dans, S. Straus, D. L. Sackett. ACP Journal Club 1998: 129(3); A15-6. Your patient is a 60-year-old hypertensive, alcoholic woman whose symptomless atrial fibrillation was first documented 3 months ago. An echocardiogram shows an enlarged left atrium, rendering successful cardioversion unlikely. She tells you that both of her parents had severe strokes that made the last years of their lives horrible, and she is terrified of having a stroke. You know that a meta-analysis of 5 randomized trials of warfarin in nonvalvular atrial fibrillation demonstrated a 68% relative risk reduction (RRR) in stroke (1). You consider prescribing warfarin for this patient but know that she would not have qualified for the study because alcoholism increases her risk for major hemorrhage (2).
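
One way to see why a single relative risk reduction translates differently for different patients is to convert it to an absolute risk reduction and a number needed to treat. The sketch below assumes the 68% RRR quoted above applies unchanged at every baseline risk; the baseline annual stroke risks are hypothetical values chosen for illustration and are not taken from the article.

```python
def absolute_risk_reduction(baseline_risk: float, rrr: float) -> float:
    """ARR = baseline risk multiplied by the relative risk reduction."""
    return baseline_risk * rrr


def number_needed_to_treat(baseline_risk: float, rrr: float) -> float:
    """NNT is the reciprocal of the absolute risk reduction."""
    return 1.0 / absolute_risk_reduction(baseline_risk, rrr)


RRR_WARFARIN = 0.68  # relative risk reduction quoted in the abstract

# Hypothetical baseline annual stroke risks (assumptions, not from the source).
for baseline in (0.01, 0.05, 0.12):
    arr = absolute_risk_reduction(baseline, RRR_WARFARIN)
    nnt = number_needed_to_treat(baseline, RRR_WARFARIN)
    print(f"baseline {baseline:.2f}: ARR {arr:.4f}, NNT {nnt:.1f}")
```

The same calculation can be repeated for harms, such as an increased risk of major hemorrhage, so that benefit and harm are weighed on the same absolute scale for the individual patient.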

Decision analysis and the implementation of research findings. R. J. Lilford, S. G. Pauker, D. A. Braunholtz, J. Chard. British Medical Journal 1998: 317(7155); 405-9. [Full text]

Pronouncements about the need for "generalizability" of randomized controlled trial results are humbug. D. L. Sackett. Controlled Clinical Trials 2000: 21; 82S. Abstract not available.

evidence >> mountain >> plausibility (5)

Association and Cause. Raymond Agius. Accessed on 2003-05-15. "Aims of this resource: To enable an understanding of the important concepts in determining causes of ill-health with emphasis on epidemiology and the environmental and occupational aspects of public health. To enable a distinction to be made between associations that are likely to be causal and those which probably have other explanations." www.agius.com/hew/resource/assoc.htm

Dulcet tones of a surgeon's voice may have a hidden meaning. R. Dobson. BMJ 2002: 325(7359); 297. [Medline] [Full text] [PDF]

Unconventional therapies for cancer: a refuge from the rules of evidence? I. F. Tannock, D. G. Warr. CMAJ 1998: 159(7); 801-2. Abstract not available yet. [Full text] [PDF]

Minerva Review. Author Unknown. British Medical Journal 2000: 320(7243); 1218-1236. About a fifth of hip fractures in both men and women are caused by smoking (International Journal of Epidemiology 2000;29:253-9). An analysis of longitudinal data from over 30 000 Danish people shows that for men, the risk falls if they stop smoking, whereas women remain vulnerable to hip fracture for much longer after quitting. Fortunately, exercise reduces the risk of hip fracture in middle aged and older women (308-14), so stop smoking and start cycling, jogging, or (Minerva's favourite) bouncing up and down on a small trampoline in front of the telly. [Full text] [PDF]

Biologic plausibility in causal inference: current method and practice. D. L. Weed, S. D. Hursting. Am J Epidemiol 1998: 147(5); 415-25. Abstract not available.

evidence >> mountain >> posthoc (8)

Things to know and do about cancer clusters. T. Aldrich, T. Sinks. Cancer Invest 2002: 20(5-6); 810-6. Perceived cancer clusters present difficulties and opportunities for clinicians and public health officials alike. Public health officials receive reports of perceived cancer clusters, evaluate the validity of these reports, and/or launch investigations to identify potential causes. Clinicians interact directly with the affected patients, families, or community representatives who question the occurrence of cancer and the underlying causes. Clinicians may identify cancer clusters when they question the unusual occurrence of a rare form of cancer within their practice or community. In addition, clinicians may be asked to discuss cancer clusters and inform local debates. In this paper, we describe the public health practice experience with cancer clusters and identify cancer prevention and control opportunities for clinicians and public health officials. Scientific investigations of cancer clusters rarely uncover new knowledge about the causes of cancer. However, a set of common characteristics, unique to etiologic cluster investigations have uncovered new information about the causes of cancer or demonstrated a preventable link to a known carcinogen. These characteristics may provide useful clues for sorting out the small number of clusters worthy of further scientific investigation. Public awareness of cancer clusters may promote an opportunity to inform and motivate people about the preventable causes of cancer and effective cancer screening methods.

Dangers of using "optimal" cutpoints in the evaluation of prognostic factors. D. G. Altman, B. Lausen, W. Sauerbrei, M. Schumacher. Journal of the National Cancer Institute 1994: 86(11); 829-35. Abstract not available yet.

Effects of selenium supplementation for cancer prevention in patients with carcinoma of the skin. A randomized controlled trial. Nutritional Prevention of Cancer Study Group. L. C. Clark, G. F. Combs, Jr., B. W. Turnbull, E. H. Slate, D. K. Chalker, J. Chow, L. S. Davis, R. A. Glover, G. F. Graham, E. G. Gross, A. Krongrad, J. L. Lesher, Jr., H. K. Park, B. B. Sanders, Jr., C. L. Smith, J. R. Taylor. JAMA 1996: 276(24); 1957-63. OBJECTIVE: To determine whether a nutritional supplement of selenium will decrease the incidence of cancer. DESIGN: A multicenter, double-blind, randomized, placebo-controlled cancer prevention trial. SETTING: Seven dermatology clinics in the eastern United States. PATIENTS: A total of 1312 patients (mean age, 63 years; range, 18-80 years) with a history of basal cell or squamous cell carcinomas of the skin were randomized from 1983 through 1991. Patients were treated for a mean (SD) of 4.5 (2.8) years and had a total follow-up of 6.4 (2.0) years. INTERVENTIONS: Oral administration of 200 µg of selenium per day or placebo. MAIN OUTCOME MEASURES: The primary end points for the trial were the incidences of basal and squamous cell carcinomas of the skin. The secondary end points, established in 1990, were all-cause mortality and total cancer mortality, total cancer incidence, and the incidences of lung, prostate, and colorectal cancers. RESULTS: After a total follow-up of 8271 person-years, selenium treatment did not significantly affect the incidence of basal cell or squamous cell skin cancer. There were 377 new cases of basal cell skin cancer among patients in the selenium group and 350 cases among the control group (relative risk [RR], 1.10; 95% confidence interval [CI], 0.95-1.28), and 218 new squamous cell skin cancers in the selenium group and 190 cases among the controls (RR, 1.14; 95% CI, 0.93-1.39). Analysis of secondary end points revealed that, compared with controls, patients treated with selenium had a nonsignificant reduction in all-cause mortality (108 deaths in the selenium group and 129 deaths in the control group [RR, 0.83; 95% CI, 0.63-1.08]) and significant reductions in total cancer mortality (29 deaths in the selenium treatment group and 57 deaths in controls [RR, 0.50; 95% CI, 0.31-0.80]), total cancer incidence (77 cancers in the selenium group and 119 in controls [RR, 0.63; 95% CI, 0.47-0.85]), and incidences of lung, colorectal, and prostate cancers. Primarily because of the apparent reductions in total cancer mortality and total cancer incidence in the selenium group, the blinded phase of the trial was stopped early. No cases of selenium toxicity occurred. CONCLUSIONS: Selenium treatment did not protect against development of basal or squamous cell carcinomas of the skin. However, results from secondary end-point analyses support the hypothesis that supplemental selenium may reduce the incidence of, and mortality from, carcinomas of several sites. These effects of selenium require confirmation in an independent trial of appropriate design before new public health recommendations regarding selenium supplementation can be made.

Journals should see original protocols for clinical trials. C J Hawkey. BMJ 2001: 323(7324); 1309. [Medline] [Full text]

Randomised controlled trial of cardiotocography versus Doppler auscultation of fetal heart at admission in labour in low risk obstetric population. G. Mires, F. Williams, P. Howie. British Medical Journal 2001: 322(7300); 1457-60; discussion 1460-2. (See "Commentary: changes between protocol and manuscript should be declared at submission" at the end of this article.) OBJECTIVE: To compare the effect of admission cardiotocography and Doppler auscultation of the fetal heart on neonatal outcome and levels of obstetric intervention in a low risk obstetric population. DESIGN: Randomised controlled trial. SETTING: Obstetric unit of teaching hospital. PARTICIPANTS: Pregnant women who had no obstetric complications that warranted continuous monitoring of fetal heart rate in labour. INTERVENTION: Women were randomised to receive either cardiotocography or Doppler auscultation of the fetal heart when they were admitted in spontaneous uncomplicated labour. MAIN OUTCOME MEASURES: The primary outcome measure was umbilical arterial metabolic acidosis. Secondary outcome measures included other measures of condition at birth and obstetric intervention. RESULTS: There were no significant differences in the incidence of metabolic acidosis or any other measure of neonatal outcome among women who remained at low risk when they were admitted in labour. However, compared with women who received Doppler auscultation, women who had admission cardiotocography were significantly more likely to have continuous fetal heart rate monitoring in labour (odds ratio 1.49, 95% confidence interval 1.26 to 1.76), augmentation of labour (1.26, 1.02 to 1.56), epidural analgesia (1.33, 1.10 to 1.61), and operative delivery (1.36, 1.12 to 1.65). CONCLUSIONS: Compared with Doppler auscultation of the fetal heart, admission cardiotocography does not benefit neonatal outcome in low risk women. Its use results in increased obstetric intervention, including operative delivery. [Medline] [Abstract] [Full text] [PDF]

Celestial determinants of success in research. R. Pollex, B. Hegele, M.R. Ban. CMAJ 2001: 165(12); 1584. [Medline] [Full text] [PDF]

Cancer Clusters: Finding Vs. Feelings. David Robinson, Medscape. Accessed on 2003-05-09. "Several challenges bedevil any cancer cluster investigation and can result in ambiguous or misleading conclusions. This report discusses the potential cancer clusters in Toms River, New Jersey and Long Island, New York, because they contain many elements typical of cancer cluster investigations and have received considerable media attention." Posted 11/06/2002. www.medscape.com/viewarticle/442554_1

False positive outcomes and design characteristics in occupational cancer epidemiology studies. G. G. Swaen, O. Teggeler, L. G. van Amelsvoort. Int J Epidemiol 2001: 30(5); 948-54. BACKGROUND: Recently there has been considerable debate about possible false positive study outcomes. Several well-known epidemiologists have expressed their concern and the possibility that epidemiological research may lose credibility with policy makers as well as the general public. METHODS: We have identified 75 false positive studies and 150 true positive studies, all published reports and all epidemiological studies reporting results on substances or work processes generally recognized as being carcinogenic to humans. All studies were scored on a number of design characteristics and factors relating to the specificity of the research objective. These factors included type of study design, use of cancer registry data, adjustment for smoking and other factors, availability of exposure data, dose- and duration-effect relationship, magnitude of the reported relative risk, whether the study was considered a 'fishing expedition', affiliation and country of the first author. RESULTS: The strongest factor associated with the false positive or true positive study outcome was if the study had a specific a priori hypothesis. Fishing expeditions had an over threefold odds ratio of being false positive. Factors that decreased the odds ratio of a false positive outcome included observing a dose-effect relationship, adjusting for smoking and not using cancer registry data. CONCLUSION: The results of the analysis reported here clearly indicate that a study with a specific a priori study objective should be valued more highly in establishing a causal link between exposure and effect than a mere fishing expedition.

evidence >> mountain >> precision (1)

Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). D. L. Sackett. CMAJ 2001: 165(9); 1226-37. [Medline] [Full text] [PDF]

evidence >> mountain >> retrospective (1)

Recall bias in a case-control surveillance system on the use of medicine during pregnancy. M. Rockenbauer, J. Olsen, A. E. Czeizel, L. Pedersen, H. T. Sorensen. Epidemiology 2001: 12(4); 461-6. It is important to study possible teratogenic effects of drugs used during pregnancy. Many studies of this type rely upon case-control designs in which drug intake is recalled by the mothers after having given birth. Recall bias in this situation may lead to spurious associations. We looked for indicators of recall bias by comparing self-reported drug intake with medically notified intake for specific diseases in the Hungarian Case-Control Surveillance System of Congenital Abnormalities, which includes 22,865 cases with congenital abnormalities and 39,151 controls. Recall error was present, especially for drugs used for a short time period. Furthermore, the timing of drug intake was reported slightly closer to the time of interview for cases than for controls. Severe or visible congenital abnormalities did not appear to be more conducive to recall bias than other abnormalities under study. A case-control surveillance system of this type may frequently cause spurious associations, with biased odds ratios up to a factor of 1.9.

evidence >> mountain >> sample size (1)

The ethics of tiny trials. B. Phillips. Arch Dis Child 2002: 87(3); 258. Abstract not available yet.

evidence >> mountain >> samplesize (5)

Negative results of randomized clinical trials published in the surgical literature: equivalency or error? J. B. Dimick, M. Diener-West, P. A. Lipsett. Arch Surg 2001: 136(7); 796-800. HYPOTHESIS: We hypothesized that review of randomized controlled clinical trials (RCTs) with nonstatistically significant or "negative" results published in the surgical literature do not have appropriate statistical power to demonstrate equivalency between treatment arms. DATA SOURCES AND STUDY SELECTION: The MEDLINE database was searched to obtain reports of all RCTs with negative results published in 3 surgical journals from 1988 to 1998. Manual review of one year (1997) of publications for each journal was performed to validate our search strategy. Equivalency was evaluated using the Two One-Sided Tests Procedure and post hoc power calculations. DATA SYNTHESIS: Ninety reports of RCTs with negative results were identified in the surgical literature between 1988 and 1998. The manual review of 1997 showed a 100% retrieval rate for our search strategy. After applying the Two One-Sided Tests Procedure, 35 reports (39%) met the criteria for demonstrating equivalency. The other 55 reports (61%) contained at least a 10% absolute difference in the 90% confidence interval of Delta. Using the power calculation method, only 22 (24%) articles had a power greater than .80 to detect a 50% difference in therapeutic effect. Only 29% of the reports included a formal sample size calculation and these studies were more likely to demonstrate equivalency than those without a sample size estimate (P<.01). CONCLUSIONS: Many reports from negative RCTs published in the surgical literature lack sufficient statistical power to establish that clinically important differences are not present. Surgeons should perform appropriate sample size calculations when designing RCTs and recognize the utility of confidence intervals when reporting negative results.
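
The equivalence check described in this abstract can be sketched as a confidence-interval test: two one-sided tests at the 5% level agree with asking whether the 90% confidence interval for the difference lies entirely inside the equivalence margin. The code below is a minimal sketch under stated assumptions: it uses a simple Wald (normal approximation) interval for a difference in proportions, a 10% absolute margin, and made-up event counts. The article's exact computations may differ.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, level=0.90):
    """Wald confidence interval for the difference in event proportions (A minus B)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


def shows_equivalence(events_a, n_a, events_b, n_b, margin=0.10):
    """True when the whole 90% interval lies strictly inside (-margin, +margin)."""
    low, high = risk_difference_ci(events_a, n_a, events_b, n_b, level=0.90)
    return -margin < low and high < margin


# Hypothetical trials with the same observed rates (20% vs 22%) but different sizes.
print(shows_equivalence(20, 100, 22, 100))      # small "negative" trial
print(shows_equivalence(200, 1000, 220, 1000))  # larger trial, narrower interval
```

With identical observed rates, only the larger hypothetical trial has an interval narrow enough to rule out a 10% absolute difference, which is the distinction between a trial that is merely not significant and one that demonstrates equivalence.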

Putting trials on trial--the costs and consequences of small trials in depression: a systematic review of methodology. M. Hotopf, G. Lewis, C. Normand. J Epidemiol Community Health 1997: 51(4); 354-8. STUDY OBJECTIVE: To determine why, despite 122 randomised controlled trials, there is no consensus about whether the selective serotonin reuptake inhibitors or tricyclic and related antidepressants should be used as first line treatment of depression. DESIGN: Systematic review of all RCTs comparing selective serotonin reuptake inhibitors and tricyclic or heterocyclic antidepressants. MAIN RESULTS: The shortcomings identified in the 122 trials were as follows: (1) there was inadequate description of randomisation, (2) the outcomes used were mainly observer rated measurements of depression, and studies failed to use quality of life measures or perform economic evaluations, (3) doses of tricyclic antidepressants were inadequate, (4) generalisability of studies was poor (including a reliance on secondary care settings and inadequate follow up), and (5) there were statistical shortcomings such as low statistical power, failure to use intention to treat analyses, and the tendency to make multiple comparisons. CONCLUSIONS: Future RCTs should be designed to inform policy makers and address these methodological shortcomings.

Distinguishing between "no evidence of effect" and "evidence of no effect" in randomised controlled trials and other comparisons. William Odita Tarnow-Mordi, MJ Healy. Arch Dis Child 1999: 80(3); 210-213. Abstract not available.

Cost effectiveness calculations and sample size. David J Torgerson, Marion K Campbell. BMJ 2000: 321; 697. Abstract not available yet.

Elevated blood lead levels in children of construction workers. EA Whelan, GM Piacitelli, B Gerwel, TM Schnorr, CA Mueller, J Gittleman, TD Matte. American Journal of Public Health 1997: 87(8); 1352-55. ABSTRACT: OBJECTIVES: This study examined whether children of lead-exposed construction workers had higher blood lead levels than neighborhood control children. METHODS: Twenty-nine construction workers were identified from the New Jersey Adult Blood Lead Epidemiology and Surveillance (ABLES) registry. Eighteen control families were referred by workers. Venous blood samples were collected from 50 children (31 exposed, 19 control subjects) under age 6. RESULTS: Twenty-six percent of workers' children had blood lead levels at or over the Centers for Disease Control and Prevention action level of 0.48 µmol/L (10 micrograms/dL), compared with 5% of control children (unadjusted odds ratio = 6.1; 95% confidence interval = 0.9, 147.2). CONCLUSIONS: Children of construction workers may be at risk for excessive lead exposure. Health care providers should assess parental occupation as a possible pathway for lead exposure of young children.

evidence >> mountain >> subgroup (14)

Subgroups, treatment effects, and baseline risks: some lessons from major cardiovascular trials. Parker AB, Naylor CD. American Heart Journal 2000: 139(6); 952-61. BACKGROUND: The objective of this study was to determine how subgroup analyses are performed in large randomized trials of cardiovascular pharmacotherapy. METHODS AND RESULTS: We reviewed 67 randomized, double-blind, controlled trials involving pharmacotherapy in at least 1000 patients with unstable angina, myocardial infarction, left ventricular dysfunction, or heart failure with clinical outcomes as primary end points, published between 1980 and 1997. Nine had no subgroup analyses but 43 reported on 5 or more subgroups and 31 reported subgroups without formal statistical tests for treatment-subgroup interactions. In most trials, a rationale for subgroup selection was missing. All but 6 focused on single-factor subgroups. CONCLUSIONS: Trial subgroups should ideally be defined a priori on 2 bases: single-factor subgroups with a strong rationale for biological response modification and multifactorial prognostic subgroups defined from baseline risks. However, single-factor subgroup analyses are often reported without a supporting rationale or formal statistical tests for interactions. We suggest that clinicians should interpret published subgroup-specific variations in treatment effects skeptically unless there is a prespecified rationale and a significant treatment-subgroup interaction. [Medline]

Randomised crossover trial of transdermal fentanyl and sustained release oral morphine for treating chronic non-cancer pain. L. Allan, H. Hays, N. H. Jensen, B. L. de Waroux, M. Bolt, R. Donald, E. Kalso. British Medical Journal 2001: 322(7295); 1154-8. OBJECTIVES: To compare patients' preference for transdermal fentanyl or sustained release oral morphine, their level of pain control, and their quality of life after treatment. DESIGN: Randomised, multicentre, international, open label, crossover trial. SETTING: 35 centres in Belgium, Canada, Denmark, Finland, the United Kingdom, the Netherlands, and South Africa. PARTICIPANTS: 256 patients (aged 26-82 years) with chronic non-cancer pain who had been treated with opioids. MAIN OUTCOME MEASURES: Patients' preference for transdermal fentanyl or sustained release oral morphine, pain control, quality of life, and safety assessments. RESULTS: Of 212 patients, 138 (65%) preferred transdermal fentanyl, whereas 59 (28%) preferred sustained release oral morphine and 15 (7%) expressed no preference. Better pain relief was the main reason for preference for fentanyl given by 35% of patients. More patients considered pain control as being "good" or "very good" with fentanyl than with morphine (35% v 23%, P=0.002). These results were reflected in both patients' and investigators' opinions on the global efficacy of transdermal fentanyl. Patients receiving fentanyl had on average higher quality of life scores than those receiving morphine. The incidence of adverse events was similar in both treatment groups; however, more patients experienced constipation with morphine than with fentanyl (48% v 29%, P<0.001). Overall, 41% of patients experienced mild or moderate cutaneous problems associated with wearing the transdermal fentanyl patch, and more patients withdrew because of adverse events during treatment with fentanyl than with morphine (10% v 5%). However, within the subgroup of patients naive to both fentanyl and morphine, similar numbers of patients withdrew owing to adverse effects (11% v 10%, respectively). CONCLUSION: Transdermal fentanyl was preferred to sustained release oral morphine by patients with chronic non-cancer pain previously treated with opioids. The main reason for preference was better pain relief, achieved with less constipation and an enhanced quality of life. [Medline] [Abstract] [Full text] [PDF]

Analysis of clinical trial outcomes: some comments on subgroup analyses. M. E. Buyse. Controlled Clinical Trials 1989: 10(4 Suppl); 187S-194S. This article briefly discusses the various ways in which prognostic information can be included in the analysis of treatment effect in clinical trials. Adjustments in the treatment comparison are usually not warranted, as they do not substantially improve precision, but they may be useful, in addition to the unadjusted comparison, if a potent covariate is by chance maldistributed among the treatment groups. Estimation of interactions between treatment and covariates is usually plagued by insufficient statistical power. Estimation of treatment effect within individual subgroups is also subject to large random errors as well as to the problem of multiplicity, but with these caveats in mind it is an informative and needed complement to an analysis of overall treatment effect.

The miracle of DICE therapy for acute stroke: fact or fictional product of subgroup analysis? Carl E Counsell, Mike J Clarke, Jim Slattery, Peter A G Sandercock. British Medical Journal 1994: 309(6970); 1677-1681. ABSTRACT: OBJECTIVE--To determine whether inappropriate subgroup analysis together with chance could change the conclusion of a systematic review of several randomised trials of an ineffective treatment. DESIGN--44 randomised controlled trials of DICE therapy for stroke were performed (simulated by rolling different coloured dice; two trials per investigator). Each roll of the dice yielded the outcome (death or survival) for that "patient." Publication bias was also simulated. The results were combined in a systematic review. SETTING--Edinburgh. MAIN OUTCOME MEASURE--Mortality. RESULTS--The "hypothesis generating" trial suggested that DICE therapy provided complete protection against death from acute stroke. However, analysis of all the trials suggested a reduction of only 11% (SD 11) in the odds of death. A predefined subgroup analysis by colour of dice suggested that red dice therapy increased the odds by 9% (22). If the analysis excluded red dice trials and those of poor methodological quality the odds decreased by 22% (13, 2P = 0.09). Analysis of "published" trials showed a decrease of 23% (13, 2P = 0.07) while analysis of only those in which the trialist had become familiar with the intervention showed a decrease of 39% (17, 2P = 0.02). CONCLUSION--The early benefits of DICE therapy were not confirmed by subsequent trials. A plausible (but inappropriate) subset analysis of the effects of treatment led to the qualitatively different conclusion that DICE therapy reduced mortality, whereas in truth it was ineffective. Chance influences the outcome of clinical trials and systematic reviews of trials much more than many investigators realise, and its effects may lead to incorrect conclusions about the benefits of treatment.
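
The idea behind the DICE experiment, that subgroups of a treatment with no effect can still look beneficial or harmful by chance, is easy to reproduce in a small simulation. The sketch below is not the authors' procedure. It assumes a death probability of 1/6 in both arms (as if one face of a die meant death), 50 patients per arm, and three hypothetical dice colours assigned to trials in rotation; only the count of 44 trials comes from the abstract.

```python
import random

COLOURS = ("red", "white", "green")  # hypothetical subgroup labels


def simulate_review(seed=0, n_trials=44, n_per_arm=50, p_death=1 / 6):
    """Simulate trials of a treatment with no effect; tally deaths by dice colour."""
    rng = random.Random(seed)
    # colour -> [deaths in treated arm, deaths in control arm, patients per arm]
    tally = {colour: [0, 0, 0] for colour in COLOURS}
    for trial in range(n_trials):
        colour = COLOURS[trial % len(COLOURS)]
        # Both arms share the same death probability: the treatment does nothing.
        tally[colour][0] += sum(rng.random() < p_death for _ in range(n_per_arm))
        tally[colour][1] += sum(rng.random() < p_death for _ in range(n_per_arm))
        tally[colour][2] += n_per_arm
    return tally


def odds_ratio(deaths_treated, deaths_control, n_per_arm):
    """Odds of death in the treated arm divided by odds in the control arm."""
    odds_treated = deaths_treated / (n_per_arm - deaths_treated)
    odds_control = deaths_control / (n_per_arm - deaths_control)
    return odds_treated / odds_control


tally = simulate_review(seed=1)
for colour, counts in tally.items():
    print(colour, round(odds_ratio(*counts), 2))
overall = [sum(column) for column in zip(*tally.values())]
print("all trials", round(odds_ratio(*overall), 2))
```

Running the simulation with different seeds shows the subgroup odds ratios scattering around 1 more widely than the pooled estimate, even though the true effect is exactly zero in every subgroup.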

Coronary Heart Disease and All-Causes Mortality in the Multiple Risk Factor Intervention Trial: Subgroup Findings and Comparisons with Other Trials. Jeffrey A Cutler, James D Neaton, Stephen B Hulley, Lewis Kuller, Oglesby Paul, Jeremiah Stamler. Preventive Medicine 1985: 14(3); 293-311. ABSTRACT: The Multiple Risk Factor Intervention Trial (MRFIT) was a randomized primary prevention trial to test whether a special intervention (SI) program to reduce high blood pressure, elevated blood cholesterol, and/or cigarette smoking could lower coronary heart disease (CHD) mortality in middle-aged men at above average risk. After an average follow-up period of 7 years, risk factor levels were reduced substantially more in SI than in usual-care (UC) men; however, SI-UC differences in CHD mortality (7.1%) and total mortality (-2.1%) were not statistically significant. Subgroup analysis showed 35% fewer SI than UC CHD deaths (P = 0.14) but no all-causes mortality difference in men comparable to the cohort having dietary and smoking intervention in Oslo (the "Oslo-like" cohort). Among the subgroup hypertensive at entry, SI-UC differences in CHD and total mortality appeared to be heterogeneous: CHD mortality was 24% lower in SI than UC men with a normal resting ECG at baseline, but 67% higher if baseline ECG abnormalities were present (nominal P = 0.02, for difference in relative risk estimates). These findings were supported by within-group analyses and are generally consistent with results of other CHD/cardiovascular disease prevention trials. They pose a new hypothesis about possible adverse effects of diuretic therapy in a minority of the hypertensive population.

Interpreting the results of secondary end points and subgroup analyses in clinical trials: should we lock the crazy aunt in the attic? N. Freemantle. British Medical Journal 2001: 322(7292); 989-91. [Medline] [Full text] [PDF]

Coronary Heart Disease Death, Nonfatal Acute Myocardial Infarction and Other Clinical Outcomes in the Multiple Risk Factor Intervention Trial. Multiple Risk Factor Intervention Trial Research Group. The American Journal of Cardiology 1986: 58(1); 1-13. ABSTRACT: The Multiple Risk Factor Intervention Trial was a randomized clinical study to test whether a special-intervention (SI) program aimed at reducing serum cholesterol levels, blood pressure and cigarette smoking would prevent coronary heart disease (CHD) in middle-aged men. The main endpoint reported here is the percentage of participants experiencing first major CHD events (either nonfatal acute myocardial infarction [AMI] or CHD death) during 7 years of follow-up. This outcome was slightly less frequent in the 6,428 SI men than in the 6,438 men assigned to their usual source of care (UC). However, the relative difference--either 1% (95% confidence interval -17% to 16%) or 8% (95% confidence interval -5% to 20%), depending on how AMI was classified--was not statistically significant. Regression analyses within the SI and UC groups suggested that the cholesterol and cigarette smoking interventions reduced the number of first major CHD events: the associations between lowering the levels of these 2 factors and reductions in CHD rates were significant (p less than 0.001) and of the anticipated magnitude. A similar analysis of antihypertensive treatment in the SI group revealed no favorable association between lowering blood pressure and CHD rate, and other subgroup comparisons suggested that a mixture of beneficial and adverse effects may underlie this finding. Thus, the nonsignificant overall UC/SI contrast in CHD rates may reflect a combination of the expected beneficial effects of the cholesterol and smoking interventions with unexpected heterogeneous effects of the antihypertensive intervention. 
Seven of 8 other prespecified cardiovascular endpoints occurred less frequently among SI than among UC men, the difference being nominally significant (p less than 0.05) for angina pectoris, congestive heart failure and peripheral arterial disease.

Supplemental Therapeutic Oxygen for Prethreshold Retinopathy of Prematurity (STOP-ROP), A Randomized, Controlled Trial. I: Primary Outcomes. The STOP-ROP Multicenter Study Group. Pediatrics 2000: 105(2); 295-310. ABSTRACT: OBJECTIVE: To determine the efficacy and safety of supplemental therapeutic oxygen for infants with prethreshold retinopathy of prematurity (ROP) to reduce the probability of progression to threshold ROP and the need for peripheral retinal ablation. METHODS: Premature infants with confirmed prethreshold ROP in at least 1 eye and median pulse oximetry <94% saturation were randomized to a conventional oxygen arm with pulse oximetry targeted at 89% to 94% saturation or a supplemental arm with pulse oximetry targeted at 96% to 99% saturation, for at least 2 weeks, and until both eyes were at study endpoints. Certified examiners masked to treatment assignment conducted weekly eye examinations until each study eye reached ophthalmic endpoint. An adverse ophthalmic endpoint for an infant was defined as reaching threshold criteria for laser or cryotherapy in at least 1 study eye. A favorable ophthalmic endpoint was regression of the ROP into zone III for at least 2 consecutive weekly examinations or full retinal vascularization. At 3 months after the due date of the infant, ophthalmic findings, pulmonary status, growth, and interim illnesses were again recorded. RESULTS: Six hundred forty-nine infants (325 conventional and 324 supplemental) were enrolled from 30 centers over 5 years. Five hundred ninety-seven (92.0%) infants attained known ophthalmic endpoints, and 600 (92%) completed the ophthalmic 3-month assessment. The rate of progression to threshold in at least 1 eye was 48% in the conventional arm and 41% in the supplemental arm. After adjustment for baseline ROP severity stratum, plus disease, race, and gestational age, the odds ratio (supplemental vs conventional) for progression was .72 (95% confidence interval: .52, 1.01).
Final structural status of all study eyes at 3 months of corrected age showed similar rates of severe sequelae in both treatment arms: retinal detachments or folds (4.4% conventional vs 4.1% supplemental), and macular ectopia (3.9% conventional vs 3.9% supplemental). Within the prespecified ROP severity strata, ROP progression rates were lower with supplemental oxygen than with conventional oxygen, but the differences were not statistically significant. A post hoc subgroup analysis of plus disease (dilated and tortuous vessels in at least 2 quadrants of the posterior pole) suggested that infants without plus disease may be more responsive to supplemental therapy (46% progression in the conventional arm vs 32% in the supplemental arm) than infants with plus disease (52% progression in conventional vs 57% in supplemental). Pneumonia and/or exacerbations of chronic lung disease occurred in more infants in the supplemental arm (8.5% conventional vs 13.2% supplemental). Also, at 50 weeks of postmenstrual age, fewer conventional than supplemental infants remained hospitalized (6.8% vs 12.7%), on oxygen (37.0% vs 46.8%), and on diuretics (24.4% vs 35.8%). Growth and developmental milestones did not differ between the 2 arms. CONCLUSIONS: Use of supplemental oxygen at pulse oximetry saturations of 96% to 99% did not cause additional progression of prethreshold ROP but also did not significantly reduce the number of infants requiring peripheral ablative surgery. A subgroup analysis suggested a benefit of supplemental oxygen among infants who have prethreshold ROP without plus disease, but this finding requires additional study. Supplemental oxygen increased the risk of adverse pulmonary events including pneumonia and/or exacerbations of chronic lung disease and the need for oxygen, diuretics, and hospitalization at 3 months of corrected age.
Although the relative risk/benefit of supplemental oxygen for each infant must be individually considered, clinicians need no longer be concerned that supplemental oxygen, as used in this study, will exacerbate active prethreshold ROP.

Randomised, clinically controlled trial of intensive geriatric rehabilitation in patients with hip fracture: subgroup analysis of patients with dementia. T. M. Huusko, P. Karppi, V. Avikainen, H. Kautiainen, R. Sulkava. British Medical Journal 2000: 321(7269); 1107-11. OBJECTIVE: To evaluate the effect of intensive geriatric rehabilitation on demented patients with hip fracture. DESIGN: Preplanned subanalysis of randomised intervention study. Setting: Jyvaskyla Central Hospital, Finland. Participants: 243 independently living patients aged 65 years or older admitted to hospital with hip fracture. INTERVENTION: After surgery patients in the intervention group (n=120) were referred to the geriatric ward whereas those in the control group were discharged to local hospitals. MAIN OUTCOME MEASURES: Length of hospital stay, mortality, and place of residence three months and one year after surgery for hip fracture. RESULTS: The median length of hospital stay of hip fracture patients with moderate dementia (mini mental state examination score 12-17) was 47 days in the intervention group (n=24) and 147 days in the control group (n=12, P=0.04). The corresponding figures for patients with mild dementia (score 18-23) were 29 days in the intervention group (n=35) and 46.5 days in the control group (n=42, P=0.002). Three months after the operation, in the intervention group 91% (32) of the patients with mild dementia and 63% (15) of the patients with moderate dementia were living independently. In the control group, the corresponding figures were 67% (28) and 17% (2). There were no significant differences in mortality or in the lengths of hospital stay of severely demented patients and patients with normal mini mental state examination scores. CONCLUSIONS: Hip fracture patients with mild or moderate dementia can often return to the community if they are provided with active geriatric rehabilitation. [Medline] [Abstract] [Full text] [PDF]

Determination of who may derive most benefit from aspirin in primary prevention: subgroup results from a randomised controlled trial. T W Meade. British Medical Journal 2000: 321; 13-17. Abstract not available yet. [Medline] [Abstract] [Full text] [PDF]

A Consumer's Guide to Subgroup Analyses. Andrew D. Oxman, Gordon H. Guyatt. Annals of Internal Medicine 1992: 116(1); 78-84. ABSTRACT: The extent to which a clinician should believe and act on the results of subgroup analyses of data from randomized trials or meta-analyses is controversial. Guidelines are provided in this paper for making these decisions. The strength of inference regarding a proposed difference in treatment effect among subgroups is dependent on the magnitude of the difference, the statistical significance of the difference, whether the hypothesis preceded or followed the analysis, whether the subgroup analysis was one of a small number of hypotheses tested, whether the difference was suggested by comparisons within or between studies, the consistency of the difference, and the existence of indirect evidence that supports the difference. Application of these guidelines will assist clinicians in making decisions regarding whether to base a treatment decision on overall results or on the results of a subgroup analysis.

Misleading subgroup analyses in GISSI [letter]. R. Peto. Am J Cardiol 1990: 66(7); 771-2.

Repeated doses of porcine secretin in the treatment of autism: a randomized, placebo-controlled trial. W. Roberts, L. Weaver, J. Brian, S. Bryson, S. Emelianova, A. M. Griffiths, B. MacKinnon, C. Yim, J. Wolpin, G. Koren. Pediatrics 2001: 107(5); pE71. BACKGROUND AND OBJECTIVES: Anecdotal reports on the efficacy of secretin in autism raised great hopes for the treatment of children with this disorder. Initial single-dose, randomized, controlled trials failed to demonstrate any therapeutic effects of secretin. The present study is the first to test the outcome of repeated doses and to examine whether there is a subgroup of children who are more likely to achieve positive effects. METHOD: Sixty-four children with autism (ages 2-7 years; 55 boys and 9 girls) with a range of intelligence quotient and verbal ability were randomly assigned, in a double-blind manner, to secretin or placebo groups. Children received 2 doses of placebo or porcine secretin, 6 weeks apart. Assessments were performed at baseline and 3 weeks after each injection using several outcome measures. RESULTS: There were no group differences on formal measures of language, cognition, or autistic symptomatology. Subgroupings based on cognitive level, the presence or absence of diarrhea, or a history of regression failed to show any significant therapeutic effects of secretin. CONCLUSION: No evidence is provided for the efficacy of repeated doses of porcine secretin in the treatment of children with autism. The possible relationship between relief of biological symptoms and enhanced skill performance is discussed.

Analysis and Interpretation of Treatment Effects in Subgroups of Patients in Randomized clinical trials. S Yusuf. JAMA 1991: 266(1); 93-98. ABSTRACT: A key principle for interpretation of subgroup results is that quantitative interactions (differences in degree) are much more likely than qualitative interactions (differences in kind). Quantitative interactions are likely to be truly present whether or not they are apparent, whereas apparent qualitative interactions should generally be disbelieved as they have usually not been replicated consistently. Therefore, the overall trial result is usually a better guide to the direction of effect in subgroups than the apparent effect observed within a subgroup. Failure to specify prior hypotheses, to account for multiple comparisons, or to correct P values increases the chance of finding spurious subgroup effects. Conversely, inadequate sample size, classification of patients into the wrong subgroup, and low power of tests of interaction make finding true subgroup effects difficult. We recommend examining the architecture of the entire set of subgroups within a trial, analyzing similar subgroups across independent trials, and interpreting the evidence in the context of known biologic mechanisms and patient prognosis.
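
The point about multiple comparisons can be made concrete with a little arithmetic. The sketch below assumes that the subgroup tests are independent, that no true subgroup effect exists, and that each test uses a nominal 0.05 threshold; none of these figures comes from the abstract, and the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one nominally significant result
    among n_tests independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions the chance of at least one spurious "significant" subgroup rises quickly with the number of subgroups examined, which is one reason prespecifying a small number of hypotheses matters.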

evidence >> mountain >> surrogate (1)

The relationship between study design, results, and reporting of randomized clinical trials of HIV infection. J. P. Ioannidis, J. C. Cappelleri, H. S. Sacks, J. Lau. Control Clin Trials 1997: 18(5); 431-44. We examined whether the study design of randomized clinical trials for medications against human immunodeficiency virus (HIV) may affect the results and whether the outcomes of these trials affect reporting and publication. We used a database of 71 published randomized HIV-related drug efficacy trials and considered the following study design factors: endpoint definition and method of analysis, masked design, sample size, and duration of follow-up. Large variation was noted in the methods of analysis for surrogate endpoints. Often statistical significance for a surrogate endpoint was not associated with statistical significance for the clinical endpoint or for survival in the same trial, although disagreements in the direction of the treatment effect for surrogate endpoints and survival within individual trials were uncommon. Open-label design seemed to affect the magnitude of the treatment effect for two treatments. The magnitude of the treatment effect in trials of zidovudine monotherapy was inversely related to their sample size, but this probably reflected the confounding effect of longer duration of follow-up in large trials (with a resulting loss of efficacy) rather than publication bias. There was, however, evidence for potential bias in reporting and publication of HIV-related trials. Meta-analyses of published trials for specific treatments demonstrated a sizable treatment benefit for all the examined medications regardless of whether these medications were officially approved, controversial, or abandoned, raising concerns about either publication bias or unjustifiable rejection of potentially useful medications. 
Compared with trials published in specialized journals, trials published in journals of wide readership were larger (p = 0.001) and 4.4 times more likely to report "positive" results (p = 0.01). We identified several examples of trials with "negative" results that have remained unpublished for a long time. In conclusion, study design factors may have an impact on the magnitude and significance of the treatment effect in HIV-related trials. Bias in reporting can further affect the information that these studies provide.

evidence >> mountain >> surrogate (4)

The influence of semen analysis parameters on the fertility potential of infertile couples. C. Ayala, E. Steinberger, D. P. Smith. Journal of Andrology 1996: 17(6); 718-25. The objective of this study was to investigate the relationship between couples' fertility potential and several parameters of semen analysis (from a single semen sample/male partner) in a cohort of 1,055 infertile couples seen at the Texas Institute for Reproductive Medicine and Endocrinology for a total of 9,409 follow-up months. The medians of sperm concentrations (SC), total sperm counts (TSC), percent motility (MOT), motile sperm concentrations (MSC), and total motile sperm counts (TMSC) were significantly higher (P < 0.0001) in the group that achieved pregnancy. When the entire group was divided into "high" and "low" groups on the basis of the various parameters of semen analysis, the relative risk ratios for conception for the "high" groups were as follows: SC, 1.5; MOT, 8.5; TSC, 8.1; MSC, 5.8; and TMSC, 6.1. Life table analysis showed a statistically significant difference (P < 0.0001) in the initial rise and overall slope of the conception rates between the two groups for a number of the semen analysis parameters (TSC, MOT, MSC, and TMSC). This study showed that certain semen analysis parameters are positively correlated, with a high degree of statistical probability, with the time required for the occurrence of conception. The quantitative impact of the male fertility potential on conception rates was shown to correlate not solely with the SC or MOT values, but even more so with their derivatives (i.e., MSC and TMSC). Therefore, in an in vivo environment it is not only the number of sperm and their motility but also their derivatives that provide a quantitative insight into the male fertility potential. The data may provide a quantitative expression of the relative risk ratio for conception to occur and the time required until conception is achieved. 
Further studies will be necessary to clarify the effect of the other semen analysis parameters (i.e., morphology, velocity, linearity, and "efficient" MSC) on conception rates, cumulative conception rates, relative risk ratio for conception, and time until conception in a large population of infertile couples. [Medline]

Relation between tumour response to first-line chemotherapy and survival in advanced colorectal cancer: a meta-analysis. Meta-Analysis Group in Cancer. M. Buyse, P. Thirion, R. W. Carlson, T. Burzykowski, G. Molenberghs, P. Piedbois. Lancet 2000: 356(9227); 373-8. BACKGROUND: Treatment of advanced colorectal cancer has progressed substantially. However, improvements in response rates have not always translated into significant survival benefits. Doubts have therefore been raised about the usefulness of tumour response as a clinical endpoint. METHODS: This meta-analysis was done on individual data from 3791 patients enrolled in 25 randomised trials of first-line treatment with standard bolus intravenous fluoropyrimidines versus experimental treatments (fluorouracil plus leucovorin, fluorouracil plus methotrexate, fluorouracil continuous infusion, or hepatic-arterial infusion of floxuridine). Analyses were by intention to treat. FINDINGS: Compared with bolus fluoropyrimidines, experimental fluoropyrimidines led to significantly higher tumour response rates (454 responses among 2031 patients vs 209 among 1760; odds ratio 0.48 [95% CI 0.40-0.57], p<0.0001) and better survival (1808 deaths among 2031 vs 1580 among 1760; hazard ratio 0.90 [0.84-0.97], p=0.003). The survival benefits could be explained by the higher tumour response rates. However, a treatment that lowered the odds of failure to respond by 50% would be expected to decrease the odds of death by only 6%. In addition, less than half of the variability of the survival benefits in the 25 trials could be explained by the variability of the response benefits in these trials. INTERPRETATION: These analyses confirm that an increase in tumour response rate translates into an increase in overall survival for patients with advanced colorectal cancer. 
However, in the context of individual trials, knowledge that a treatment has benefits on tumour response does not allow accurate prediction of the ultimate benefit on survival.
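
The response counts quoted in this abstract are enough to compute a crude odds ratio by hand. The sketch below pools all patients into a single 2x2 table and uses a Woolf (log-scale) interval; that is an assumption made for illustration and not the meta-analysis method of the paper, so the result should land near the published 0.48 rather than exactly on it. The function name is hypothetical.

```python
import math

def odds_ratio_with_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Crude odds ratio (group A vs group B) with a Woolf log-scale interval."""
    non_a = total_a - events_a
    non_b = total_b - events_b
    estimate = (events_a / non_a) / (events_b / non_b)
    se_log = math.sqrt(1 / events_a + 1 / non_a + 1 / events_b + 1 / non_b)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# The "event" is failure to respond, so an odds ratio below 1
# favours the experimental arm.
failures_experimental = 2031 - 454   # counts taken from the abstract
failures_bolus = 1760 - 209
estimate, lower, upper = odds_ratio_with_ci(
    failures_experimental, 2031, failures_bolus, 1760
)
print(f"crude OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Any gap between this crude figure and the published one is expected, since pooling ignores which trial each patient came from.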

Surrogate Endpoints and Neuromuscular Recovery. Aaron F. Kopman. Anesthesiology 1997: 87(5); 1027-1031. Abstract not available.

Maternal nutrition, pregnancy outcome and public health policy. M. S. Kramer. CMAJ 1998: 159(6); 663-5. ("From a clinical, etiologic or prognostic perspective, however, low birth weight is not a very useful outcome. Birth weight is a function of 2 factors: duration of gestation and rate of fetal growth. Thus, the weight of newborns can be low either because they are born early (preterm birth) or because they are small for their gestational age or both.") [Full text]

evidence >> mountain >> validity (19)

Differential recall bias and spurious associations in case/control studies. D. Barry. Statistics in Medicine 1996: 15(23); 2603-16. Consider a case/control study designed to investigate a possible association between exposure to a putative risk factor and development of a particular disease. Let E denote the information required to specify a subject's exposure to the risk factor. We examine the effect that errors in the recorded values of E (which we denote by E*) have on inferences of an association between disease and the risk factor. We concentrate on situations where the errors in recorded exposure are such that exposure is underestimated for controls and overestimated for cases. This phenomenon is referred to as differential recall bias and may lead to spurious inferences of an association between exposure and disease. We describe how the standard inferential techniques used in the analysis of data from case/control studies may be adjusted to take account of specified mechanisms whereby E is distorted to produce E*. Such adjustments may be used to determine the sensitivity of an analysis to the phenomenon of differential recall bias and to quantify the extent of such bias that would be required to overturn the conclusions of the analysis. There remains the matter of judging whether a given distortion mechanism is reasonable in a particular context. This emphasizes the need for investigators to take account of differential recall bias in validation studies of exposure assessment techniques. The methodology developed here is applied to a recent major study investigating the possible association between lung cancer and exposure to environmental tobacco smoke. The log-odds ratio of 0.23 based on recorded exposure differs significantly from 0 (P < 0.02). However, the association is rendered non-significant by a very modest degree of differential recall bias. 
For example, if 3.8 per cent of exposed controls report no exposure, 3.8 per cent of unexposed cases report exposure, and all other subjects report exposure accurately, the log-odds ratio drops to 0.07 and the corresponding p-value increases to 0.49.
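
The kind of sensitivity analysis described here can be sketched for a single 2x2 table. The code below is not the author's methodology: it assumes the simple distortion mechanism from the closing example (a fraction p of exposed controls report no exposure, a fraction q of unexposed cases report exposure), and the recorded counts are hypothetical, chosen only to show the direction of the effect.

```python
import math

def log_odds_ratio(a, b, c, d):
    """a, b = exposed and unexposed cases; c, d = exposed and unexposed controls."""
    return math.log((a * d) / (b * c))

def distort(a, b, c, d, p, q):
    """Turn a true table into the recorded table under differential recall bias."""
    return (a + q * b, b * (1 - q), c * (1 - p), d + p * c)

def undo_distortion(a_obs, b_obs, c_obs, d_obs, p, q):
    """Recover the true table implied by a recorded table and assumed p, q."""
    b = b_obs / (1 - q)   # true unexposed cases
    c = c_obs / (1 - p)   # true exposed controls
    return (a_obs - q * b, b, c, d_obs - p * c)

recorded = (300, 200, 270, 230)   # hypothetical counts, not from the study
observed = log_odds_ratio(*recorded)
corrected = log_odds_ratio(*undo_distortion(*recorded, p=0.038, q=0.038))
print(f"recorded log-odds ratio:  {observed:.3f}")
print(f"corrected log-odds ratio: {corrected:.3f}")
```

With bias in this direction the corrected log-odds ratio is always smaller than the recorded one, which is how a modest amount of differential recall can account for an apparently significant association.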

Comparison of the Block and the Willett self-administered semiquantitative food frequency questionnaires with an interviewer-administered dietary history. BJ Caan, ML Slattery, J Potter, CP Jr Quesenberry, AO Coates, DM Schaffer. AJE 1998: 148(12); 1137-47. ABSTRACT: The performances of two commonly used diet instruments, the Block and the Willett food frequency questionnaires, were compared with a longer, interviewer-administered diet history. Participants in a case-control study on diet and colon cancer were interviewed between 1990 and 1994 in northern California, Utah, and Minnesota by trained nutritionists using a validated diet history. Two separate subsamples of participants were asked to complete either the Block or the Willett questionnaire exactly 5 days after they completed the original diet history. Data were analyzed separately by subsample comparing either the Block or the Willett questionnaire with the original diet history by using means, correlations, quintile agreement, and odds ratios for the relation between several nutrients and colon cancer. The Block and the Willett questionnaires generally provided lower absolute intake estimates than did the original diet history; however, the Block questionnaire underestimated more than did that by Willett. Both correlations and quintile agreement were slightly better for the Willett questionnaire than for that by Block when compared with the original diet history. In general, point estimates obtained from either the Block or the Willett questionnaire fell within the confidence intervals of the estimates of the odds ratios obtained from the original diet history, and no real difference in significance levels appeared. Although the Block and Willett questionnaires differed slightly from each other and from our original diet history in estimating absolute nutrients and ranking or classifying individuals, they were very similar in their ability to predict disease outcome.

The Forer effect (a.k.a. the P.T. Barnum effect and subjective validation). Robert Todd Carroll, The Skeptic's Dictionary. Accessed on 2003-03-10. "The Forer or Barnum effect is also known as the subjective validation effect or the personal validation effect. (The expression, 'the Barnum effect,' seems to have originated with psychologist Paul Meehl, in deference to circus man P.T. Barnum's reputation as a master psychological manipulator.) Psychologist B.R. Forer found that people tend to accept vague and general personality descriptions as uniquely applicable to themselves without realizing that the same description could be applied to just about anyone." www.skepdic.com/myersb.html

Myers-Briggs Type Indicator®. Robert Todd Carroll, The Skeptic's Dictionary. Accessed on 2003-03-10. A critical look at the validity and reliability of the Myers-Briggs Type Indicator. www.skepdic.com/myersb.html

The Mozart Effect. Robert Todd Carroll, The Skeptic's Dictionary. Accessed on 2003-06-09. "The Mozart Effect is a term coined by Alfred A. Tomatis for the alleged increase in brain development that occurs in children under age 3 when they listen to the music of Wolfgang Amadeus Mozart." http://skepdic.com/mozart.html

The visual analogue pain intensity scale: what is moderate pain in millimetres? S. L. Collins, R. A. Moore, H. J. McQuay. Pain 1997: 72(1-2); 95-7. One way to ensure adequate sensitivity for analgesic trials is to test the intervention on patients who have established pain of moderate to severe intensity. The usual criterion is at least moderate pain on a categorical pain intensity scale. When visual analogue scales (VAS) are the only pain measure in trials we need to know what point on a VAS represents moderate pain, so that these trials can be included in meta-analysis when baseline pain of at least moderate intensity is an inclusion criterion. To investigate this we used individual patient data from 1080 patients from randomised controlled trials of various analgesics. Baseline pain was measured using a 4-point categorical pain intensity scale and a pain intensity VAS under identical conditions. The distribution of the VAS scores was examined for 736 patients reporting moderate pain and for 344 reporting severe pain. The VAS scores corresponding to moderate or severe pain were also examined by gender. Baseline VAS scores recorded by patients reporting moderate pain were significantly different from those of patients reporting severe pain. Of the patients reporting moderate pain 85% scored over 30 mm on the corresponding VAS, with a mean score of 49 mm. For those reporting severe pain 85% scored over 54 mm with a mean score of 75 mm. There was no difference between the corresponding VAS scores of men and women. Our results indicate that if a patient records a baseline VAS score in excess of 30 mm they would probably have recorded at least moderate pain on a 4-point categorical scale.

Underascertainment of child maltreatment fatalities by death certificates, 1990-1998. T. L. Crume, C. DiGuiseppi, T. Byers, A. P. Sirotnak, C. J. Garrett. Pediatrics 2002: 110(2 Pt 1); e18 (1 - 6). OBJECTIVE: Child fatality review teams have emerged across the United States in the past decade to address the concern that systems of child protection, law enforcement, criminal justice, and medicine do not adequately assess the circumstances surrounding child fatality as a result of maltreatment. METHODS: We compared data collected by a multidisciplinary child fatality review team with vital records for all children who were aged birth to 16 years and died in Colorado between January 1, 1990, and December 1, 1998. Odds ratios and 95% confidence intervals for ascertainment by the death certificate were estimated using logistic regression. RESULTS: Only half of the children who died as a result of maltreatment had death certificates that were coded consistently with maltreatment. Black race and female gender were associated with higher ascertainment, whereas death in a rural county was associated with lower ascertainment. Deaths resulting from violent causes (eg, shaking, blunt force trauma, striking) were more likely to be ascertained than those that involved acts of omission (eg, neglect and abandonment, drowning, fire). The most common perpetrators of maltreatment were parents. However, maltreatment by an unrelated perpetrator was 8.71 times (95% confidence interval: 3.52-21.55) more likely to be ascertained than maltreatment by a parent. CONCLUSIONS: The degree of underascertainment found in this study is of concern because most national estimates of child maltreatment fatality in the United States are derived from coding on death certificates. In addition, the patterns recognized in this study raise concern about systematic underascertainment that may affect children of specific sociodemographic groups.

Research Fables from the Sisters Grinn, No. 2. Snow White and the Seven Threats to Validity. Jeanne Grace, University of Rochester School of Nursing. Accessed on 2003-05-27. "Once upon a time in the kingdom of Empiricism, there lived a gentle and good princess named Snow White. Since the death of her father, the wise king Imperial White, she had lived in the castle with her step-mother, Dingy Yellow, who claimed the throne*. Dingy Yellow was an exceedingly vain woman, who sought proof of her beauty from repeated measurements self-observations conducted by means of a talking mirror (reliability and validity unknown)." http://www.urmc.rochester.edu/SON/Fables/snowht.htm

Treatment of acute childhood diarrhea with homeopathic medicine: a randomized clinical trial in Nicaragua. J. Jacobs, L. M. Jimenez, S. S. Gloyd, J. L. Gale, D. Crothers. Pediatrics 1994: 93(5); 719-25. OBJECTIVE. Acute diarrhea is the leading cause of pediatric morbidity and mortality worldwide. Oral rehydration treatment can prevent death from dehydration, but does not reduce the duration of individual episodes. Homeopathic treatment for acute diarrhea is used in many parts of the world. This study was performed to determine whether homeopathy is useful in the treatment of acute childhood diarrhea. METHODOLOGY. A randomized double-blind clinical trial comparing homeopathic medicine with placebo in the treatment of acute childhood diarrhea was conducted in Leon, Nicaragua, in July 1991. Eighty-one children aged 6 months to 5 years of age were included in the study. An individualized homeopathic medicine was prescribed for each child and daily follow-up was performed for 5 days. Standard treatment with oral rehydration treatment was also given. RESULTS. The treatment group had a statistically significant (P < .05) decrease in duration of diarrhea, defined as the number of days until there were less than three unformed stools daily for 2 consecutive days. There was also a significant difference (P < .05) in the number of stools per day between the two groups after 72 hours of treatment. CONCLUSIONS. The statistically significant decrease in the duration of diarrhea in the treatment group suggests that homeopathic treatment might be useful in acute childhood diarrhea. Further study of this treatment deserves consideration.

Validation of the Cardiovascular Limitations and Symptoms Profile (CLASP) in chronic stable angina. R. J. Lewin, D. R. Thompson, C. R. Martin, N. Stuckey, J. Devlen, S. Michaelson, P. Maguire. J Cardiopulm Rehabil 2002: 22(3); 184-91. (CLASP score and subscales compared to: angina diary, Sickness Impact Profile (SIP), Hospital Anxiety and Depression Scale (HADS), Sleep Problems Questionnaire (SPQ), Exercise Tolerance Test on a programmable treadmill, number of coronary arteries compromised, number of years of recorded angina, number of previous acute events, type of drugs prescribed.) PURPOSE: This study aimed to establish the reliability, validity, and sensitivity of the Cardiovascular Limitations and Symptoms Profile (CLASP) in a group of patients with chronic stable angina. METHODS: After 226 patients with angina had been recruited, they were randomly allocated to one of three groups: a 10-week hospital-based angina management program (n = 75; men = 56; age = 60 +/- 8 years), routine care (n = 74; men = 52; age = 61 +/- 7 years), and exercise therapy (n = 77; men = 60; age = 60 +/- 7 years). All the patients were assessed with CLASP on two occasions: at baseline and at 10 weeks. The Sickness Impact Profile (SIP), the Hospital Anxiety and Depression Scale (HADS), and the Sleep Problems Questionnaire (SPQ) also were administered at the same time. RESULTS: Significant positive correlations between the actual number of angina episodes and the CLASP angina subscale scores (r = .60, P < .001) were observed. The CLASP subscale scores for shortness of breath (r = -.36; P < .001) and ankle swelling (r = -.24; P < .001) were significantly correlated with the total treadmill time. The CLASP tiredness subscale score showed a significant positive correlation with the SPQ score (r = .48; P < .001).
The CLASP subscale scores were significantly correlated with their corresponding SIP subscale scores: the tiredness score with the sleep and rest score (r =.49; P <.001), the social and leisure score with the recreation and pastimes score (r =.41; P <.001), the home score with the home management score (r =.45; P <.001), and the mobility score with the mobility (r =.37; P <.001) and total treadmill time scores (r = -.49; P <.001). CONCLUSIONS: The findings show CLASP to be a reliable, valid, sensitive measure of health-related quality of life in patients with chronic stable angina. Before it can be recommended for all patients with heart disorders, similar data will be required from other diagnostic groups such as patients with heart failure or those who have sustained an acute myocardial infarction.

What's Wrong with This Picture? (Inkblot Test). Scott O Lilienfeld. Scientific American 2001: 81-87. Abstract not available. [PDF]

Psychological stress and cardiovascular disease: empirical demonstration of bias in a prospective observational study of Scottish men * Commentary: Psychosocial factors and health---strengthening the evidence base. John Macleod, George Davey Smith, Pauline Heslop, Chris Metcalfe, Douglas Carroll, Carole Hart, John Lynch. British Medical Journal 2002: 324(7348); 1247-. Objectives: To examine the association between self perceived psychological stress and cardiovascular disease in a population where stress was not associated with social disadvantage. Design: Prospective observational study with follow up of 21 years and repeat screening of half the cohort 5 years from baseline. Measures included perceived psychological stress, coronary risk factors, self reported angina, and ischaemia detected by electrocardiography. Setting: 27 workplaces in Scotland. Participants: 5606 men (mean age 48 years) at first screening and 2623 men at second screening with complete data on all measures. Main outcome measures: Prevalence of angina and ischaemia at baseline, odds ratio for incident angina and ischaemia at second screening, rate ratios for cause specific hospital admission, and hazard ratios for cause specific mortality. Results: Both prevalence and incidence of angina increased with increasing perceived stress (fully adjusted odds ratio for incident angina, high versus low stress 2.66, 95% confidence interval 1.61 to 4.41; P for trend <0.001). Prevalence and incidence of ischaemia showed weak trends in the opposite direction. High stress was associated with a higher rate of admissions to hospital generally and for admissions related to cardiovascular disease and psychiatric disorders (fully adjusted rate ratios for any general hospital admission 1.13, 1.01 to 1.27, cardiovascular disease 1.20, 1.00 to 1.45, and psychiatric disorders 2.34, 1.41 to 3.91).
High stress was not associated with increased admission for coronary heart disease (1.00, 0.76-1.32) and showed an inverse relation with all cause mortality, mortality from cardiovascular disease, and mortality from coronary heart disease, that was attenuated by adjustment for occupational class (fully adjusted hazard ratio for all cause mortality 0.94, 0.81 to 1.11, cardiovascular mortality 0.91, 0.78 to 1.06, and mortality from coronary heart disease 0.98, 0.75 to 1.27). Conclusions: The relation between higher stress, angina, and some categories of hospital admissions probably resulted from the tendency of participants reporting higher stress to also report more symptoms. The lack of a corresponding relation with objective indices of heart disease suggests that these symptoms did not reflect physical disease. The data suggest that associations between psychosocial measures and disease outcomes reported from some other studies may be spurious. [Abstract] [Full text] [PDF]

Depth of sedation in children undergoing computed tomography: validity and reliability of the University of Michigan Sedation Scale (UMSS). S. Malviya, T. Voepel-Lewis, A. R. Tait, S. Merkel, K. Tremper, N. Naughton. Br J Anaesth 2002: 88(2); 241-5. BACKGROUND: Safe care of sedated children requires ongoing assessment of the depth of sedation to permit early recognition of progression to over-sedation. This study evaluated the validity and reliability of the University of Michigan Sedation Scale (UMSS) as a measure of sedation during procedures. The UMSS is a simple observational tool that assesses the level of alertness on a five-point scale ranging from 1 (wide awake) to 5 (unarousable with deep stimulation). METHODS: Thirty-two children aged 4 months to 5 yr (mean 1.5 yr), sedated for computed tomography (CT), were studied prospectively. The CT nurse assessed sedation using the UMSS before sedative administration and every 10 min thereafter. The child was videotaped during each assessment, and segments were edited and their order was randomized. Four nurses blinded to sedative administration viewed the segments and scored sedation using the UMSS. One of these nurses also scored sedation using a visual analogue scale (VAS) and another using the Observer's Assessment of Alertness/Sedation Scale (OAAS). To examine the test-retest reliability, 75 randomly selected video segments were viewed and scored on a second occasion. RESULTS: Changes in scores from baseline to discharge supported construct validity (P<0.0001). Criterion validity was demonstrated by significant correlations between the UMSS and the VAS and OAAS. There was good interobserver agreement between blinded observers' scores for each level of sedation and at discharge, and between blinded observers and the CT nurse for scores of 0 and 1 (lighter levels of sedation), but less agreement for scores 2 and 3 (deeper sedation) and discharge scores. 
Test-retest reliability was supported by agreement in the observers' UMSS scores. CONCLUSION: The UMSS is a simple, valid and reliable tool that facilitates rapid and frequent assessment and documentation of depth of sedation in children.

Reliability of death certificate diagnoses. M. A. Moussa, M. Z. Shafie, M. M. Khogali, A. M. el-Sayed, T. N. Sugathan, G. Cherian, A. Z. Abdel-Khalik, M. T. Garada, D. Verma. J Clin Epidemiol 1990: 43(12); 1285-95. Consistency between death certificates and clinical records from 5 general hospitals in Kuwait was studied for 470 deaths with the following underlying or associated causes: hypertensive (HYP), ischaemic heart diseases (IHD), cerebrovascular diseases (CVD) and diabetes mellitus (DM). Direct causes were not considered since they are of little interest analytically. Only deaths with definite or most probable ascertainment were included. One cardiologist, who was provided with the WHO criteria and relevant documents on death certification, independently reviewed the records. To test the reviewer's bias and the reliability of his judgement, an adjudication process was effected by having one senior cardiologist re-review a random subsample of 140 records. The two reviewers showed good agreement. Specific diagnoses criteria for deciding the underlying cause of death in multiple morbid conditions by the reviewer were followed. Due to possible reviewer bias, we aimed at measuring the difference between initial certifiers and the reviewer rather than measuring the diagnostic accuracy of initial certifiers in reference to the reviewer. The agreement index kappa showed poor agreement between original and revised certificates. The original certificates under-estimated CVD as an underlying cause of death by 69.2%, DM by 60%, IHD by 33.5% and HYP by 31.8% in our sample. Associated causes were also consistently under-estimated by initial certifiers as compared with the reviewer. 
This bias calls for basing mortality statistics in Kuwait on hospital death committees' reports rather than on initial certifier death certificates, use of multiple-causes of death instead of one underlying cause and adequate training of the medical profession on the value and process of death certification.
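
The agreement index kappa mentioned in this abstract corrects the raw proportion of agreement for the agreement expected by chance. Below is a minimal sketch of Cohen's kappa for a square cross-classification of two raters. The function name `cohens_kappa` and the counts are hypothetical and chosen only for illustration; they are not the Kuwait data.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of records placed in category i by the
    first rater (e.g. the original certifier) and in category j by the
    second rater (e.g. the reviewer).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of records on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2x2 table: rows = original certificate, columns = reviewer.
example_table = [[20, 5],
                 [10, 15]]
print(round(cohens_kappa(example_table), 3))
```

Values near 0 indicate agreement no better than chance and values near 1 indicate near-perfect agreement, which is why a low kappa is read as poor agreement between original and revised certificates.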

Reporting on quality of life in randomised controlled trials: bibliographic study. C. Sanders, M. Egger, J. Donovan, D. Tallon, S. Frankel. BMJ 1998: 317(7167); 1191-4. OBJECTIVES: To examine the frequency and quality of reporting on quality of life in randomised controlled trials. DESIGN: Search of the Cochrane Controlled Trials Register 1980 to 1997 to identify trials from all disciplines, from oncology, and from cardiovascular medicine that reported on quality of life. Assessment of abstracts from articles published from 1993 to 1996. Assessment of a sample of full reports with a standardised instrument. MAIN OUTCOME MEASURES: Prevalence of reporting on quality of life. Conditions and interventions studied in trials reporting on quality of life. Quality of reporting on quality of life. RESULTS: During 1980-97 reporting on quality of life increased from 0.63% to 4.2% for trials from all disciplines, from 1.5% to 8.2% for cancer trials, and from 0.34% to 3.6% for cardiovascular trials. Of 364 abstracts, 65% reported on drug interventions. Of a sample of 67 full reports, authors of 48 (72%) used 62 established quality of life instruments. In 15 reports (22%) authors developed their own measures, and in 2 (3%) methods were unclear. Response rates were given in 38 (57%), and complete reporting on all items and scales occurred in 31 (46%). CONCLUSIONS: Less than 5% of all randomised controlled trials reported on quality of life, and this proportion was below 10% even for cancer trials. A plethora of instruments was used in different studies, and the reporting of methods and results was often inadequate. Standards for the measurement and reporting of quality of life in clinical trials research need to be developed. [Medline] [Abstract] [Full text] [PDF]

Misclassification rates for current smokers misclassified as nonsmokers. AJ Wells, PB English, SF Posner, LE Wagenknecht, EJ Perez-Stable. American Journal of Public Health 1998: 88(10); 1503-09. ABSTRACT: OBJECTIVES: This paper provides misclassification rates for current cigarette smokers who report themselves as nonsmokers. Such rates are important in determining smoker misclassification bias in the estimation of relative risks in passive smoking studies. METHODS: True smoking status, either occasional or regular, was determined for individual current smokers in 3 existing studies of nonsmokers by inspecting the cotinine levels of body fluids. The new data, combined with an approximately equal amount in the 1992 Environmental Protection Agency (EPA) report on passive smoking and lung cancer, yielded misclassification rates that not only had lower standard errors but also were stratified by sex and US minority/majority status. RESULTS: The misclassification rates for the important category of female smokers misclassified as never smokers were, respectively, 0.8%, 6.0%, 2.8%, and 15.3% for majority regular, majority occasional, US minority regular, and US minority occasional smokers. Misclassification rates for males were mostly somewhat higher. CONCLUSIONS: The new information supports EPA's conclusion that smoker misclassification bias is small. Also, investigators are advised to pay attention to minority/majority status of cohorts when correcting for smoker misclassification bias.

Reporting accuracy among mothers of malformed and nonmalformed infants. M. M. Werler, B. R. Pober, K. Nelson, L. B. Holmes. Am J Epidemiol 1989: 129(2); 415-21. The potential for recall bias in case-control studies is a common concern. The authors assessed whether recall bias was present in exposure information reported at postpartum interview by mothers of malformed and nonmalformed infants who delivered at Brigham and Women's Hospital, Boston, during 1984. Accuracy of exposure reporting was measured by comparing interview data with exposure information documented during pregnancy in obstetric records. The authors' measure of recall bias, relative sensitivity (RS), is the ratio of exposure-reporting accuracy for mothers of malformed infants to that of mothers of nonmalformed infants. Relative sensitivity estimates that are greater than 1.0 indicate that mothers of malformed infants are more accurate reporters than mothers of nonmalformed infants. Relative sensitivity was estimated for eight exposure factors: antibiotic or antifungal drug use (RS = 1.2), urinary tract or yeast infection (RS = 2.7), history of infertility (RS = 1.4), use of birth control after conception (RS = 7.6), elective abortion history (RS = 1.1), any over-the-counter drug use (RS = 1.0), spotting or bleeding (RS = 1.2), and nausea or vomiting (RS = 0.8). These data suggest the presence of recall bias for some exposure factors. The authors advise the use of malformed controls to reduce potential recall bias in case-control studies of selected malformations and many etiologic factors.
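
The relative sensitivity described in this abstract is a ratio of two reporting accuracies. The sketch below assumes that "accuracy" means the proportion of exposures documented in the obstetric record that the mother also reported at interview; that reading, the function names, and the counts are assumptions made for illustration and are not taken from the paper.

```python
def sensitivity(reported, documented):
    """Proportion of documented exposures that were also reported at interview."""
    return reported / documented


def relative_sensitivity(case_reported, case_documented,
                         control_reported, control_documented):
    """Ratio of reporting sensitivity: mothers of malformed infants (cases)
    over mothers of nonmalformed infants (controls)."""
    return (sensitivity(case_reported, case_documented)
            / sensitivity(control_reported, control_documented))


# Hypothetical counts: 24 of 30 documented exposures reported by case mothers,
# 20 of 50 documented exposures reported by control mothers.
rs = relative_sensitivity(24, 30, 20, 50)
print(rs)
```

An RS above 1.0 means case mothers report documented exposures more completely than control mothers, which is the pattern that would inflate an exposure-disease association in a case-control study.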

Comparison of food frequency questionnaires: the reduced Block and Willett questionnaires differ in ranking on nutrient intakes. AK Wirfält, RW Jeffery, PJ Elmer. Am J Epidemiol 1998: 148(12); 1148-56. ABSTRACT: Food frequency questionnaires, major tools in epidemiologic studies, are often criticized for biased and imprecise intake estimates. The aim of this study was to compare the performance of two widely used food frequency questionnaires, a reduced 60-item Block questionnaire and a 153-item Willett food frequency questionnaire, relative to three 24-hour recalls administered by telephone. The dietary data were collected in 1991 from a group of healthy women age 25-49 years (n=101) during the baseline period of a weight-loss intervention study in Minneapolis, Minnesota. Total energy and macro- and micronutrient intakes were compared across methods by using four analytic approaches: comparison of means and correlation coefficients, regression analysis, and estimation of percent agreement between each questionnaire and recalls. The Block instrument showed an overall underestimation bias, but was more successful in categorizing individuals on percent energy from fat and carbohydrate intakes than was the Willett instrument. The Willett instrument showed no overall underestimation bias and was more successful in classifying individuals on vitamin A and calcium intakes. Diverging performance characteristics of diet assessment methods have an implication for the design of studies, interpretation of results, and comparison of findings across studies.

Reporting on quality of life in RCTs. Susan P. Wright. British Medical Journal 1999: 318(7191); 1142. [Full text]

 

=====

Equivalence

Sample size determination for proving equivalence based on the ratio of two means for normally distributed data. D Hauschke, M Kieser, E Diletti, M Burke. Statistics in Medicine 1999: 18(1); 93-105. ABSTRACT: Equivalence trials aim to demonstrate that two treatments do not differ by more than a prespecified clinically irrelevant amount. We consider the problem when equivalence is defined in terms of the ratio of population means and the original (untransformed) data are normally distributed. Application of the intersection-union principle to the test proposed by Sasabuchi results in a two one-sided tests procedure of size alpha. We give the associated 100(1-2 alpha) per cent confidence interval and derive the exact methods for calculation of power and sample sizes for the parallel group design and the two-period cross-over. We present tables and figures of required sample sizes and achieved power.
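
The two one-sided tests procedure for a ratio of means can be written out as follows. This is a sketch for the parallel group design under an assumed common variance. The notation is chosen here for illustration and may differ from the authors': $\theta_1 < 1 < \theta_2$ are the equivalence limits, $\bar{X}_T$ and $\bar{X}_R$ are the sample means of the test and reference groups, $n_T$ and $n_R$ are the group sizes, and $S$ is the pooled standard deviation.

```latex
H_0:\ \frac{\mu_T}{\mu_R} \le \theta_1 \ \text{ or } \ \frac{\mu_T}{\mu_R} \ge \theta_2
\qquad \text{versus} \qquad
H_1:\ \theta_1 < \frac{\mu_T}{\mu_R} < \theta_2, \qquad \mu_R > 0

T_{\theta_1} = \frac{\bar{X}_T - \theta_1 \bar{X}_R}{S \sqrt{\dfrac{1}{n_T} + \dfrac{\theta_1^2}{n_R}}} > t_{1-\alpha,\; n_T + n_R - 2}
\qquad \text{and} \qquad
T_{\theta_2} = \frac{\bar{X}_T - \theta_2 \bar{X}_R}{S \sqrt{\dfrac{1}{n_T} + \dfrac{\theta_2^2}{n_R}}} < -t_{1-\alpha,\; n_T + n_R - 2}
```

Equivalence is concluded only when both one-sided tests reject at level $\alpha$, which is why the matching confidence interval has coverage $100(1-2\alpha)$ per cent rather than $100(1-\alpha)$ per cent.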

Bioequivalence of generic and brand-name levothyroxine products in the treatment of hypothyroidism. B. J. Dong, W. W. Hauck, J. G. Gambertoglio, L. Gee, J. R. White, J. L. Bubp, F. S. Greenspan. JAMA 1997: 277(15); 1205-13. OBJECTIVE: To compare relative bioavailability of Synthroid, Levoxine (Levoxine has been renamed Levoxyl), and 2 generic levothyroxine sodium preparations. DESIGN: Single-blind (primary investigators blinded), randomized, 4-way crossover trial. SETTING: Ambulatory care. PATIENTS: Twenty-two women with hypothyroidism who were clinically and chemically euthyroid and were receiving levothyroxine sodium, 0.1 or 0.15 mg. INTERVENTIONS: All patients received each of the 4 levothyroxine products for 6-week periods in the same dosage as their prestudy regimen with no washout period. The order of the drug sequences was randomly determined before study initiation. MAIN OUTCOME MEASURES: Area under the curve, time to peak serum concentrations, and peak serum concentrations of thyroxine, triiodothyronine, and free thyroxine index for all 4 products. RESULTS: All data analyses were completed prior to unblinding of the product codes. No significant differences between the 4 products were found in area under the curve or peak serum concentrations of total thyroxine, total triiodothyronine, or free thyroxine index. Although Synthroid produced a more rapid rise in total serum triiodothyronine concentration and a higher total peak serum triiodothyronine concentration than the other products, these differences were not statistically significant (P=.08). The Food and Drug Administration criterion for relative bioequivalence within 90% confidence intervals (0.8-1.25) was demonstrated (P<.05) for all pairs of products. Relative bioequivalence of 0.95 to 1.07 was demonstrated, tighter than the current bioequivalence criterion for oral formulations.
CONCLUSIONS: The 4 generic and brand-name levothyroxine preparations studied are different but are bioequivalent by current Food and Drug Administration criteria and are interchangeable in the majority of patients receiving thyroxine replacement therapy. Further investigation is required to determine whether our results are equally applicable to all existing levothyroxine preparations.
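
The bioequivalence criterion cited in this abstract is a decision rule on a confidence interval: the 90% interval for the ratio of the two products must lie entirely within 0.8 to 1.25. The sketch below shows that rule, assuming the analysis is done on the log scale and that the caller supplies the appropriate t critical value. The function names and the numeric inputs are hypothetical and are not taken from the study.

```python
import math


def ratio_ci(mean_log_ratio, se_log_ratio, t_crit):
    """Confidence interval for a ratio, built on the log scale and
    back-transformed. t_crit is the one-sided critical value supplied
    by the caller (for a 90% interval, the 0.95 quantile of t)."""
    half_width = t_crit * se_log_ratio
    return (math.exp(mean_log_ratio - half_width),
            math.exp(mean_log_ratio + half_width))


def is_bioequivalent(ci, lower=0.80, upper=1.25):
    """True when the whole interval lies inside the equivalence limits."""
    ci_low, ci_high = ci
    return lower <= ci_low and ci_high <= upper


# Hypothetical inputs: estimated ratio 1.01, standard error 0.03 on the
# log scale, and t critical value 1.721 (0.95 quantile with 21 df).
interval = ratio_ci(math.log(1.01), 0.03, 1.721)
print(interval, is_bioequivalent(interval))
```

Note that the rule depends on where the interval falls, not on whether a test of "no difference" is significant, so two products can differ statistically and still meet the criterion.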

"Proving the null hypothesis" in clinical trials. W. C. Blackwelder. Controlled Clinical Trials 1982: 3(4); 345-53. When designing a clinical trial to show whether a new or experimental therapy is as effective as a standard therapy (but not necessarily more effective), the usual null hypothesis of equality is inappropriate and leads to logical difficulties. Since therapies cannot be shown to be literally equivalent, the appropriate null hypothesis is that the standard therapy is more effective than the experimental therapy by at least some specified amount. The problem is presented in terms of a trial in which the outcome of interest is dichotomous; test statistics, confidence intervals, and sample size calculations are discussed. The required sample size may be larger for either null hypothesis formulation than for the other, depending on the specific assumptions made. Reporting results in terms of confidence intervals is especially useful for this type of trial.
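
The reversed null hypothesis described in this abstract can be illustrated for a dichotomous outcome. The sketch below tests the null hypothesis that the standard therapy's success rate exceeds the experimental therapy's by at least delta, using a normal approximation with an unpooled standard error. The function name, the choice of standard error, and the counts are assumptions made for illustration; the paper should be consulted for the exact statistics it recommends.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_test(x_std, n_std, x_exp, n_exp, delta, alpha=0.05):
    """One-sided test of H0: p_std - p_exp >= delta
    against H1: p_std - p_exp < delta (normal approximation).

    Returns the z statistic, the one-sided p-value, and the upper
    100(1 - alpha)% confidence bound for p_std - p_exp.
    """
    p_std = x_std / n_std
    p_exp = x_exp / n_exp
    diff = p_std - p_exp
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p_std * (1 - p_std) / n_std + p_exp * (1 - p_exp) / n_exp)
    z = (diff - delta) / se
    p_value = NormalDist().cdf(z)  # small when diff is well below delta
    upper_bound = diff + NormalDist().inv_cdf(1 - alpha) * se
    return z, p_value, upper_bound


# Hypothetical trial: 160/200 successes on standard, 156/200 on experimental,
# with a margin of 10 percentage points.
z, p_value, upper_bound = noninferiority_test(160, 200, 156, 200, delta=0.10)
print(z, p_value, upper_bound)
```

Rejecting this null hypothesis at level alpha is the same as finding that the upper confidence bound for the difference falls below delta, which is the sense in which reporting confidence intervals is especially useful for this type of trial.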

Scientific and ethical issues in equivalence trials. B. Djulbegovic, M. Clarke. JAMA 2001: 285(9); 1206-8.

Trials to assess equivalence: the importance of rigorous methods. B Jones, P Jarvis, J A Lewis, A F Ebbutt. British Medical Journal 1996: 313(7048); 36-39. The aim of an equivalence trial is to show the therapeutic equivalence of two treatments, usually a new drug under development and an existing drug for the same disease used as a standard active comparator. Unfortunately the principles that govern the design, conduct, and analysis of equivalence trials are not as well understood as they should be. Consequently such trials often include too few patients or have intrinsic design biases which tend towards the conclusion of no difference. In addition the application of hypothesis testing in analysing and interpreting data from such trials sometimes compounds the drawing of inappropriate conclusions, and the inclusion and exclusion of patients from analysis may be poorly managed. The design of equivalence trials should mirror that of earlier successful trials of the active comparator as closely as possible. Patient losses and other deviations from the protocol should be minimised; analysis strategies to deal with unavoidable problems should not centre on an "intention to treat" analysis but should seek to show the similarity of results from a range of approaches. Analysis should be based on confidence intervals, and this also carries implications for the estimation of the required numbers of patients at the design stage. [Medline] [Full text]

Comparison of tests and sample size formulae for proving therapeutic equivalence based on the difference of binomial probabilities. Peter Roebruck. Statistics in Medicine 1995: 14; 1583-94. Abstract not available yet.

Equivalence trials. JH Ware, EM Antman. NEJM 1997: 337(16); 1159-61. [Abstract]

Overview

Bias. Bandolier. Accessed on 2003-03-25. "Bandolier has been struck of late, 'many a time and oft', by the continuing and cavalier attitude towards bias in clinical trials. We know that the way that clinical trials are designed and conducted can influence their results. Yet people still ignore known sources of bias when making decisions about treatments at all levels." www.jr2.ox.ac.uk/bandolier/band80/b80-2.html

This webpage was written by Steve Simon on (unknown date), edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu. Category: Statistical evidence