Children's Mercy Hospital

Resources for "Apples or Oranges"

This page lists some references that I have used to develop Chapter 1 of "Statistical Evidence."

Adjustment for covariates

Conditions for confounding of the risk ratio and of the odds ratio. J. F. Boivin, S. Wacholder. American Journal of Epidemiology 1985: 121(1); 152-8. There are disagreements in the literature about the criteria to be used to ascertain whether or not a measure of association is confounded. The authors postulate the general principle that a crude unconfounded measure of association is structured as a weighted average of the stratum-specific values of the measure. They examine the relationships between stratum-specific measures of association, crude overall measures, and weighted averages of stratum-specific measures, and indicate how these relationships may be used to define criteria for the assessment of confounding in cohort studies in which the exposure, disease, and stratification variables are classified dichotomously. The criteria presented differ for the risk ratio and for the disease-odds ratio. In other words, one can reach different conclusions about the confounding effect of a given extraneous variable, depending on which measure of association is chosen. This view differs from that of Miettinen and Cook (Confounding: essence and detection. Am J Epidemiol 1981;114:593-603) who postulated one set of criteria for the assessment of confounding, which was applicable to both measures of association. These different approaches may lead to different conclusions about the presence or absence of confounding. [Medline]

Are sex and death related? Study failed to adjust for an important confounder [letter; comment]. David Batty. British Medical Journal 1998: 316(7145); 1671; discussion 1672. Abstract not available. [Full text]

Maternal smoking and Down syndrome: the confounding effect of maternal age. C. L. Chen, T. J. Gilbert, J. R. Daling. Am J Epidemiol 1999: 149(5); 442-6. Inconsistent results have been reported from studies evaluating the association of maternal smoking with birth of a Down syndrome child. Control of known risk factors, particularly maternal age, has also varied across studies. By using a population-based case-control design (775 Down syndrome cases and 7,750 normal controls) and Washington State birth record data for 1984-1994, the authors examined this hypothesized association and found a crude odds ratio of 0.80 (95% confidence interval 0.65-0.98). Controlling for broad categories of maternal age (<35 years, ≥35 years), as described in prior studies, resulted in a negative association (odds ratio = 0.87, 95% confidence interval 0.71-1.07). However, controlling for exact year of maternal age in conjunction with race and parity resulted in no association (odds ratio = 1.00, 95% confidence interval 0.82-1.24). In this study, the prevalence of Down syndrome births increased with increasing maternal age, whereas among controls the reported prevalence of smoking during pregnancy decreased with increasing maternal age. There is a substantial potential for residual confounding by maternal age in studies of maternal smoking and Down syndrome. After adequately controlling for maternal age in this study, the authors found no clear relation between maternal smoking and the risk of Down syndrome.
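
The pattern in this abstract (a crude odds ratio below 1 that moves to 1 after adjustment for maternal age) can be illustrated with a small sketch. The counts below are hypothetical and chosen only to make the arithmetic easy; they are not the study's data. The Mantel-Haenszel estimator is used here as a simple stand-in for stratified adjustment and is not necessarily the method the authors used.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts: (smoking cases, nonsmoking cases,
#                       smoking controls, nonsmoking controls)
strata = [
    (20, 80, 200, 800),  # younger mothers: smoking common, few cases per control
    (10, 90, 20, 180),   # older mothers: smoking rare, many cases per control
]

# Collapsing over age gives the crude table.
crude_table = tuple(sum(col) for col in zip(*strata))
crude_or = odds_ratio(*crude_table)
mh_or = mantel_haenszel_or(strata)

for table in strata:
    print("stratum OR:", odds_ratio(*table))
print("crude OR:", round(crude_or, 3))
print("Mantel-Haenszel OR:", round(mh_or, 3))
```

In this made-up example smoking is unrelated to case status within each age stratum, yet the collapsed table suggests a protective effect, because older mothers smoke less and contribute proportionally more cases.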

Look before You Leap: Stratify before You Standardize. Bernard C.K. Choi. American Journal of Epidemiology 1999: 149(12); 1087-1095. ABSTRACT: This paper presents a mathematical model to show the conditions in which age standardization can be used to summarize age-specific rates for comparison purposes over calendar time. It shows that the conditions for valid comparison depend on the type of measure used for comparison, that is, difference, ratio, or percent change. If the measure for comparison is a difference of the standardized rates at two time points, then the age-specific rates need to maintain a constant rate difference over time for the comparison to be valid. If the measure for comparison is a ratio or percent change of the standardized rates at two time points, then the age-specific rates need to maintain a constant rate ratio over time for the comparison to be valid. Since in reality, as shown by our Canadian empirical data, age-specific rates do not always maintain a consistent pattern over time, it is recommended that one should always stratify the data to look at patterns of age-specific rates before applying age standardization.
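
A minimal sketch of the point made in this abstract, using direct standardization with made-up numbers. The rates and standard populations below are hypothetical, not the Canadian data from the paper. When every age-specific rate changes by the same ratio, the ratio of standardized rates does not depend on the standard population; when the age-specific rates move in different directions, the choice of standard can change the conclusion.

```python
def standardized_rate(age_specific_rates, standard_weights):
    """Directly standardized rate: weighted average of age-specific rates."""
    total = sum(standard_weights)
    weighted = sum(r * w for r, w in zip(age_specific_rates, standard_weights))
    return weighted / total


# Hypothetical age-specific rates per 100,000 (young, middle, old).
rates_t1 = [10.0, 50.0, 200.0]
rates_t2_constant_ratio = [r * 0.8 for r in rates_t1]  # every stratum falls 20%
rates_t2_mixed = [5.0, 45.0, 240.0]                    # young fall, old rise

# Two hypothetical standard populations with different age structures.
young_standard = [0.6, 0.3, 0.1]
old_standard = [0.2, 0.3, 0.5]


def rate_ratio(rates_t2, weights):
    """Ratio of standardized rates, time 2 over time 1."""
    return standardized_rate(rates_t2, weights) / standardized_rate(rates_t1, weights)


for label, rates_t2 in [("constant ratio", rates_t2_constant_ratio),
                        ("mixed pattern", rates_t2_mixed)]:
    print(label,
          round(rate_ratio(rates_t2, young_standard), 3),
          round(rate_ratio(rates_t2, old_standard), 3))
```

With the mixed pattern, the younger standard shows a slight decline over time and the older standard shows an increase, which is why looking at the age-specific rates first is the safer order of work.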

Causal Knowledge as a Prerequisite for Confounding Evaluation: An Application to Birth Defects Epidemiology. Miguel A. Hernán, Sonia Hernández-Díaz, Martha M. Werler and Allen A. Mitchell. American Journal of Epidemiology 2002: 155(2); 176-184. Common strategies to decide whether a variable is a confounder that should be adjusted for in the analysis rely mostly on statistical criteria. The authors present findings from the Slone Epidemiology Unit Birth Defects Study, 1992–1997, a case-control study on folic acid supplementation and risk of neural tube defects. When statistical strategies for confounding evaluation are used, the adjusted odds ratio is 0.80 (95% confidence interval: 0.62, 1.21). However, the consideration of a priori causal knowledge suggests that the crude odds ratio of 0.65 (95% confidence interval: 0.46, 0.94) should be used because the adjusted odds ratio is invalid. Causal diagrams are used to encode qualitative a priori subject matter knowledge.

Socioeconomic status and health in blacks and whites: the problem of residual confounding and the resiliency of race. J. S. Kaufman, R. S. Cooper, D. L. McGee. Epidemiology 1997: 8(6); 621-8. A large number of epidemiologic studies have focused on racial/ethnic differences, particularly between blacks and whites. Because health endpoints and racial categorizations are associated with socioeconomic status, investigators generally adjust for socioeconomic indicators. The intention is usually to control for confounding, thereby making groups comparable and excluding socioeconomic status as an alternative explanation to hypotheses of innate physiologic differences. A threat to the validity of these analyses is therefore the presence of residual confounding. We identify four potential sources of residual confounding in this analytical design: categorization of socioeconomic status variables, measurement error in socioeconomic indicators, use of aggregated socioeconomic status measures, and incommensurate socioeconomic indicators. Using simulations and examples from the literature, we demonstrate that the effect of residual confounding is to bias interpretation of data toward the conclusion of independent racial/ethnic group effects. Investigators often refer to possible "genetic" differences on the basis of models that control for socioeconomic status. We propose that such conclusions on the basis of this analytical strategy are generally unwarranted. Racial/ethnic differences in disease are a pressing public health concern, but the current approach does not often provide a basis for inference about putative biological factors in the etiology of this disparity.

Dose-specific Meta-Analysis and Sensitivity Analysis of the Relation between Alcohol Consumption and Lung Cancer Risk. Jeffrey E. Korte, Paul Brennan, S. Jane Henley, Paolo Boffetta. American Journal of Epidemiology 2002: 155(6); 496-506. Alcohol drinking increases the risk of several types of cancer, but studies of the relation between alcohol and lung cancer risk are complicated by smoking. The authors carried out meta-analyses for four study designs and conducted sensitivity analyses to assess the results. Pooled smoking-unadjusted relative risks (RRs) for brewery workers and alcoholics were 1.17 (95% confidence interval (CI): 0.99, 1.39) and 1.99 (95% CI: 1.66, 2.39), respectively, relative to population rates. For cohort and case-control studies, the authors conducted dose-specific meta-analyses for ethanol consumption of 1–499, 500–999, 1,000–1,999, and ≥2,000 g/month, relative to nondrinking. Smoking-adjusted RRs for ascending dose groups in cohort studies were 0.98 (95% CI: 0.79, 1.21), 0.92 (95% CI: 0.81, 1.04), 1.04 (95% CI: 0.88, 1.22), and 1.53 (95% CI: 1.04, 2.25), respectively. Smoking-adjusted odds ratios for ascending groups in case-control studies were 0.63 (95% CI: 0.51, 0.78), 1.30 (95% CI: 0.98, 1.70), 1.13 (95% CI: 0.46, 2.75), and 1.86 (95% CI: 1.39, 2.49), respectively. Elevated odds ratios were seen for hospital-based case-control studies but not for population-based case-control studies. Sensitivity analyses indicated that smoking explained the elevated RRs in studies of alcoholics and that strong misclassification of smoking status could produce an elevated smoking-adjusted RR in cohort and case-control studies. Overall, evidence for a smoking-adjusted association between alcohol and lung cancer risk is limited to very high consumption groups in cohort and hospital-based case-control studies. At lower levels, any associations observed appear to be explained by confounding.

How do risk factors work together? Mediators, moderators, and independent, overlapping, and proxy risk factors. H. C. Kraemer, E. Stice, A. Kazdin, D. Offord, D. Kupfer. Am J Psychiatry 2001: 158(6); 848-56. OBJECTIVE: The authors developed a methodological basis for investigating how risk factors work together. Better methods are needed for understanding the etiology of disorders, such as psychiatric syndromes, that presumably are the result of complex causal chains. METHOD: Approaches from psychology, epidemiology, clinical trials, and basic sciences were synthesized. RESULTS: The authors define conceptually and operationally five different clinically important ways in which two risk factors may work together to influence an outcome: as proxy, overlapping, and independent risk factors and as mediators and moderators. CONCLUSIONS: Classifying putative risk factors into these qualitatively different types can help identify high-risk individuals in need of preventive interventions and can help inform the content of such interventions. These methods may also help bridge the gaps between theory, the basic and clinical sciences, and clinical and policy applications and thus aid the search for early diagnoses and for highly effective preventive and treatment interventions.

Mediators and moderators of treatment effects in randomized clinical trials. H. C. Kraemer, G. T. Wilson, C. G. Fairburn, W. S. Agras. Arch Gen Psychiatry 2002: 59(10); 877-83. (Covariate adjustment is important, even in randomized trials and can identify important subgroups and mechanisms of action.) Randomized clinical trials (RCTs) not only are the gold standard for evaluating the efficacy and effectiveness of psychiatric treatments but also can be valuable in revealing moderators and mediators of therapeutic change. Conceptually, moderators identify on whom and under what circumstances treatments have different effects. Mediators identify why and how treatments have effects. We describe an analytic framework to identify and distinguish between moderators and mediators in RCTs when outcomes are measured dimensionally. Rapid progress in identifying the most effective treatments and understanding on whom treatments work and do not work and why treatments work or do not work depends on efforts to identify moderators and mediators of treatment outcome. We recommend that RCTs routinely include and report such analyses.

Clinical trials in acute myocardial infarction: Should we adjust for baseline characteristics? Ewout W. Steyerberg, Patrick M.M. Bossuyt, Kerry L. Lee. American Heart Journal 2000: 139(5); 745-751. ABSTRACT: BACKGROUND: Clinical trials concerning acute myocardial infarction often evaluate short-term death. Several baseline characteristics are predictors of death, most notably age. Adjustment for one or more predictors in a multivariable analysis may be considered to correct the estimate of the treatment effect for any imbalance that by chance may have occurred between the randomized groups. Moreover, adjustment results in a stratified estimate of the effect of treatment. METHODS AND RESULTS: The effects of adjustment (correction for imbalance and stratification) were studied with logistic regression analysis in the Global Use of Strategies to Open Occluded Coronary Arteries (GUSTO)-I trial. The primary end point was 30-day death, which occurred in 6.3% of 10,348 patients randomly assigned to tissue plasminogen activator and 7.3% of 20,162 patients randomly assigned to streptokinase thrombolytic therapy. This is equivalent to an unadjusted odds ratio of 0.853. No significant imbalance had occurred for any of 17 baseline characteristics considered, including well-known demographic, presenting, and history characteristics. Adjusted for age, the odds ratio was 0.829, which is an 18% increase in estimated effect on the logistic scale. When adjusted for 17 characteristics, the odds ratio was 0.820, an increase of 25%. The increase in effect estimate was largely explained by the stratification effect and only partly by imbalance of predictors. CONCLUSIONS: Adjustment for predictive baseline characteristics, even when largely balanced, may lead to clearly different estimates of the treatment effect on mortality rates. Adjustment for important predictors such as age is recommended in clinical trials studying patients with acute myocardial infarction.
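
The percentages quoted in this abstract can be checked with a few lines of arithmetic. This is only a sketch based on the rounded figures printed above, so the recomputed unadjusted odds ratio differs slightly from the published 0.853, which presumably came from the exact counts. The function names are my own, chosen for illustration.

```python
import math

# Figures quoted in the abstract (GUSTO-I, 30-day death).
p_tpa = 0.063  # proportion dying, tissue plasminogen activator arm
p_sk = 0.073   # proportion dying, streptokinase arm


def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)


# Unadjusted odds ratio recomputed from the rounded percentages.
or_unadjusted = odds(p_tpa) / odds(p_sk)


def pct_increase_on_log_scale(or_adjusted, or_reference=0.853):
    """Percent change in the log odds ratio relative to the unadjusted value."""
    return 100.0 * (math.log(or_adjusted) / math.log(or_reference) - 1.0)


print("unadjusted OR from percentages:", round(or_unadjusted, 3))
print("age-adjusted, % increase:", round(pct_increase_on_log_scale(0.829)))
print("17 covariates, % increase:", round(pct_increase_on_log_scale(0.820)))
```

The comparison is made on the logistic (log odds) scale because that is the scale on which the treatment coefficient is estimated; the odds ratios themselves (0.853, 0.829, 0.820) look much closer together than their logarithms do.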

Research Methods: Why Covariance? A Rationale for Using Analysis of Covariance Procedures in Randomized Studies. Matthew J. Taylor. Journal of Early Intervention 1993: 17(4); 455-466. Abstract not available yet.

A comparison of direct adjustment and regression adjustment of epidemiologic measures. T. C. Wilcosky, L. E. Chambless. J Chronic Dis 1985: 38(10); 849-56. Although regression adjustment can provide a useful alternative to direct adjustment, especially when data are sparse, many researchers are unaware that adjusted summary measures can be easily derived from regression coefficients. In a non-technical discussion with examples, the direct adjustment procedure is compared with three methods of regression adjustment based on analysis of covariance models: the conditional prediction method, the stratified prediction method, and the marginal prediction method. Both the stratified prediction and direct adjustment methods yield summary measures that are weighted averages of stratum-specific measures, while adjusted measures from the conditional prediction method are similar to stratum-specific estimates. In contrast to the other adjustment procedures, which can use internal or external weights, the marginal prediction method always gives an internally adjusted measure. Under certain conditions, the three regression adjustment procedures produce identical results. Major advantages of direct adjustment include computational simplicity and relatively few statistical assumptions. Regression adjustment, however, is more convenient for statistical tests for interactions and group differences, and often precludes the need to categorize continuous variables, so that problems with empty strata are avoided.

Case control design

Reye's syndrome in the United States from 1981 through 1997. E. D. Belay, J. S. Bresee, R. C. Holman, A. S. Khan, A. Shahriari, L. B. Schonberger. New England Journal of Medicine 1999: 340(18); 1377-82. BACKGROUND: Reye's syndrome is characterized by encephalopathy and fatty degeneration of the liver, usually after influenza or varicella. Beginning in 1980, warnings were issued about the use of salicylates in children with those viral infections because of the risk of Reye's syndrome. METHODS: To describe the pattern of Reye's syndrome in the United States, characteristics of the patients, and risk factors for poor outcomes, we analyzed national surveillance data collected from December 1980 through November 1997. The surveillance system is based on voluntary reporting with the use of a standard case-report form. RESULTS: From December 1980 through November 1997 (surveillance years 1981 through 1997), 1207 cases of Reye's syndrome were reported in patients less than 18 years of age. Among those for whom data on race and sex were available, 93 percent were white and 52 percent were girls. The number of reported cases of Reye's syndrome declined sharply after the association of Reye's syndrome with aspirin was reported. After a peak of 555 cases in children reported in 1980, there have been no more than 36 cases per year since 1987. Antecedent illnesses were reported in 93 percent of the children, and detectable blood salicylate levels in 82 percent. The overall case fatality rate was 31 percent. The case fatality rate was highest in children under five years of age (relative risk, 1.8; 95 percent confidence interval, 1.5 to 2.1) and in those with a serum ammonia level above 45 µg per deciliter (26 µmol per liter) (relative risk, 3.4; 95 percent confidence interval, 1.9 to 6.2). CONCLUSIONS: Since 1980, when the association between Reye's syndrome and the use of aspirin during varicella or influenza-like illness was first reported, there has been a sharp decline in the number of infants and children reported to have Reye's syndrome. Because Reye's syndrome is now very rare, any infant or child suspected of having this disorder should undergo extensive investigation to rule out the treatable inborn metabolic disorders that can mimic Reye's syndrome. [Abstract] [Full text] [PDF]

A case-control study of HIV seroconversion in health care workers after percutaneous exposure. Centers for Disease Control and Prevention Needlestick Surveillance Group. D. M. Cardo, D. H. Culver, C. A. Ciesielski, P. U. Srivastava, R. Marcus, D. Abiteboul, J. Heptonstall, G. Ippolito, F. Lot, P. S. McKibben, D. M. Bell. N Engl J Med 1997: 337(21); 1485-90. BACKGROUND: The average risk of human immunodeficiency virus (HIV) infection after percutaneous exposure to HIV-infected blood is 0.3 percent, but the factors that influence this risk are not well understood. METHODS: We conducted a case-control study of health care workers with occupational, percutaneous exposure to HIV-infected blood. The case patients were those who became seropositive after exposure to HIV, as reported by national surveillance systems in France, Italy, the United Kingdom, and the United States. The controls were health care workers in a prospective surveillance project who were exposed to HIV but did not seroconvert. RESULTS: Logistic-regression analysis based on 33 case patients and 665 controls showed that significant risk factors for seroconversion were deep injury (odds ratio = 15; 95 percent confidence interval, 6.0 to 41), injury with a device that was visibly contaminated with the source patient's blood (odds ratio = 6.2; 95 percent confidence interval, 2.2 to 21), a procedure involving a needle placed in the source patient's artery or vein (odds ratio = 4.3; 95 percent confidence interval, 1.7 to 12), and exposure to a source patient who died of the acquired immunodeficiency syndrome within two months afterward (odds ratio = 5.6; 95 percent confidence interval, 2.0 to 16). The case patients were significantly less likely than the controls to have taken zidovudine after the exposure (odds ratio = 0.19; 95 percent confidence interval, 0.06 to 0.52). CONCLUSIONS: The risk of HIV infection after percutaneous exposure increases with a larger volume of blood and, probably, a higher titer of HIV in the source patient's blood. Postexposure prophylaxis with zidovudine appears to be protective. [Abstract] [Full text] [PDF]

Reye's syndrome. M. Casteels-Van Daele, C. Van Geet, C. Wouters, E. Eggermont. Lancet 2001: 358(9278); 334. Abstract not available yet.

Risk of testicular cancer in subfertile men: case-control study. H. Moller, N. E. Skakkebaek. British Medical Journal 1999: 318(7183); 559-62. OBJECTIVE: To evaluate the association between subfertility in men and the subsequent risk of testicular cancer. DESIGN: Population based case-control study. SETTING: The Danish population. PARTICIPANTS: Cases were identified in the Danish Cancer Registry; controls were randomly selected from the Danish population with the computerised Danish Central Population Register. Men were interviewed by telephone; 514 men with cancer and 720 controls participated. OUTCOME MEASURE: Occurrence of testicular cancer. RESULTS: A reduced risk of testicular cancer was associated with paternity (relative risk 0.63; 95% confidence interval 0.47 to 0.85). In men who before the diagnosis of testicular cancer had a lower number of children than expected on the basis of their age, the relative risk was 1.98 (1.43 to 2.75). There was no corresponding protective effect associated with a higher number of children than expected. The associations were similar for seminoma and non-seminoma and were not influenced by adjustment for potential confounding factors. CONCLUSION: These data are consistent with the hypothesis that male subfertility and testicular cancer share important aetiological factors.

Testicular cancer risk in relation to use of disposable nappies. H. Moller. Arch Dis Child 2002: 86(1); 28-9. Information on the use of disposable nappies in childhood was available for 296 testicular cancer cases and 287 population controls in Denmark. No association was found between disposable nappy use and the subsequent risk of testicular cancer in adulthood.

The disappearance of Reye's syndrome--a public health triumph. A. S. Monto. N Engl J Med 1999: 340(18); 1423-4. Abstract not available.

Hospital controls versus community controls: differences in inferences regarding risk factors for hip fracture. D. J. Moritz, J. L. Kelsey, J. A. Grisso. Am J Epidemiol 1997: 145(7); 653-60. In case-control studies using cases identified from persons admitted to hospitals, two types of controls are most often used: persons from the communities served by the hospitals and persons admitted to the same hospitals as those to which the cases were admitted. It is often unclear which is the more appropriate choice, and whether the use of one or the other type of control group will lead to biased conclusions. The purpose of the present analysis was to determine whether the choice of hospital controls versus community controls would influence conclusions regarding risk factors for hip fracture. Cases (n = 425), hospital controls (n = 312) and community controls (n = 454) were drawn from a case-control study of risk factors for hip fracture in women. Study participants were white and black women aged 45 years or older and living in New York City or Philadelphia, Pennsylvania, who were selected between September 1987 and July 1989. Using community controls but not hospital controls, investigators would have concluded that having a fall during the previous 6 months, current smoking, and moving during the previous year were associated with an increased risk of hip fracture. Associations of hip fracture risk with stroke and prior use of ambulatory aids were stronger using community controls, but associations with estrogen use and body mass index were not influenced by choice of control group. Community controls were quite similar to representative samples of community-dwelling elderly women, whereas hospital controls were somewhat sicker and more likely to be current smokers. The authors conclude that community controls comprise the more appropriate control group in case-control studies of hip fracture in the elderly.

Case-control studies: research in reverse. K. F. Schulz, D. A. Grimes. Lancet 2002: 359; 431-434. Epidemiologists benefit greatly from having case-control study designs in their research armamentarium. Case-control studies can yield important scientific findings with relatively little time, money, and effort compared with other study designs. This seemingly quick road to research results entices many newly trained epidemiologists. Indeed, investigators implement case-control studies more frequently than any other analytical epidemiological study. Unfortunately, case-control designs also tend to be more susceptible to biases than other comparative studies. Although easier to do, they are also easier to do wrong. Five main notions guide investigators who do, or readers who assess, case-control studies. First, investigators must explicitly define the criteria for diagnosis of a case and any eligibility criteria used for selection. Second, controls should come from the same population as the cases, and their selection should be independent of the exposures of interest. Third, investigators should blind the data gatherers to the case or control status of participants or, if impossible, at least blind them to the main hypothesis of the study. Fourth, data gatherers need to be thoroughly trained to elicit exposure in a similar manner from cases and controls; they should use memory aids to facilitate and balance recall between cases and controls. Finally, investigators should address confounding in case-control studies, either in the design stage or with analytical techniques. Devotion of meticulous attention to these points enhances the validity of the results and bolsters the reader's confidence in the findings.

Selection of controls in case-control studies. I. Principles. S. Wacholder, J. K. McLaughlin, D. T. Silverman, J. S. Mandel. Am J Epidemiol 1992: 135(9); 1019-28. A synthesis of classical and recent thinking on the issues involved in selecting controls for case-control studies is presented in this and two companion papers (S. Wacholder et al. Am J Epidemiol 1992; 135:1029-50). In this paper, a theoretical framework for selecting controls in case-control studies is developed. Three principles of comparability are described: 1) study base, that all comparisons be made within the study base; 2) deconfounding, that comparisons of the effects of the levels of exposure on disease risk not be distorted by the effects of other factors; and 3) comparable accuracy, that any errors in measurement of exposure be nondifferential between cases and controls. These principles, if adhered to in a study, can reduce selection, confounding, and information bias, respectively. The principles, however, are constrained by an additional efficiency principle regarding resources and time. Most problems and controversies in control selection reflect trade-offs among these four principles.

Selection of controls in case-control studies. II. Types of controls. S. Wacholder, D. T. Silverman, J. K. McLaughlin, J. S. Mandel. Am J Epidemiol 1992: 135(9); 1029-41. Types of control groups are evaluated using the principles described in paper 1 of the series, "Selection of Controls in Case-Control Studies" (S. Wacholder et al. Am J Epidemiol 1992; 135:1019-28). Advantages and disadvantages of population controls, neighborhood controls, hospital or registry controls, medical practice controls, friend controls, and relative controls are considered. Problems with the use of deceased controls and proxy respondents are discussed.

Selection of controls in case-control studies. III. Design options. S. Wacholder, D. T. Silverman, J. K. McLaughlin, J. S. Mandel. Am J Epidemiol 1992: 135(9); 1042-50. Several design options available in the planning stage of case-control studies are examined. Topics covered include matching, control/case ratio, choice of nested case-control or case-cohort design, two-stage sampling, and other methods that can be used for control selection. The effect of potential problems in obtaining comparable accuracy of exposure is also examined. A discussion of the difficulty in meeting the principles of study base, deconfounding, and comparable accuracy (S. Wacholder et al. Am J Epidemiol 1992; 135:1019-28) in a single study completes this series of papers.

Design issues in case-control studies. S. Wacholder. Stat Methods Med Res 1995: 4(4); 293-309. The most difficult and most important considerations in planning the protocol of a case-control study are ascertainment of cases, selection of controls and the quality of the exposure measurement. Plans to ensure careful field work are equally important; without attention to data collection, the protocol will be meaningless. In most case-control studies, the measurement problem is magnified because one cannot implement the collection of exposure information at the beginning of follow-up, and instead must rely on interviews, existing records or extrapolation into the past. Consideration of a case-control study as an efficient way to study a cohort helps to resolve some design issues.

Are risk factors for sudden infant death syndrome different at night? S. M. Williams, E. A. Mitchell, B. J. Taylor. Arch Dis Child 2002: 87(4); 274-8. AIMS: To determine whether the risk factors for SIDS occurring at night were different from those occurring during the day. METHODS: Large, nationwide case-control study, with data for 369 cases and 1558 controls in New Zealand. RESULTS: Two thirds of SIDS deaths occurred at night (between 10 pm and 7:30 am). The odds ratio (95% CI) for prone sleep position was 3.86 (2.67 to 5.59) for deaths occurring at night and 7.25 (4.52 to 11.63) for deaths occurring during the day; the difference was significant. The odds ratio for maternal smoking for deaths occurring at night was 2.28 (1.52 to 3.42) and that for the day 1.27 (0.79 to 2.03); that for the mother being single was 2.69 (1.29 to 3.99) for a night time death and 1.25 (0.76 to 2.04) for a daytime death. Both interactions were significant. The interactions between time of death and bed sharing, not sleeping in a cot or bassinet, Maori ethnicity, late timing of antenatal care, binge drinking, cannabis use, and illness in the baby were also significant, or almost so. All were more strongly associated with SIDS occurring at night. CONCLUSIONS: Prone sleep position was more strongly associated with SIDS occurring during the day, whereas night time deaths were more strongly associated with maternal smoking and measures of social deprivation.

Case report

Unconventional cancer therapies: What we need is rigorous research, not closed minds. E. Ernst. Chest 2000: 117(2); 307-8. [Full text] [PDF]

Cluster randomization

Extending the CONSORT statement to cluster randomized trials: for discussion. D. R. Elbourne, M. K. Campbell. Stat Med 2001: 20(3); 489-96. The need for clear reporting of randomized controlled trials has been emphasized recently. The CONSORT Statement has made evidence-based suggestions for a checklist and a patient flow diagram. Adapting this for cluster randomized controlled trials presents particular challenges. Simple changes in the checklist and diagram for the completely randomized two level cluster randomized trials are suggested for discussion. An example taken from an unpublished trial demonstrates that these changes are less simple to implement, although extensions to electronic publications may be helpful. These suggestions should be formally evaluated. Further work is required to consider the cases of more levels and of stratified or pair-matched cluster randomized trials.

Cohort design

Cigarette smoking and diabetes mellitus: evidence of a positive association from a large prospective cohort study. J. C. Will, D. A. Galuska, E. S. Ford, A. Mokdad, E. E. Calle. Int J Epidemiol 2001: 30(3); 540-6. OBJECTIVE: Only a few prospective studies have examined the relationship between the frequency of cigarette smoking and the incidence of diabetes mellitus. The purpose of this study was to determine whether greater frequency of cigarette smoking accelerated the development of diabetes mellitus, and whether quitting reversed the effect. METHODS: Data were collected in the Cancer Prevention Study I, a prospective cohort study conducted from 1959 through 1972 by the American Cancer Society where volunteers recruited more than one million acquaintances in 25 US states. From these over one million original participants, 275,190 men and 434,637 women aged ≥30 years were selected for the primary analysis using predetermined criteria. RESULTS: As smoking increased, the rate of diabetes increased for both men and women. Among those who smoked ≥2 packs per day at baseline, men had a 45% higher diabetes rate than men who had never smoked; the comparable increase for women was 74%. Quitting smoking reduced the rate of diabetes to that of non-smokers after 5 years in women and after 10 years in men. CONCLUSIONS: A dose-response relationship seems likely between smoking and incidence of diabetes. Smokers who quit may derive substantial benefit from doing so. Confirmation of these observations is needed through additional epidemiological and biological research.

Concealed allocation

Bias in treatment assignment in controlled clinical trials. TC Chalmers, P Celano, HS Sacks, H Jr Smith. N Engl J Med 1983: 309(22); 1358-61. ABSTRACT: Controlled clinical trials of the treatment of acute myocardial infarction offer a unique opportunity for the study of the potential influence on outcome of bias in treatment assignment. A group of 145 papers was divided into those in which the randomization process was blinded (57 papers), those in which it may have been unblinded (45 papers), and those in which the controls were selected by a nonrandom process (43 papers). At least one prognostic variable was maldistributed (P less than 0.05) in 14.0 per cent of the blinded-randomization studies, in 26.7 per cent of the unblinded-randomization studies, and in 58.1 per cent of the nonrandomized studies. Differences in case-fatality rates between treatment and control groups (P less than 0.05) were found in 8.8 per cent of the blinded-randomization studies, 24.4 per cent of the unblinded-randomization studies, and 58.1 per cent of the nonrandomized studies. These data emphasize the importance of keeping those who recruit patients for clinical trials from suspecting which treatment will be assigned to the patient under consideration.

Randomised trials, human nature, and reporting guidelines. K. F. Schulz. Lancet 1996: 348(9027); 596-8. Abstract not available.

Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. KF Schulz, I Chalmers, RJ Hayes, DG Altman. JAMA 1995: 273(5); 408-12. ABSTRACT: OBJECTIVE--To determine if inadequate approaches to randomized controlled trial design and execution are associated with evidence of bias in estimating treatment effects. DESIGN--An observational study in which we assessed the methodological quality of 250 controlled trials from 33 meta-analyses and then analyzed, using multiple logistic regression models, the associations between those assessments and estimated treatment effects. DATA SOURCES--Meta-analyses from the Cochrane Pregnancy and Childbirth Database. MAIN OUTCOME MEASURES--The associations between estimates of treatment effects and inadequate allocation concealment, exclusions after randomization, and lack of double-blinding. RESULTS--Compared with trials in which authors reported adequately concealed treatment allocation, trials in which concealment was either inadequate or unclear (did not report or incompletely reported a concealment approach) yielded larger estimates of treatment effects (P < .001). Odds ratios were exaggerated by 41% for inadequately concealed trials and by 30% for unclearly concealed trials (adjusted for other aspects of quality). Trials in which participants had been excluded after randomization did not yield larger estimates of effects, but that lack of association may be due to incomplete reporting. Trials that were not double-blind also yielded larger estimates of effects (P = .01), with odds ratios being exaggerated by 17%. CONCLUSIONS--This study provides empirical evidence that inadequate methodological approaches in controlled trials, particularly those representing poor allocation concealment, are associated with bias. Readers of trial reports should be wary of these pitfalls, and investigators must improve their design, execution, and reporting of trials.

Allocation concealment in randomised trials: defending against deciphering. K. F. Schulz, D. A. Grimes. Lancet 2002: 359; 614-618. Proper randomisation rests on adequate allocation concealment. An allocation concealment process keeps clinicians and participants unaware of upcoming assignments. Without it, even properly developed random allocation sequences can be subverted. Within this concealment process, the crucial unbiased nature of randomised controlled trials collides with their most vexing implementation problems. Proper allocation concealment frequently frustrates clinical inclinations, which annoys those who do the trials. Randomised controlled trials are anathema to clinicians. Many involved with trials will be tempted to decipher assignments, which subverts randomisation. For some implementing a trial, deciphering the allocation scheme might frequently become too great an intellectual challenge to resist. Whether their motives indicate innocent or pernicious intents, such tampering undermines the validity of a trial. Indeed, inadequate allocation concealment leads to exaggerated estimates of treatment effect, on average, but with scope for bias in either direction. Trial investigators will be crafty in any potential efforts to decipher the allocation sequence, so trial designers must be just as clever in their design efforts to prevent deciphering. Investigators must effectively immunise trials against selection and confounding biases with proper allocation concealment. Furthermore, investigators should report baseline comparisons on important prognostic variables. Hypothesis tests of baseline characteristics, however, are superfluous and could be harmful if they lead investigators to suppress reporting any baseline imbalances.

Generation of allocation sequences in randomised trials: chance not choice. K. F. Schulz, D. A. Grimes. Lancet 2002: 359; 515-519. The randomised controlled trial sets the gold standard of clinical research. However, randomisation persists as perhaps the least-understood aspect of a trial. Moreover, anything short of proper randomisation courts selection and confounding biases. Researchers should spurn all systematic, non-random methods of allocation. Trial participants should be assigned to comparison groups based on a random process. Simple (unrestricted) randomisation, analogous to repeated fair coin-tossing, is the most basic of sequence generation approaches. Furthermore, no other approach, irrespective of its complexity and sophistication, surpasses simple randomisation for prevention of bias. Investigators should, therefore, use this method more often than they do, and readers should expect and accept disparities in group sizes. Several other complicated restricted randomisation procedures limit the likelihood of undesirable sample size imbalances in the intervention groups. The most frequently used restricted sequence generation procedure is blocked randomisation. If this method is used, investigators should randomly vary the block sizes and use larger block sizes, particularly in an unblinded trial. Other restricted procedures, such as urn randomisation, combine beneficial attributes of simple and restricted randomisation by preserving most of the unpredictability while achieving some balance. The effectiveness of stratified randomisation depends on use of a restricted randomisation approach to balance the allocation sequences for each stratum. Generation of a proper randomisation sequence takes little time and effort but affords big rewards in scientific accuracy and credibility. Investigators should devote appropriate resources to the generation of properly randomised trials and reporting their methods clearly.
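
The two sequence-generation approaches named in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the Schulz and Grimes paper: the function names, the arm labels "A" and "B", and the candidate block sizes (4, 6, 8) are all assumptions chosen for illustration. Simple randomisation is modelled as repeated fair coin-tossing, and blocked randomisation draws a block size at random for each block, as the abstract recommends.

```python
import random


def simple_allocation(n_participants, arms=("A", "B"), seed=None):
    """Simple (unrestricted) randomisation: one fair draw per participant."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_participants)]


def blocked_allocation(n_participants, block_sizes=(4, 6, 8),
                       arms=("A", "B"), seed=None):
    """Blocked randomisation with randomly varied block sizes.

    Each block holds an equal number of every arm in shuffled order,
    so group sizes are balanced at the end of every complete block.
    The block sizes here are illustrative assumptions.
    """
    if any(size % len(arms) != 0 for size in block_sizes):
        raise ValueError("each block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        size = rng.choice(block_sizes)              # vary the block size at random
        block = list(arms) * (size // len(arms))    # equal counts per arm
        rng.shuffle(block)                          # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]                # last block may be truncated


if __name__ == "__main__":
    simple = simple_allocation(100, seed=2002)
    blocked = blocked_allocation(100, seed=2002)
    print("simple :", simple.count("A"), "vs", simple.count("B"))
    print("blocked:", blocked.count("A"), "vs", blocked.count("B"))
```

With blocking, the final imbalance can never exceed half of the largest block size, whereas the simple sequence can drift further apart; that disparity is what the abstract asks readers to expect and accept.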

Ecologic design

Modeling treatment effects on binary outcomes with grouped-treatment variables and individual covariates. S. C. Johnston, T. Henneman, C. E. McCulloch, M. van der Laan. Am J Epidemiol 2002: 156(8); 753-60. During evaluation of treatment effects in observational studies, confounding is a constant threat because it is always possible that patients with a better prognosis, not adequately characterized by measured covariates, are chosen for a specific therapy. Ecologic analyses may avoid confounding that would be present in analysis at the individual level because variations in regional or hospital practice may be unrelated to prognosis. The authors used simulated data with an excluded confounder to evaluate the reliability and limitations of the grouped-treatment approach, a method of incorporating an ecologic measure of treatment assignment into an individual-level multivariable model, similar to the instrumental variable approach. Estimates based on the grouped-treatment approach were closer to the true value than those of standard individual-level multivariable analysis in every simulation. Furthermore, confidence intervals based on the grouped-treatment approach achieved approximately their nominal coverage, whereas those based on individual-level analyses did not. The grouped-treatment approach appears to be more reliable than standard individual-level analysis in situations where the grouped-treatment variable is unassociated with the outcome except via the actual treatment assignment and measured covariates.

The Semi-individual Study in Air Pollution Epidemiology: A Valid Design as Compared to Ecologic Studies. Nino Kunzli, Ira B. Tager. Environmental Health Perspectives 1997: 105(10); 1078-1083. ABSTRACT: The assessment of long-term effects of air pollution in humans relies on epidemiologic studies. A widely used design consists of cross-sectional or cohort studies in which ecologic assignment of exposure, based on a fixed-site ambient monitor, is employed. Although health outcome and usually a large number of covariates are measured in individuals, these studies are often called ecological. We will introduce the term semi-individual design for these studies. We review the major properties and limitations with regard to causal inference of truly ecologic studies, in which outcome, exposure, and covariates are available on an aggregate level only. Misclassification problems and issues related to confounding and model specification in truly ecologic studies limit etiologic inference to individuals. In contrast, the semi-individual study shares its methodological and inferential properties with typical individual-level study designs. The major caveat relates to the case where too few study areas, e.g., two or three, are used, which render control of aggregate level confounding impossible. The issue of exposure misclassification is of general concern in epidemiology and not an exclusive problem of the semi-individual design. In a multicenter setting, the semi-individual study is a valuable tool to approach long-term effects of air pollution. Knowledge about the error structure of the ecologically assigned exposure allows consideration of the impact of ecologically assigned exposure on effect estimation. Semi-individual studies, i.e., individual level air pollution studies with ecologic exposure assignment, more readily permit valid inference to individuals and should not be labeled as ecologic studies.

Ecologic studies in epidemiology: concepts, principles, and methods. H. Morgenstern. Annu Rev Public Health 1995: 16; 61-81. An ecologic study focuses on the comparison of groups, rather than individuals; thus, individual-level data are missing on the joint distribution of variables within groups. Variables in an ecologic analysis may be aggregate measures, environmental measures, or global measures. The purpose of an ecologic analysis may be to make biologic inferences about effects on individual risks or to make ecologic inferences about effects on group rates. Ecologic study designs may be classified on two dimensions: (a) whether the primary group is measured (exploratory vs analytic study); and (b) whether subjects are grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). Despite several practical advantages of ecologic studies, there are many methodologic problems that severely limit causal inference, including ecologic and cross-level bias, problems of confounder control, within-group misclassification, lack of adequate data, temporal ambiguity, collinearity, and migration across groups.

Medicine and the Media: Did Monica really say that? Hugh Tunstall-Pedoe. British Medical Journal 1998: 317; 1023. Abstract not available. [Full text]

Ecological study for reasons for sharp decline in mortality from ischaemic heart disease in Poland since 1991. WA Zatonski, AJ McMichael, JW Powles. British Medical Journal 1998: 316(7137); 1047-1051. ABSTRACT: OBJECTIVE: To investigate the reasons for the decline in deaths attributed to ischaemic heart disease in Poland since 1991 after two decades of rising rates. DESIGN: Recent changes in mortality were measured as percentage deviations in 1994 from rates predicted by extrapolation of sex and age specific death rates for 1980-91 for diseases of the circulatory system and selected other categories. Available data on national and household food availability, alcohol consumption, cigarette smoking, socioeconomic indices, and medical services over time were reviewed. MAIN OUTCOME MEASURES: Age specific and age standardised rates of death attributed to ischaemic heart disease and related causes. RESULTS: The change in trend in mortality attributed to diseases of the circulatory system was similar in men and women and most marked (> 20%) in early middle age. For ages 45 to 64 the decrease was greatest for deaths attributed to ischaemic heart disease and atherosclerosis (around 25%) and less for stroke (< 10%). For most of the potentially explanatory variables considered, there were no corresponding changes in trend. However, between 1986-90 and 1994 there was a marked switch from animal fats (estimated availability down 23%) to vegetable fats (up 48%) and increased imports of fruit. CONCLUSION: Reporting biases are unlikely to have exaggerated the true fall in ischaemic heart disease; neither is it likely to be mainly due to changes in smoking, drinking, stress, or medical care. Changes in type of dietary fat and increased supplies of fresh fruit and vegetables seem to be the best candidates. [Medline] [Abstract] [PDF]

Examples

Influence of maternal age at delivery and birth order on risk of type 1 diabetes in childhood: prospective population based family study. Bart's-Oxford Family Study Group. P. J. Bingley, I. F. Douek, C. A. Rogers, E. A. Gale. British Medical Journal 2000: 321(7258); 420-4. OBJECTIVES: To examine the influence of parental age at delivery and birth order on subsequent risk of childhood diabetes. DESIGN: Prospective population based family study. SETTING: Area formerly administered by the Oxford Regional Health Authority. PARTICIPANTS: 1375 families in which one child or more had diabetes. Of 3221 offspring, 1431 had diabetes (median age at diagnosis 10.5 years, range 0.4-28.5) and 1790 remained non-diabetic at a median age of 16.1 years. MAIN OUTCOME MEASURES: Disease free survival and hazard ratios for the development of type 1 diabetes in all offspring, assessed by Cox proportional hazard regression. RESULTS: Maternal age at delivery was strongly related to risk of type 1 diabetes in the offspring; risk increased by 25% (95% confidence interval 17% to 34%) for each five year band of maternal age, so that maternal age at delivery of 45 years or more was associated with a relative risk of 3.11 (2.07 to 4.66) compared with a maternal age of less than 20 years. Paternal age was also associated with a 9% (3% to 16%) increase for each five year increase in paternal age. The relative risk of diabetes, adjusted for parental age at delivery and sex of offspring, decreased with increasing birth order; the overall effect was a 15% risk reduction (10% to 21%) per child born. CONCLUSIONS: A strong association was found between increasing maternal age at delivery and risk of diabetes in the child. Risk was highest in firstborn children and decreased progressively with higher birth order. The fetal environment seems to have a strong influence on risk of type 1 diabetes in the child. The increase in maternal age at delivery in the United Kingdom over the past two decades could partly account for the increase in incidence of childhood diabetes over this period. [Medline] [Abstract] [Full text] [PDF]

Statistical Inquiries into the Efficacy of Prayer. Sir Francis Galton. Fortnightly Review 1872: 12; 125-135. (This article was originally published in 1872 and is reproduced by the Pictures of Health Web Site.) An eminent authority has recently published a challenge to test the efficacy of prayer by actual experiment. I have been induced, through reading this, to prepare the following memoir for publication, nearly the whole of which I wrote and laid by many years ago, after completing a large collection of data, which I had undertaken for the satisfaction of my own conscience. [Full text] [PDF]

Lack of effect of long-term supplementation with beta carotene on the incidence of malignant neoplasms and cardiovascular disease. C. H. Hennekens, J. E. Buring, J. E. Manson, M. Stampfer, B. Rosner, N. R. Cook, C. Belanger, F. La Motte, J. M. Gaziano, P. M. Ridker, W. Willett, R. Peto. N Engl J Med 1996: 334(18); 1145-9. BACKGROUND. Observational studies suggest that people who consume more fruits and vegetables containing beta carotene have somewhat lower risks of cancer and cardiovascular disease, and earlier basic research suggested plausible mechanisms. Because large randomized trials of long duration were necessary to test this hypothesis directly, we conducted a trial of beta carotene supplementation. METHODS. In a randomized, double-blind, placebo-controlled trial of beta carotene (50 mg on alternate days), we enrolled 22,071 male physicians, 40 to 84 years of age, in the United States; 11 percent were current smokers and 39 percent were former smokers at the beginning of the study in 1982. By December 31, 1995, the scheduled end of the study, fewer than 1 percent had been lost to follow-up, and compliance was 78 percent in the group that received beta carotene. RESULTS. Among 11,036 physicians randomly assigned to receive beta carotene and 11,035 assigned to receive placebo, there were virtually no early or late differences in the overall incidence of malignant neoplasms or cardiovascular disease, or in overall mortality. In the beta carotene group, 1273 men had any malignant neoplasm (except nonmelanoma skin cancer), as compared with 1293 in the placebo group (relative risk, 0.98; 95 percent confidence interval, 0.91 to 1.06). There were also no significant differences in the number of cases of lung cancer (82 in the beta carotene group vs. 88 in the placebo group); the number of deaths from cancer (386 vs. 380), deaths from any cause (979 vs. 968), or deaths from cardiovascular disease (338 vs. 313); the number of men with myocardial infarction (468 vs. 489); the number with stroke (367 vs. 382); or the number with any one of the previous three end points (967 vs. 972). Among current and former smokers, there were also no significant early or late differences in any of these end points. CONCLUSIONS. In this trial among healthy men, 12 years of supplementation with beta carotene produced neither benefit nor harm in terms of the incidence of malignant neoplasms, cardiovascular disease, or death from all causes.
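
The relative risk quoted in the Hennekens abstract can be approximated from the counts it reports (1273 of 11,036 in the beta carotene group, 1293 of 11,035 in the placebo group). The Python sketch below is an illustration only: it computes a crude risk ratio with a log-scale Wald interval, which is an assumption about method and not necessarily how the trial's authors analysed their data, so the interval may differ slightly from the published one. The function name `relative_risk` is hypothetical.

```python
import math


def relative_risk(events_treated, n_treated, events_control, n_control, z=1.96):
    """Crude risk ratio with an approximate 95% Wald interval on the log scale.

    This is a textbook approximation, assumed here for illustration; it does
    not adjust for follow-up time or covariates.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated
        + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Malignant neoplasm counts as reported in the abstract above.
    rr, lower, upper = relative_risk(1273, 11036, 1293, 11035)
    print(f"RR = {rr:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate rounds to the published 0.98, and the interval spans 1.0, which agrees with the abstract's conclusion of neither benefit nor harm.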

Dietary fat intake and the risk of coronary heart disease in women. F. B. Hu, M. J. Stampfer, J. E. Manson, E. Rimm, G. A. Colditz, B. A. Rosner, C. H. Hennekens, W. C. Willett. N Engl J Med 1997: 337(21); 1491-9. BACKGROUND: The relation between dietary intake of specific types of fat, particularly trans unsaturated fat, and the risk of coronary disease remains unclear. We therefore studied this relation in women enrolled in the Nurses' Health Study. METHODS: We prospectively studied 80,082 women who were 34 to 59 years of age and had no known coronary disease, stroke, cancer, hypercholesterolemia, or diabetes in 1980. Information on diet was obtained at base line and updated during follow-up by means of validated questionnaires. During 14 years of follow-up, we documented 939 cases of nonfatal myocardial infarction or death from coronary heart disease. Multivariate analyses included age, smoking status, total energy intake, dietary cholesterol intake, percentages of energy obtained from protein and specific types of fat, and other risk factors. RESULTS: Each increase of 5 percent of energy intake from saturated fat, as compared with equivalent energy intake from carbohydrates, was associated with a 17 percent increase in the risk of coronary disease (relative risk, 1.17; 95 percent confidence interval, 0.97 to 1.41; P=0.10). As compared with equivalent energy from carbohydrates, the relative risk for a 2 percent increment in energy intake from trans unsaturated fat was 1.93 (95 percent confidence interval, 1.43 to 2.61; P<0.001); that for a 5 percent increment in energy from monounsaturated fat was 0.81 (95 percent confidence interval, 0.65 to 1.00; P=0.05); and that for a 5 percent increment in energy from polyunsaturated fat was 0.62 (95 percent confidence interval, 0.46 to 0.85; P=0.003). Total fat intake was not significantly related to the risk of coronary disease (for a 5 percent increase in energy from fat, the relative risk was 1.02; 95 percent confidence interval, 0.97 to 1.07; P=0.55). We estimated that the replacement of 5 percent of energy from saturated fat with energy from unsaturated fats would reduce risk by 42 percent (95 percent confidence interval, 23 to 56; P<0.001) and that the replacement of 2 percent of energy from trans fat with energy from unhydrogenated, unsaturated fats would reduce risk by 53 percent (95 percent confidence interval, 34 to 67; P<0.001). CONCLUSIONS: Our findings suggest that replacing saturated and trans unsaturated fats with unhydrogenated monounsaturated and polyunsaturated fats is more effective in preventing coronary heart disease in women than reducing overall fat intake.

Breastfeeding and infant growth: biology or bias? M. S. Kramer, T. Guo, R. W. Platt, S. Shapiro, J. P. Collet, B. Chalmers, E. Hodnett, Z. Sevkovskaya, I. Dzikovich, I. Vanilovich. Pediatrics 2002: 110(2 Pt 1); 343-7. BACKGROUND: Available evidence suggests that prolonged and exclusive breastfeeding is associated with lower infant weight and length by 6 to 12 months of age. This evidence, however, is based on observational studies, which are unable to separate the effects of feeding mode per se from selection bias, reverse causality, and the confounding effects of maternal attitudinal factors. DESIGN/METHODS: A cluster-randomized trial in the Republic of Belarus of a breastfeeding promotion intervention modeled on the World Health Organization (WHO)/UNICEF Baby-Friendly Hospital Initiative versus control (then current) infant feeding practices. Healthy, full-term, singleton breastfed infants (n = 17 046) weighing > or =2500 g were enrolled soon after birth and followed up at 1, 2, 3, 6, 9, and 12 months old for measurements of weight, length, and head circumference. Data were analyzed according to intention-to-treat, while accounting for within-cluster correlation. To assess the potential for bias in observational studies of breastfeeding, we also analyzed our data as if we had conducted an observational study by ignoring treatment, combining the 2 randomized groups, and comparing 1378 infants weaned in the first month and those breastfed for the full 12 months of follow-up with either > or =3 months (n = 1271) or > or =6 months (n = 251) of exclusive breastfeeding. RESULTS: Infants from the experimental sites were significantly more likely to be breastfed (to any degree) at 3, 6, 9, and 12 months and were far more likely to be exclusively breastfed at 3 months (43.3% vs 6.4%). Mean birth weight was nearly identical in the 2 groups (3448 g, experimental; 3446 g, control). 
Mean weight was significantly higher in the experimental group by 1 month of age (4341 vs 4280 g). The difference increased through 3 months (6153 g vs 6047 g), declined slowly thereafter, and disappeared by 12 months (10564 g vs 10571 g). Analysis by z scores confirmed that infants in both groups gained more weight than the WHO/Centers for Disease Control and Prevention reference, with no evidence of undernutrition in the control group. Length followed a similar pattern. In the observational analyses, infants weaned in the first month were slightly lighter and shorter at birth and their weight-for-age and length-for-age z scores declined by 1 month, but they caught up to both experimental and the other observational groups by 6 months and were heavier and longer by 12 months. Among infants in the 2 prolonged and exclusive breastfeeding groups, weight-for-age z scores fell slightly between 3 and 12 months; length-for-age fell below the reference by 6 months with catch-up to the reference by 12 months. Head circumference showed no significant differences at any age between the 2 trial groups or among the observational groups. CONCLUSIONS: Our data, the first in humans based on a randomized experiment, suggest that prolonged and exclusive breastfeeding may actually accelerate weight and length gain in the first few months, with no detectable deficit by 12 months old. These results add support to current WHO and UNICEF feeding recommendations. Our observational analysis showing faster weight and length gains with early weaning and slower gains with prolonged and exclusive breastfeeding may reflect unmeasured confounding differences or a true biological effect of formula feeding.

Effects of a Combination of Beta Carotene and Vitamin A on Lung Cancer and Cardiovascular Disease. GS Omenn, GE Goodman, MD Thornquist, J Balmes, MR Cullen, A Glass, JP Keogh, FL Meyskens, B Valanis, JH Williams, S Barnhart, S Hammar. The New England Journal of Medicine 1996: 334(18); 1150-1155. ABSTRACT: BACKGROUND. Lung cancer and cardiovascular disease are major causes of death in the United States. It has been proposed that carotenoids and retinoids are agents that may prevent these disorders. METHODS. We conducted a multicenter, randomized, double-blind, placebo-controlled primary prevention trial -- the Beta Carotene and Retinol Efficacy Trial -- involving a total of 18,314 smokers, former smokers, and workers exposed to asbestos. The effects of a combination of 30 mg of beta carotene per day and 25,000 IU of retinol (vitamin A) in the form of retinyl palmitate per day on the primary end point, the incidence of lung cancer, were compared with those of placebo. RESULTS. A total of 388 new cases of lung cancer were diagnosed during the 73,135 person-years of follow-up (mean length of follow-up, 4.0 years). The active-treatment group had a relative risk of lung cancer of 1.28 (95 percent confidence interval, 1.04 to 1.57; P=0.02), as compared with the placebo group. There were no statistically significant differences in the risks of other types of cancer. In the active-treatment group, the relative risk of death from any cause was 1.17 (95 percent confidence interval, 1.03 to 1.33); of death from lung cancer, 1.46 (95 percent confidence interval, 1.07 to 2.00); and of death from cardiovascular disease, 1.26 (95 percent confidence interval, 0.99 to 1.61). On the basis of these findings, the randomized trial was stopped 21 months earlier than planned; follow-up will continue for another 5 years. CONCLUSIONS. After an average of four years of supplementation, the combination of beta carotene and vitamin A had no benefit and may have had an adverse effect on the incidence of lung cancer and on the risk of death from lung cancer, cardiovascular disease, and any cause in smokers and workers exposed to asbestos.

Risk factors for lung cancer and for intervention effects in CARET, the Beta-Carotene and Retinol Efficacy Trial. G. S. Omenn, G. E. Goodman, M. D. Thornquist, J. Balmes, M. R. Cullen, A. Glass, J. P. Keogh, F. L. Meyskens, Jr., B. Valanis, J. H. Williams, Jr., S. Barnhart, M. G. Cherniack, C. A. Brodkin, S. Hammar. Journal of the National Cancer Institute 1996: 88(21); 1550-9. BACKGROUND: Evidence has accumulated from observational studies that people eating more fruits and vegetables, which are rich in beta-carotene (a violet to yellow plant pigment that acts as an antioxidant and can be converted to vitamin A by enzymes in the intestinal wall and liver) and retinol (an alcohol chemical form of vitamin A), and people having higher serum beta-carotene concentrations had lower rates of lung cancer. The Beta-Carotene and Retinol Efficacy Trial (CARET) tested the combination of 30 mg beta-carotene and 25,000 IU retinyl palmitate (vitamin A) taken daily against placebo in 18314 men and women at high risk of developing lung cancer. The CARET intervention was stopped 21 months early because of clear evidence of no benefit and substantial evidence of possible harm; there were 28% more lung cancers and 17% more deaths in the active intervention group (active = the daily combination of 30 mg beta-carotene and 25,000 IU retinyl palmitate). Promptly after the January 18, 1996, announcement that the CARET active intervention had been stopped, we published preliminary findings from CARET regarding cancer, heart disease, and total mortality. PURPOSE: We present for the first time results based on the pre-specified analytic method, details about risk factors for lung cancer, and analyses of subgroups and of factors that possibly influence response to the intervention. METHODS: CARET was a randomized, double-blinded, placebo-controlled chemoprevention trial, initiated with a pilot phase and then expanded 10-fold at six study centers. 
Cigarette smoking history and status and alcohol intake were assessed through participant self-report. Serum was collected from the participants at base line and periodically after randomization and was analyzed for beta-carotene concentration. An Endpoints Review Committee evaluated endpoint reports, including pathologic review of tissue specimens. The primary analysis is a stratified logrank test for intervention arm differences in lung cancer incidence, with weighting linearly to hypothesized full effect at 24 months after randomization. Relative risks (RRs) were estimated by use of Cox regression models; tests were performed for quantitative and qualitative interactions between the intervention and smoking status or alcohol intake. O'Brien-Fleming boundaries were used for stopping criteria at interim analyses. Statistical significance was set at the .05 alpha value, and all P values were derived from two-sided statistical tests. RESULTS: According to CARET's pre-specified analysis, there was an RR of 1.36 (95% confidence interval [CI] = 1.07-1.73; P = .01) for weighted lung cancer incidence for the active intervention group compared with the placebo group, and RR = 1.59 (95% CI = 1.13-2.23; P = .01) for weighted lung cancer mortality. All subgroups, except former smokers, had a point estimate of RR of 1.10 or greater for lung cancer. There are suggestions of associations of the excess lung cancer incidence with the highest quartile of alcohol intake (RR = 1.99; 95% CI = 1.28-3.09; test for heterogeneity of RR among quartiles of alcohol intake has P = .01, unadjusted for multiple comparisons) and with large-cell histology (RR = 1.89; 95% CI = 1.09-3.26; test for heterogeneity among histologic categories has P = .35), but not with base-line serum beta-carotene concentrations. CONCLUSIONS: CARET participants receiving the combination of beta-carotene and vitamin A had no chemopreventive benefit and had excess lung cancer incidence and mortality. 
The results are highly consistent with those found for beta-carotene in the Alpha-Tocopherol Beta-Carotene Cancer Prevention Study in 29133 male smokers in Finland.

Observational Studies. PR Rosenbaum (1995) New York: Springer-Verlag.

Ginkgo for memory enhancement: a randomized controlled trial. P. R. Solomon, F. Adams, A. Silver, J. Zimmer, R. DeVeaux. JAMA 2002: 288(7); 835-40. CONTEXT: Several over-the-counter treatments are marketed as having the ability to improve memory, attention, and related cognitive functions in as little as 4 weeks. These claims, however, are generally not supported by well-controlled clinical studies. OBJECTIVE: To evaluate whether ginkgo, an over-the-counter agent marketed as enhancing memory, improves memory in elderly adults as measured by objective neuropsychological tests and subjective ratings. DESIGN: Six-week randomized, double-blind, placebo-controlled, parallel-group trial. SETTING AND PARTICIPANTS: Community-dwelling volunteer men (n = 98) and women (n = 132) older than 60 years with Mini-Mental State Examination scores greater than 26 and in generally good health were recruited by a US academic center via newspaper advertisements and enrolled over a 26-month period from July 1996 to September 1998. INTERVENTION: Participants were randomly assigned to receive ginkgo, 40 mg 3 times per day (n = 115), or matching placebo (n = 115). MAIN OUTCOME MEASURES: Standardized neuropsychological tests of verbal and nonverbal learning and memory, attention and concentration, naming and expressive language, participant self-report on a memory questionnaire, and caregiver clinical global impression of change as completed by a companion. RESULTS: Two hundred three participants (88%) completed the protocol. Analysis of the modified intent-to-treat population (all 219 participants returning for evaluation) indicated that there were no significant differences between treatment groups on any outcome measure. Analysis of the fully evaluable population (the 203 who complied with treatment and returned for evaluation) also indicated no significant differences for any outcome measure. CONCLUSIONS: The results of this 6-week study indicate that ginkgo did not facilitate performance on standard neuropsychological tests of learning, memory, attention, and concentration or naming and verbal fluency in elderly adults without cognitive impairment. The ginkgo group also did not differ from the control group in terms of self-reported memory function or global rating by spouses, friends, and relatives. These data suggest that when taken following the manufacturer's instructions, ginkgo provides no measurable benefit in memory or related cognitive function to adults with healthy cognitive function.

The effect of vitamin E and beta carotene on the incidence of lung cancer and other cancers in male smokers. The Alpha-Tocopherol, Beta Carotene Cancer Prevention Study Group. NEJM 1994: 330(15); 1029-35. ABSTRACT: BACKGROUND. Epidemiologic evidence indicates that diets high in carotenoid-rich fruits and vegetables, as well as high serum levels of vitamin E (alpha-tocopherol) and beta carotene, are associated with a reduced risk of lung cancer. METHODS. We performed a randomized, double-blind, placebo-controlled primary-prevention trial to determine whether daily supplementation with alpha-tocopherol, beta carotene, or both would reduce the incidence of lung cancer and other cancers. A total of 29,133 male smokers 50 to 69 years of age from southwestern Finland were randomly assigned to one of four regimens: alpha-tocopherol (50 mg per day) alone, beta carotene (20 mg per day) alone, both alpha-tocopherol and beta carotene, or placebo. Follow-up continued for five to eight years. RESULTS. Among the 876 new cases of lung cancer diagnosed during the trial, no reduction in incidence was observed among the men who received alpha-tocopherol (change in incidence as compared with those who did not, -2 percent; 95 percent confidence interval, -14 to 12 percent). Unexpectedly, we observed a higher incidence of lung cancer among the men who received beta carotene than among those who did not (change in incidence, 18 percent; 95 percent confidence interval, 3 to 36 percent). We found no evidence of an interaction between alpha-tocopherol and beta carotene with respect to the incidence of lung cancer. Fewer cases of prostate cancer were diagnosed among those who received alpha-tocopherol than among those who did not. Beta carotene had little or no effect on the incidence of cancer other than lung cancer. 
Alpha-tocopherol had no apparent effect on total mortality, although more deaths from hemorrhagic stroke were observed among the men who received this supplement than among those who did not. Total mortality was 8 percent higher (95 percent confidence interval, 1 to 16 percent) among the participants who received beta carotene than among those who did not, primarily because there were more deaths from lung cancer and ischemic heart disease. CONCLUSIONS. We found no reduction in the incidence of lung cancer among male smokers after five to eight years of dietary supplementation with alpha-tocopherol or beta carotene. In fact, this trial raises the possibility that these supplements may actually have harmful as well as beneficial effects.

Comparison of maternal and infant outcomes between vacuum extraction and forceps deliveries. S. W. Wen, S. Liu, M. S. Kramer, S. Marcoux, A. Ohlsson, R. Sauve, R. Liston. Am J Epidemiol 2001: 153(2); 103-7. The authors conducted a population-based historical cohort study in the Canadian province of Quebec to assess the maternal and infant outcomes associated with vacuum extraction and forceps deliveries. The study database contains information on 305,391 mother-infant dyads (linked by a common institutional code and hospital chart number) for singleton live vaginal births with a nonbreech presentation at the gestational age of 37 or more completed weeks and a birth weight between 2,500 and 4,000 g during fiscal years 1991/1992 to 1995/1996. Of the births, 31,015 were delivered by vacuum extraction, and 18,727 were delivered by forceps. Compared with delivery by forceps, the adjusted risk ratios for third-/fourth-degree perineal laceration, intracranial hemorrhage, subdural or cerebral hemorrhage, intraventricular hemorrhage, subarachnoid hemorrhage, cephalhematoma, and neonatal in-hospital death were 0.48 (95% confidence interval: 0.45, 0.50), 1.28 (95% confidence interval: 0.73, 2.25), 0.97 (95% confidence interval: 0.49, 1.93), 0.99 (95% confidence interval: 0.16, 5.97), 5.44 (95% confidence interval: 1.26, 23.43), 2.02 (95% confidence interval: 1.89, 2.16), and 0.93 (95% confidence interval: 0.32, 2.70), respectively. The authors conclude that vacuum extraction causes less maternal trauma but may increase the risk of cephalhematoma and certain types of intracranial hemorrhage (e.g., subarachnoid hemorrhage).

Historical controls

Randomized versus historical controls for clinical trials. H. Sacks, T. C. Chalmers, H. Smith, Jr. Am J Med 1982: 72(2); 233-40. To compare the use of randomized controls (RCTs) and historical controls (HCTs) for clinical trials, we searched the literature for therapies studied by both methods. We found six therapies for which 50 RCTs and 56 HCTs were reported. Forty-four of 56 HCTs (79 percent) found the therapy better than the control regimen, but only 10 of 50 RCTs (20 percent) agreed. For each therapy, the treated patients in RCTs and HCTs had similar outcomes. The difference between RCTs and HCTs of the same therapy was largely due to differences in outcome for the control groups, with HCT control patients generally doing worse than the RCT control groups. Adjustment of the outcomes of the HCTs for prognostic factors, when possible, did not appreciably change the results. The data suggest that biases in patient selection may irretrievably weight the outcome of HCTs in favor of new therapies. RCTs may miss clinically important benefits because of inadequate attention to sample size. The predictive value of each might be improved by reconsidering the use of p less than 0.05 as the significance level for all types of clinical trials, and by the use of confidence intervals around estimates of treatment effects. [Medline]
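The abstract above recommends confidence intervals around estimates. As an illustration only, the following sketch puts intervals around the two proportions it reports (44 of 56 HCTs and 10 of 50 RCTs favoring the new therapy). The Wilson score interval and the function name wilson_interval are choices made for this sketch; the paper does not say which interval method, if any, it had in mind.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    The Wilson method is an assumption of this sketch, not the paper's method.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts taken from the abstract: trials that found the therapy better than control.
hct_low, hct_high = wilson_interval(44, 56)   # historical-control trials
rct_low, rct_high = wilson_interval(10, 50)   # randomized trials

print(f"HCTs: 44/56 = {44/56:.2f}, interval {hct_low:.2f} to {hct_high:.2f}")
print(f"RCTs: 10/50 = {10/50:.2f}, interval {rct_low:.2f} to {rct_high:.2f}")
```

Under these assumptions the two intervals do not overlap, which is consistent with the abstract's point that the two designs reached very different verdicts on the same therapies.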

Impact of Cost Reduction Programs on Short-Term Patient Outcome and Hospital Cost of Total Knee Arthroplasty. William L. Healy, Richard Iorio, John Ko, David Appleby, David W. Lemos. J Bone Joint Surg Am 2002: 84(3); 348-353. Background: During the 1990s, cost reduction programs were developed to decrease the hospital cost of total knee arthroplasty. The purpose of this study was to evaluate the impact of hospital cost reduction programs for total knee arthroplasty on patient outcome at our hospital. Methods: We evaluated 159 patients who had undergone unilateral primary total knee arthroplasty for the treatment of osteoarthritis at the Lahey Clinic. The results of fifty-six knee replacements performed in 1992 without a clinical pathway or a knee-implant standardization program (the control group) were compared with the results of 103 knee replacements performed in 1995 with a clinical pathway and a knee-implant standardization program (the study group). Before the operation, the two patient populations were similar in terms of age, pain score on a visual analog scale, and clinical knee scores; the groups were also similar with regard to the surgical approach and the time in the operating room. The minimum duration of follow-up was eight years for the control group and five years for the study group. Results: All patients in both groups had excellent relief of pain and improvement in function. There were no differences in clinical outcome between the patient groups. The rate of patient satisfaction was 98% in the control group and 99% in the study group. Implementation of the clinical pathway was associated with a reduction in the average length of the stay in the hospital from 6.79 days in 1992 to 4.16 days in 1995. Implementation of the knee-implant standardization program was associated with increased use of all-polyethylene tibial components in 1995. 
Hospital cost adjusted for medical inflation was reduced 19% with the implementation of the clinical pathway and the knee-implant standardization program. Conclusions: The clinical pathway and the knee-implant standardization program reduced resource utilization and hospital cost for total knee arthroplasty without affecting short-term patient outcome in our hospital. Orthopaedic surgeons should carefully evaluate cost reduction programs, which may affect their patients, in order to maintain high-quality orthopaedic care and consistently successful patient outcomes. [Medline] [Abstract] [Full text] [PDF] A commentary on this article reviews the problems with historical controls: www.jbjs.org/Comments/2002/c_p_katz.shtml

A Challenge for HD Researchers. Ken Pidock, Huntington's Disease Advocacy Center. Accessed on 2003-06-20. "To those of us who have watched Huntington's Disease for more than a generation, news about actual clinical trials of potential therapies is most welcome. However, such news also carries issues concerning how such therapies can best be evaluated." www.hdac.org/features/article.php?p_articleNumber=32

The way forward for clinical research. Sir Michael Rawlins, Pharmafocus. Accessed on 2003-06-20. "Historical controls can be very useful, particularly where one is investigating otherwise untreatable conditions where there is a biologically plausible basis for the treatment, and where the outcome untreated is homogenous and either very disabling or fatal." Published June 2, 2003. www.pharmafile.com/Pharmafocus/Features/feature.asp?fID=354

Matching

Removal of radiation dose response effects: an example of over-matching. J. L. Marsh, J. L. Hutton, K. Binks. BMJ 2002: 325(7359); 327-30. [Medline] [Full text] [PDF]

Paired versus Two-Sample Design for a Clinical Trial of Treatments with Dichotomous Outcome: Power Considerations. S Wacholder, CR Weinberg. Biometrics 1982: 38(3); 801-812. ABSTRACT: For the same number of observations in a small-sample clinical trial with dichotomous outcome, the statistical power associated with a two-sample design, analyzed by Fisher's exact test, is slightly greater than that associated with a matched design, analyzed by McNemar's test, when responses within pairs are independent. The power of McNemar's test, and hence of the matched design, is monotone increasing in the within-pair correlation between the treatment responses. Power curves are presented which demonstrate that positive within-pair correlation, even when quite small, can result in a superiority in power for the matched design. Conversely, in the rare situations where there is a negative within-pair correlation, choice of a two-sample design can result in a substantial gain in power.

Matching in epidemiology as a paradigm for twin research on the Etiology of Disease. C White. Acta Geneticae Medicae Et Gemellologiae 1981: 30(1); 77-86. Abstract not available.

Observational studies

A comparison of observational studies and randomized, controlled trials. K. Benson, A. J. Hartz. New England Journal of Medicine 2000: 342(25); 1878-86. BACKGROUND: For many years it has been claimed that observational studies find stronger treatment effects than randomized, controlled trials. We compared the results of observational studies with those of randomized, controlled trials. METHODS: We searched the Abridged Index Medicus and Cochrane data bases to identify observational studies reported between 1985 and 1998 that compared two or more treatments or interventions for the same condition. We then searched the Medline and Cochrane data bases to identify all the randomized, controlled trials and observational studies comparing the same treatments for these conditions. For each treatment, the magnitudes of the effects in the various observational studies were combined by the Mantel-Haenszel or weighted analysis-of-variance procedure and then compared with the combined magnitude of the effects in the randomized, controlled trials that evaluated the same treatment. RESULTS: There were 136 reports about 19 diverse treatments, such as calcium-channel-blocker therapy for coronary artery disease, appendectomy, and interventions for subfertility. In most cases, the estimates of the treatment effects from observational studies and randomized, controlled trials were similar. In only 2 of the 19 analyses of treatment effects did the combined magnitude of the effect in observational studies lie outside the 95 percent confidence interval for the combined magnitude in the randomized, controlled trials. CONCLUSIONS: We found little evidence that estimates of treatment effects in observational studies reported after 1984 are either consistently larger than or qualitatively different from those obtained in randomized, controlled trials.

Invited commentary: Rare side effects of obstetric interventions: Are observational studies good enough? P. Buekens. Am J Epidemiology 2001: 153(2); 108-9.

Systematic reviews and lifelong diseases. H. E. Elphick, A. Tan, D. Ashby, R. L. Smyth. BMJ 2002: 325(7360); 381-4. Systematic reviews of randomised controlled trials provide an evidence base for treatment but too often fail to give adequate information on long term outcomes. Elphick and colleagues discuss the limitations of the systematic review of randomised controlled trials for patients with chronic or lifelong diseases and suggest that long term observational studies have a place in the evaluation of the benefits and risks of treatment. [Full text] [PDF]

Statistics in Action. M.H. Gail. Journal of the American Statistical Association 1996: 91(433); 1-13. Abstract not available.

Research Fables from the Sisters Grinn, No. 1. The Hunch-test of Notre Dame. Jeanne Grace, University of Rochester School of Nursing. Accessed on 2003-05-27. "Once upon a time in the land of Evidence, a sickly baby was born. His parents loved him and nursed him back to health and named him Quasi-experiment. As he grew, Quasi-experiment was unable to keep up with the other children. His physical challenges made him unable to compete in games of Manipulate the Independent Variable, and his strength was insufficient for random assignment tasks. While his schoolmates Randomized Clinical Trial and True Experiment received glowing praise for their accomplishments, Quasi-experiment received only disdain. The land of Evidence valued rigorous tests of causality above all else and had no tolerance for other investigative approaches. Saddened and isolated, Quasi-experiment withdrew from the company of others and came to live in the remote towers of the great cathedral of Evidence, Notre Dame." http://www.urmc.rochester.edu/SON/Fables/hunchbck.htm

Problems and approaches in investigating the role of micronutrients in the aetiology of cancer in humans. J. Little. Br Med Bull 1999: 55(3); 600-18. Observational studies have provided leads regarding a number of micronutrients which may account for the apparent protective effects of high intakes of vegetables and fruit against many types of cancer. In general, these leads have not been confirmed by randomised controlled trials. This apparent conflict raises issues about the timing and duration of a critical period or periods during which micronutrient intake may influence the development of cancer, the dose, possible interaction between high doses of micronutrients and exposures conferring a high risk of cancer and gene-micronutrient interactions. When gene-environmental interaction exists, failure to take both of these sets of factors into account leads to bias in the estimation of disease risk. As a result of recent advances, it is now possible to take measures of genetic susceptibility into account. Therefore, in future studies, the opportunity should be taken to obtain DNA samples to determine genotypes for polymorphisms potentially affecting micronutrient metabolism.

Interpreting the evidence: choosing between randomised and non-randomised studies. M McKee, A Britton, N Black, K McPherson, C Sanderson, C Bain. British Medical Journal 1999: 319(7205); 312-15. Abstract not available. [Medline] [Full text] [PDF]

High-dose vitamin C versus placebo in the treatment of patients with advanced cancer who have had no prior chemotherapy. A randomized double-blind comparison. C Moertel. New England Journal of Medicine 1985: 312(3); 137-141. ABSTRACT: It has been claimed that high-dose vitamin C is beneficial in the treatment of patients with advanced cancer, especially patients who have had no prior chemotherapy. In a double-blind study 100 patients with advanced colorectal cancer were randomly assigned to treatment with either high-dose vitamin C (10 g daily) or placebo. Overall, these patients were in very good general condition, with minimal symptoms. None had received any previous treatment with cytotoxic drugs. Vitamin C therapy showed no advantage over placebo therapy with regard to either the interval between the beginning of treatment and disease progression or patient survival. Among patients with measurable disease, none had objective improvement. On the basis of this and our previous randomized study, it can be concluded that high-dose vitamin C therapy is not effective against advanced malignant disease regardless of whether the patient has had any prior chemotherapy.

The arrogance of preventive medicine. D. L. Sackett. CMAJ 2002: 167(4); 363-4.

Humility in observational studies. J. D. Shelton. Science 2002: 297(5590); 2208. Abstract not available yet.

Fat chance: diet and ischemic stroke [editorial; comment]. R. Sherwin, T. R. Price. JAMA 1997: 278(24); 2185-6. Abstract not available.

Smoking as "independent" risk factor for suicide: illustration of an artifact from observational epidemiology? G. D. Smith, A. N. Phillips, J. D. Neaton. Lancet 1992: 340(8821); 709-12. Two widely used criteria for determining whether an association between a risk factor and a disease is causal are dose response and independence from other factors. Data from a large US risk factor study (MRFIT) throw up a relation between cigarette smoking and suicide that meets these criteria, yet appears to be biologically implausible. It is likely that many more such associations, for other exposures and other diseases, are equally spurious, but are protected by their lack of obvious implausibility.

Epidemiology faces its limits. G. Taubes. Science 1995: 269(5221); 164-9. Abstract not available.

Randomization

Coronary artery surgery study (CASS): a randomized trial of coronary artery bypass surgery. Comparability of entry characteristics and survival in randomized patients and nonrandomized patients meeting randomization criteria. CASS Principal Investigators and Their Associates. Journal of the American College of Cardiology 1984: 3(1); 114-28. The Coronary Artery Surgery Study (CASS) includes a randomized trial of coronary artery bypass surgery and medical therapy in the management of patients with mild or moderate stable angina pectoris or free of angina but with a documented history of myocardial infarction. While 780 patients at 11 participating institutions entered the randomized trial, 1,315 patients at the same institutions met randomization criteria but declined participation in the randomized study; they constitute the "randomizable" patients. Half the randomized patients were assigned to surgery and half to the medical group. Of the 1,315 randomizable patients, 43% started with surgical therapy and 57% constitute the medical group. Follow-up periods average 64 months (range 46 to 92). The only entry characteristic in which the randomized and randomizable medical groups differ importantly is the extent of coronary artery disease, which is less extensive in the latter. The two surgical groups also differ in this respect, but with more extensive disease in the randomizable group. At 5 year follow-up, 24% of the medically-assigned randomized patients and 22% of the medically-started randomizable patients have had coronary bypass surgery. Survival in the medically-randomized and randomizable patient groups is similar in the aggregate (both 92% at 5 years) and also in all subgroups based on clinical classification, the number of diseased vessels, the presence of proximal left anterior descending coronary artery disease and ejection fraction. 
Survival for the surgically-assigned randomized patients and the surgically-started randomizable patients is also similar in the aggregate (95 and 94%, respectively) and in all subgroups. It is concluded that the randomized patients in CASS are not a special or atypical subset of those eligible for randomization. The data from the randomizable patients thus support and extend the inference of the generally very good survival of both the medically- and surgically-assigned patients of the randomized trial. [Medline]

Unconventional therapies and cancer. M. Begin, E. Kaegi. CMAJ 1999: 161(6); 686-7. Abstract not available yet.

Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments. I. Chalmers. Int J Epidemiol 2001: 30(5); 1156-64. Histories of clinical trials have recorded and analysed the development of quantification in therapeutic evaluation, the emergence of probabilistic thinking, the application of statistical methods and theory, and the sociology, ethics and politics of clinical trials; but it is surprising that they only rarely identify as a distinct theme the development of efforts to control biases. An exception is Kaptchuk's recent account of the history of blinding and placebos for reducing observer biases. In this complementary paper I introduce and discuss some milestones between 1662 and 1948 in the development of methods to control selection biases when assembling therapeutic comparison groups, to ensure, as far as possible, that 'like is compared with like'. In the paper I note (i) that treatment allocation based on strict alternation abolishes selection bias as effectively as treatment allocation based on strict random allocation; (ii) that use of schedules based on random numbers is more likely to prevent foreknowledge of allocation schedules, and thus the risk of introducing selection bias at the point of recruitment to trials; (iii) that a concern to conceal allocation schedules was the rationale for using schedules based on random numbers in the Medical Research Council trials of vaccination for whooping cough and streptomycin for pulmonary tuberculosis; and (iv) that the introduction of allocation concealment more than half a century ago remains the most recent substantive milestone in the history of efforts to control selection biases in therapeutic experiments.

Randomized Controlled Trials: Evidence Biased Psychiatry. David Healy, Alliance for Human Research Protection. Accessed on 2002-. "A new drug gets introduced to the market. It has been approved after stringent scrutiny by the FDA, which requires ever more convincing evidence that it works and that its safe. The new treatment will always cost more than the old treatments, but even on the cost front, many would argue that we have entered an era where placebo controlled clinical trials demonstrate that new in contrast to older treatments actually do work, and if we just stick to treatments that really work costs should fall. Besides it always seems to happen these days that when new and costly antidepressants or antipsychotics are put through an economic model based on the figures from clinical trials and a range of assumptions provided by experts, the model demonstrates that these new drugs costing thousand of dollars a year are in fact cheaper than treatments costing $100 per year or less. So where could the problems lie? Why do we seem to be so slow in reaching the new medical utopia towards which companies and others assure us we are heading?" www.researchprotection.org/COI/healy0802.html

Proof versus plausibility: rules of engagement for the struggle to evaluate alternative cancer therapies. L. J. Hoffer. CMAJ 2001: 164(3); 351-3.

Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. P. Ioannidis, A. B. Haidich, M. Pappa, N. Pantazis, S. I. Kokori, M. G. Tektonidou, D. G. Contopoulos-Ioannidis, J. Lau. JAMA 2001: 286(7); 821-30. CONTEXT: There is substantial debate about whether the results of nonrandomized studies are consistent with the results of randomized controlled trials on the same topic. OBJECTIVES: To compare results of randomized and nonrandomized studies that evaluated medical interventions and to examine characteristics that may explain discrepancies between randomized and nonrandomized studies. DATA SOURCES: MEDLINE (1966-March 2000), the Cochrane Library (Issue 3, 2000), and major journals were searched. STUDY SELECTION: Forty-five diverse topics were identified for which both randomized trials (n = 240) and nonrandomized studies (n = 168) had been performed and had been considered in meta-analyses of binary outcomes. DATA EXTRACTION: Data on events per patient in each study arm and design and characteristics of each study considered in each meta-analysis were extracted and synthesized separately for randomized and nonrandomized studies. DATA SYNTHESIS: Very good correlation was observed between the summary odds ratios of randomized and nonrandomized studies (r = 0.75; P<.001); however, nonrandomized studies tended to show larger treatment effects (28 vs 11; P =.009). Between-study heterogeneity was frequent among randomized trials alone (23%) and very frequent among nonrandomized studies alone (41%). The summary results of the 2 types of designs differed beyond chance in 7 cases (16%). Discrepancies beyond chance were less common when only prospective studies were considered (8%). Occasional differences in sample size and timing of publication were also noted between discrepant randomized and nonrandomized studies. 
In 28 cases (62%), the natural logarithm of the odds ratio differed by at least 50%, and in 15 cases (33%), the odds ratio varied at least 2-fold between nonrandomized studies and randomized trials. CONCLUSIONS: Despite good correlation between randomized trials and nonrandomized studies-in particular, prospective studies-discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common.

Discussion: Why Clinical Trials in the Evaluation of Life Style Evaluation? Genell L. Knatterud, PhD. Controlled Clinical Trials 1997: 18(6); 514-516. Abstract not available.

The Psychic Staring Effect: An Artifact of Pseudo Randomization. David F. Marks, John Colwell. Skeptical Inquirer 2000: 24(5); 41-44 and 49.

Evaluating complementary medicine: methodological challenges of randomised controlled trials. S. Mason, P. Tovey, A. F. Long. BMJ 2002: 325(7368); 832-4.

Issues to Consider When Designing RCTs for CAM Therapies. House of Lords United Kingdom Parliament. Accessed on 2002-12-23. www.parliament.the-stationery-office.co.uk/pa/ld199900/ldselect/ldsctech/123/12315.htm#a68

Difficulties of Randomised Controlled Trials. House of Lords United Kingdom Parliament. Accessed on 2002-12-31. "Concerns over RCTs distorting a therapy or disguising its efficacy are not the unique concerns of CAM practitioners. Vincent & Furnham suggest that as attempts to apply the RCT to a wider and wider range of treatments have occurred, more and more problems have been uncovered. They list 10 such problems." www.parliament.the-stationery-office.co.uk/pa/ld199900/ldselect/ldsctech/123/12323.htm

Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. S. J. Pocock, R. Simon. Biometrics 1975: 31(1); 103-15. In controlled clinical trials there are usually several prognostic factors known or thought to influence the patient's ability to respond to treatment. Therefore, the method of sequential treatment assignment needs to be designed so that treatment balance is simultaneously achieved across all such patient factors. Traditional methods of restricted randomization such as "permuted blocks within strata" prove inadequate once the number of strata, or combinations of factor levels, approaches the sample size. A new general procedure for treatment assignment is described which concentrates on minimizing imbalance in the distributions of treatment numbers within the levels of each individual prognostic factor. The improved treatment balance obtained by this approach is explored using simulation for a simple model of a clinical trial. Further discussion centers on the selection, predictability and practicability of such a procedure.
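The idea of minimizing imbalance within the levels of each prognostic factor can be sketched as follows. This is a simplified illustration, not the authors' procedure: it measures imbalance as the range of arm counts, sums it over factors with equal weights, and always picks the best arm (ties broken at random), whereas the paper describes a more general family of rules. The function name minimization_assign, the arm labels, and the example factors are all hypothetical.

```python
import random

def minimization_assign(new_patient, enrolled, treatments=("A", "B"), rng=None):
    """Choose the arm that gives the smallest total marginal imbalance.

    new_patient: dict mapping factor name -> level for the incoming patient.
    enrolled: list of (factors_dict, arm) pairs for patients already assigned.
    Returns (chosen_arm, scores), where scores maps each arm to its imbalance.
    """
    rng = rng or random.Random()
    scores = {}
    for candidate in treatments:
        total = 0
        for factor, level in new_patient.items():
            # Count earlier patients in each arm who share this factor level.
            counts = {t: 0 for t in treatments}
            for factors, arm in enrolled:
                if factors.get(factor) == level:
                    counts[arm] += 1
            counts[candidate] += 1  # hypothetically place the new patient here
            total += max(counts.values()) - min(counts.values())
        scores[candidate] = total
    best = min(scores.values())
    tied = [t for t in treatments if scores[t] == best]
    return rng.choice(tied), scores

# Hypothetical example: three patients enrolled, a fourth arrives.
enrolled = [
    ({"sex": "M", "age": "old"}, "A"),
    ({"sex": "M", "age": "young"}, "A"),
    ({"sex": "F", "age": "old"}, "B"),
]
arm, scores = minimization_assign({"sex": "M", "age": "old"}, enrolled)
print(arm, scores)
```

In the example, arm A already holds both earlier male patients, so assigning the new male patient to B gives the smaller imbalance score.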

Randomised block design is more powerful than minimisation [letter]. N. Ross. British Medical Journal 1999: 318(7178); 263-4. Abstract not available.

Patients' preferences and randomised trials. W. A. Silverman, D. G. Altman. Lancet 1996: 347(8995); 171-4.

Casting and Drawing Lots. W.A. Silverman, I. Chalmers. In: Controlled Trials from History. Edited by I. Chalmers, I. Milne, and U. Trohler. 2001; Vol.

Patient Heterogeneity in Clinical Trials. Richard Simon. Cancer Treatment Reports 1980: 64(2-3); 405-410. (Valuable comments on stratification and generalizability.) ABSTRACT: Interpretation of therapeutic results is complicated by variability in response among patients. This paper reviews fundamental statistical principles for the design of clinical trials. These methods seek to evaluate relative therapeutic efficacy in the presence of patient heterogeneity. Statistical science has more to offer therapeutics than significance tests among "comparable" treatment groups. The role of randomization and stratification is reviewed. The importance of study design, including patient eligibility and therapeutic standardization, to the generalization of conclusions is discussed.

Minimization: A new method of assigning patients to treatment and control groups. Donald R. Taves. Clinical Pharmacology and Therapeutics 1974: 15(5); 443-453. Abstract not available.

Use of unequal randomisation to aid the economic efficiency of clinical trials. David J Torgerson, Marion K Campbell. BMJ 2000: 321; 759. Abstract not available yet. [Full text] [PDF]

Minimisation: the platinum standard for trials? Randomisation doesn't guarantee similarity of groups; minimisation does [editorial] [see comments]. T Treasure, KD MacRae. BMJ 1998: 317(7155); 362-63. Abstract not available.

Minimisation is much better than the randomised block design in certain cases. Tom Treasure, KD MacRae. British Medical Journal 1999: 318(7195); 1420. Abstract not available.

Investigating Therapies of Potentially Great Benefit: ECMO. J.H. Ware. Statistical Science 1989: 4(4); 298-317.

Mammography and the politics of randomised controlled trials. J. Wells. BMJ 1998: 317(7167); 1224-9. [Full text] [PDF]

Randomised controlled trial of laparoscopic versus open mesh repair for inguinal hernia: outcome and cost. J. Wellwood, M. J. Sculpher, D. Stoker, G. J. Nicholls, C. Geddes, A. Whitehead, R. Singh, D. Spiegelhalter. BMJ 1998: 317(7151); 103-10. OBJECTIVE: To compare tension-free open mesh hernioplasty under local anaesthetic with transabdominal preperitoneal laparoscopic hernia repair under general anaesthetic. DESIGN: A randomised controlled trial of 403 patients with inguinal hernias. SETTING: Two acute general hospitals in London between May 1995 and December 1996. SUBJECTS: 400 patients with a diagnosis of groin hernia, 200 in each group. Main outcome measures: Time until discharge, postoperative pain, and complications; patients' perceived health (SF-36), duration of convalescence, and patients' satisfaction with surgery; and health service costs. RESULTS: More patients in the open group (96%) than in the laparoscopic group (89%) were discharged on the same day as the operation (chi2 = 6.7; 1 df; P=0.01). Although pain scores were lower in the open group while the effect of the local anaesthetic persisted (proportional odds ratio at 2 hours 3.5 (2.3 to 5.1)), scores after open repair were significantly higher for each day of the first week (0.5 (0.3 to 0.7) on day 7) and during the second week (0.7 (0.5 to 0.9)). At 1 month there was a greater improvement (or less deterioration) in mean SF-36 scores over baseline in the laparoscopic group compared with the open group on seven of eight dimensions, reaching significance on five. For every activity considered the median time until return to normal was significantly shorter for the laparoscopic group. Patients randomised to laparoscopic repair were more satisfied with surgery at 1 month and 3 months after surgery. The mean cost per patient of laparoscopic repair was 335 pounds (95% confidence interval 228 pounds to 441 pounds) more than the cost of open repair. 
CONCLUSION: This study confirms that laparoscopic hernia repair has considerable short term clinical advantages after discharge compared with open mesh hernioplasty, although it was more expensive.

The protective effect of auto-immune buccal urine therapy (AIBUT) against the Raynaud phenomenon. C. W. Wilson. Med Hypotheses 1984: 13(1); 99-107. The efficacy of Auto-Immune Buccal Urine Therapy (AIBUT) against allergic symptoms depends upon sublingual administration of the correct dose of urine as determined by bio-assay in individual patients. Succeeding effective turn-off doses occur at the troughs of a sinusoidal dose-response curve. Efficacy of the administered dose is confirmed by reduction in the severity and duration of Cold-water-induced Raynaud symptoms after administration of effective doses of unboiled urine in AIBUT. Boiled urine does not affect the Raynaud phenomenon.

Randomised controlled trials in primary care: case study. Sue Wilson. British Medical Journal 2000: 321; 24-27. Abstract not available yet. [Medline] [Full text] [PDF]

A new design for randomized clinical trials. M. Zelen. N Engl J Med 1979: 300(22); 1242-5. This paper proposes a new method for planning randomized clinical trials. This method is especially suited to comparison of a best standard or control treatment with an experimental treatment. Patients are allocated into two groups by a random or chance mechanism. Patients in the first group receive standard treatment; those in the second group are asked if they will accept the experimental therapy; if they decline, they receive the best standard treatment. In the analyses of results, all those in the second group, regardless of treatment, are compared with those in the first group. Any loss of statistical efficiency can be overcome by increased numbers. This experimental plan is indeed a randomized clinical trial and has the advantage that, before providing consent, a patient will know whether an experimental treatment is to be used.

The randomization and stratification of patients to clinical trials. M. Zelen. Journal of Chronic Diseases 1974: 27(7-8); 365-75. Abstract not available yet.

Single case design

Single-case Research Designs for the Science and Practice of Neurotherapy. Neville Blampied, Arreed Barabasz, Marianne Barabasz. Journal of Neurotherapy 1996: 1(4); The dominant research tradition in psychology and psychiatry requires that numbers of subjects be randomly allocated to form treatment groups. Treatment effects typically are assessed by testing hypotheses about group mean differences. This paradigm seriously inhibits the implementation of the scientist-practitioner model embraced by practitioners of neurotherapy, stifles innovation and precludes the scientific investigation of the exceptional or novel case. Single-case research designs make it possible to draw scientifically valid conclusions from the investigation and treatment of individual cases. The key elements of these designs are outlined and particular designs of potential utility to neurotherapy are discussed.

Older references (check for inclusion among newer references)

A controlled trial of immunotherapy for asthma in allergic children. N. F. Adkinson, Jr., P. A. Eggleston, D. Eney, E. O. Goldstein, K. C. Schuberth, J. R. Bacon, R. G. Hamilton, M. E. Weiss, H. Arshad, C. L. Meinert, J. Tonascia, B. Wheeler. New England Journal of Medicine 1997: 336(5); 324-31. [Abstract] [Full text] [PDF]

Controlled trial of acupuncture for severe recidivist alcoholism. M. L. Bullock, P. D. Culliton, R. T. Olander. Lancet 1989: 1(8652); 1435-9.

The orthomolecular treatment of cancer. II. Clinical trial of high-dose ascorbic acid supplements in advanced human cancer. E. Cameron, A. Campbell. Chem Biol Interact 1974: 9(4); 285-315.

A case-control study of HIV seroconversion in health care workers after percutaneous exposure. Centers for Disease Control and Prevention Needlestick Surveillance Group. D. M. Cardo, D. H. Culver, C. A. Ciesielski, P. U. Srivastava, R. Marcus, D. Abiteboul, J. Heptonstall, G. Ippolito, F. Lot, P. S. McKibben, D. M. Bell. N Engl J Med 1997: 337(21); 1485-90. [Abstract] [Full text] [PDF]

Statistics in Action. M. H. Gail. Journal of the American Statistical Association 1996: 91(433); 1-13.

Dietary Fat Intake and the Risk of Coronary Heart Disease in Women. Frank B. Hu, Meir J. Stampfer, JoAnn E. Manson, Eric Rimm, Graham A. Colditz, Bernard A. Rosner, Charles H. Hennekens, Walter C. Willett. N Engl J Med 1997: 337(21); 1491-1499. [Abstract] [Full text] [PDF]

Removal of radiation dose response effects: an example of over-matching. J. L. Marsh, J. L. Hutton, K. Binks. BMJ 2002: 325(7359); 327-30. [Medline] [Full text] [PDF]

Observational Studies. P. R. Rosenbaum (1995) New York: Springer-Verlag.

Evidence-based medicine and treatment choices. D. L. Sackett. Lancet 1997: 349(9051); 570; discussion 572-3. [Medline]

Fat chance: diet and ischemic stroke [editorial; comment]. R. Sherwin, T. R. Price. JAMA 1997: 278(24); 2185-6.

Where's the Evidence? Debates in Modern Medicine. William A. Silverman (1998) New York: Oxford University Press.

This webpage was written by Steve Simon on 2003-06-20, edited by Steve Simon, and was last modified on 2008-07-08. Send feedback to ssimon at cmh dot edu or click on the email link at the top of the page. Category: Statistical evidence