Courses EPIB-634 and BIOS601/602

Resources for Case-Control Studies (C&H Ch 16,17,18,19)

Updated: Feb 19, 2015

Book chapters, articles, notes, etc.

  • In case-control studies, (ratios of) rates are based on ESTIMATED denominators.

    There is extra variance to pay for estimating the denominators, but otherwise the Rate Ratio estimator has the same form as the one where the denominators are known:
       Number of exposed cases           Number of unexposed cases
    -----------------------------   /  ------------------------------
    Proxy for exposed denominator      Proxy for unexposed denominator
    The arithmetic can be shortened to a ratio of two crossproducts, with 'a' and 'b' denoting the numbers of exposed and unexposed cases, 'c' the proxy for the exposed denominator, and 'd' the proxy for the unexposed denominator. Both 'c' and 'd' are obtained from a sample of the base (the denominator sample, i.e., the 'control' series) that generated the exposed and unexposed cases:
    (Number of exposed cases) x (Proxy for unexposed denominator)    a d
    ------------------------------------------------------------- =  ----
    (Number of unexposed cases) x (Proxy for exposed denominator)    b c
    BUT that doesn't mean it has to be called an 'odds' ratio. Call it what it really is: an estimate of the Rate Ratio.

    Examples of denominator ('control') series
    -- Start with the 'to beat hockey goalie Patrick Roy, shoot low!' example, then progress to the 'Rubella in pregnancy and congenital malformations' and 'Diethylstilbestrol (DES) in pregnancy' examples. See also the 2 John Snow examples: South London, and the Soho (Broad Street) district.
    In some instances (e.g., the rubella example) there is no explicit denominator series. Rather, the authors invited readers to imagine their own, from general knowledge. Thus, knowledge about menstruation raised physicians' suspicions when they were confronted with cases of toxic shock, and led them to report the cases to the Centers for Disease Control (CDC). From the CDC's Morbidity and Mortality Weekly Report, Vol. 29, No. 20, May 23, 1980:
    "Of 40 patients in whom a menstrual history was obtained, 38 (95%) had onset of illness within the 5-day period following onset of menses. Two others had onset of illness 10 days after onset of menses. Moreover, 13 patients have had recurrence of symptoms with a subsequent menstrual period."
    In the Broad Street pump study, Snow had a 'case series' but did not have a (formal) denominator series, and indeed his critics faulted him for that. The Reverend Henry Whitehead, who was to become his colleague, and who found the 'index case' (the water from her cholera-soiled diapers found its way into the pump water), did use a 'denominator series'.
    In the South London study, Snow had a 'case series' of 300 who died of cholera in the first 4 weeks of the epidemic: D_1 = 286 had been customers of the Southwark and Vauxhall Company, which took its water from the downtown stretch of the Thames, and D_0 = 14 had been customers of the Lambeth Company, which took its water from the Thames upriver. By good fortune, he was able to use Government-obtained denominators of DENOM_1 = 40,046 and DENOM_0 = 26,107 homes. So his shoe-leather epidemiology was limited to classifying the case series of D = 300 into the 2 numerators, 286 and 14. Had the denominators not been available, he would have been forced to take a sample (denominator series) of the 66,153 homes and classify them so as to provide an estimate of the DENOM_1:DENOM_0 ratio. Even for a series of, say, 600 randomly sampled homes, this would have been quite a lot of work, as it was not easy to determine the water provider.
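    The South London arithmetic can be sketched in a few lines of Python. This is only an illustration, not JH's code: it takes the denominators as 40,046 homes for Southwark and Vauxhall and 26,107 for Lambeth (the assignment in Snow's published table), and the function names, the sample size of 600 and the random seed are assumptions made for the example. It first computes the rate ratio with the known denominators, then mimics what Snow would have faced without them, by classifying a random 'denominator series' of homes and using the two counts as proxies c and d.

```python
import random


def rate_ratio(cases_exposed, cases_unexposed, denom_exposed, denom_unexposed):
    """Cross-product form (a * d) / (b * c) of the ratio of two rates."""
    return (cases_exposed * denom_unexposed) / (cases_unexposed * denom_exposed)


def sample_denominator_series(n_homes, denom_exposed, denom_unexposed, seed):
    """Classify a simple random sample of homes (hypothetical helper).

    Homes are numbered 0..total-1, with the first `denom_exposed` of them
    taken to be the exposed ones. Returns the proxies (c, d).
    """
    rng = random.Random(seed)
    total = denom_exposed + denom_unexposed
    sampled = rng.sample(range(total), n_homes)  # sampling without replacement
    c = sum(1 for home in sampled if home < denom_exposed)
    return c, n_homes - c


# Numerators from the case series of 300 deaths.
D1, D0 = 286, 14
# Denominators (homes): Southwark and Vauxhall, then Lambeth.
DENOM1, DENOM0 = 40046, 26107

# Rate ratio with KNOWN denominators.
rr_known = rate_ratio(D1, D0, DENOM1, DENOM0)

# Rate ratio with ESTIMATED denominators, from 600 sampled homes.
c, d = sample_denominator_series(600, DENOM1, DENOM0, seed=1854)
rr_estimated = rate_ratio(D1, D0, c, d)

print(f"known denominators:     {rr_known:.2f}")
print(f"estimated denominators: {rr_estimated:.2f}  (c = {c}, d = {d})")
```

    The second figure varies from sample to sample around the first; that variation is the extra variance paid for estimating the denominators.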
  • A difficult-to-estimate denominator
    Marriages between first cousins in England and their effects.
    George H Darwin. J Royal Statistical Society, Vol 38, June 1875.   link

    (A small follow-up study, reported in the JRSS later that year, is appended)
    A slightly different version of the main article was published in the Fortnightly Review the same year. It was reprinted in 2009 in the International J of Epidemiology, along with several commentaries.
  • Friedman's PRIMER of EPIDEMIOLOGY link

    Chapter 7, example 2, page 98, has a particularly engaging example of 'shoe-leather' epidemiology and an unusual 1959 'case-control' study "Pedestrians Fatally Injured by Motor Vehicles" in New York City.

  • Woolf B. On Estimating the Relation Between Blood Group and Disease. Annals of Human Genetics;19:251-253, 1955. link

    This is an early, and very enlightened, example of how to properly think about the real purpose of 'controls' in the so-called 'case-control' study. The true purpose is to derive estimates of the relative sizes of DENOMINATORS for the numbers of cases in the index and reference categories of 'exposure' -- just as our sampling of shots on goal does when we do not have time to classify ALL of them. Indeed, Woolf's words (bracketed comments added):
    "This ... is avoided if one works with [COMPARES!] incidence rates in the various blood [EXPOSURE] groups. The data usually do not permit calculation of absolute rates, nor are they needed [!!!]. What is wanted and readily obtained is an estimate of the ratio of one rate to another [THAT WAS 1955!! 60 YEARS LATER, PEOPLE PERSIST IN COMPARING CASES WITH CONTROLS, something Miettinen says 'is not done in better families']. The incidence in group A will be (h/H) x some constant, and that in group B will be (k/K) x the same constant. If the ratio is taken as [theta.hat] to 1, an estimate of [theta] will be hK/Hk, and it may readily be shown that this is the maximum-likelihood estimate."
    These words anticipate by 2 decades the approach promoted by Miettinen in his seminal 1976 paper that tried to get rid of the 'rare-disease' assumption. Woolf says we should always compare rates in the exposed and the unexposed. Miettinen, in the 2011 interview ( Video   Audio ) with JH, is frustrated by the archaic, but still common, approach of COMPARING (exposure histories) in CASES vs. CONTROLS, instead of COMPARING event RATES in the EXPOSED person-time with those in the UNEXPOSED person-time.
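    Woolf's estimate hK/Hk, with the usual large-sample interval built on the log scale (variance of the log estimate taken as 1/h + 1/k + 1/H + 1/K), might be sketched as follows. The four counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not Woolf's data, and the function name is an assumption.

```python
from math import exp, log, sqrt


def woolf_rate_ratio(h, k, H, K, z=1.96):
    """Estimate hK/Hk with a log-scale (Woolf-type) confidence interval.

    h, k: cases in groups A and B (numerator series).
    H, K: sampled members of groups A and B (denominator series).
    """
    theta_hat = (h * K) / (H * k)
    se_log = sqrt(1 / h + 1 / k + 1 / H + 1 / K)
    lower = exp(log(theta_hat) - z * se_log)
    upper = exp(log(theta_hat) + z * se_log)
    return theta_hat, se_log, lower, upper


# HYPOTHETICAL counts, for illustration only.
theta_hat, se_log, lower, upper = woolf_rate_ratio(h=90, k=60, H=400, K=400)
print(f"theta.hat = {theta_hat:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

    Note that the denominator series enters only through the ratio K/H, which is why absolute rates are not needed.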

  • Chapters   16   17   18   from Clayton and Hills.
  • Protective efficacy of BCG against leprosy in Northern Malawi Link
  • Mantel N, Haenszel W. Statistical Aspects of the Analysis of Data From Retrospective Studies of Disease. J. Nat. Cancer Inst. 22: 719-748, 1959.
    part2   part3
  • Origins and early development of the case-control study: part 1, Early evolution.   Link
  • Latter half of 20th century:   Link
  • Woolf55, Mantel73, Miettinen76 and Miettinen2011Interview(audio). link
    This link (to a course for PhD students in epidemiology) also has other material on epidemiologic concepts, case-control studies, and other methodological issues. The 3 Wacholder papers are valued by epidemiology students preparing for their comprehensive exam.

  • Liddell, McDonald, Thomas. JRSSA, 1977. link
  • Induced Abortions -> Ectopic Pregnancy [data in Table 1, page 350]
  • Induced Abortions -> Secondary Infertility
  • Induced Abortions -> Breast Cancer
      Dutch Case Control Study
      Danish Study Based on Vital and Health Care Databases

  • MI and Vasectomy
      Documentation, Data, R code

  • Redelmeier. Road Trauma in Teenage Male Youth with Childhood Disruptive Behavior Disorders link

Software and Data

  • R code   to count BirthMonths of hockey players   {if you have a more elegant way to match ONE of the provinces in the list, JH would like to use and disseminate it}
    Annual numbers of live births, by month, in Canadian provinces and territories, 1991-2007 .csv

  • R code for
      (a) analyses of data in Table 1 of Woolf(1955) and

      (b) calculation of statistical efficiency of 'case-control' studies with various ratios
            of the size of the denominator-series to the size of the numerator-series.
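    As a rough guide to item (b), and not necessarily the calculation in the linked R code: under the null (rate ratio of 1), with expected counts and the large-sample variance 1/a + 1/b + 1/c + 1/d for the log of the ratio, a denominator series m times the size of the numerator series gives an efficiency of m/(m+1) relative to an infinitely large denominator series. The Python sketch below checks this numerically; the function names, the 100 cases and the 30% exposure prevalence are assumptions for illustration.

```python
def var_log_ratio(n_cases, m, p_exposed):
    """Large-sample variance of log(ad/bc) using expected counts under the null.

    n_cases: size of the numerator (case) series.
    m: ratio of denominator-series size to numerator-series size.
    p_exposed: proportion exposed, the same in both series under the null.
    """
    a = n_cases * p_exposed
    b = n_cases * (1 - p_exposed)
    c = m * a
    d = m * b
    return 1 / a + 1 / b + 1 / c + 1 / d


def relative_efficiency(m):
    """Efficiency relative to an infinitely large denominator series."""
    return m / (m + 1)


n_cases, p_exposed = 100, 0.3  # illustrative values
# Variance when the denominators are known (infinite denominator series).
var_infinite = 1 / (n_cases * p_exposed) + 1 / (n_cases * (1 - p_exposed))

for m in (1, 2, 3, 4, 5, 10):
    ratio = var_infinite / var_log_ratio(n_cases, m, p_exposed)
    print(f"m = {m:2d}: efficiency = {ratio:.3f}  (m/(m+1) = {relative_efficiency(m):.3f})")
```

    The gains flatten quickly: going from m = 1 to m = 4 raises the efficiency from 0.5 to 0.8, while each further increase in m adds less and less.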

  • R code for
      analyses of bladder cancer-smoking data in Table 1 of Miettinen(1976)
      Link to Mantel & Haenszel 1959 and other classics

  • R code (and numerators) for
      study of rates of reaching NHL
      Data: Numbers of Births by Month, Canada (.csv)
      Data on current Canadian   MPs (.csv)     Senators (.csv)

  • Article   Are There Two Logistic Regressions for Retrospective Studies?
                N. Breslow and W. Powers, Biometrics, Vol. 34, No. 1 (Mar., 1978), pp. 100-105
    Dataset Oxford Childhood Cancer Survey
                  (plus other datasets used in, and pdfs of, Breslow & Day Vols I & II.).

  • R Code   for conditional MLE of odds ratio, based on non-central hypergeometric model [Fisher's data on twins and criminality]
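    For readers without R, the idea behind the conditional MLE can be sketched in Python. With all margins of the 2x2 table fixed, the top-left cell follows a non-central hypergeometric distribution indexed by the odds ratio psi, and the conditional MLE is the psi at which the expected cell count equals the observed one. The sketch below finds it by bisection on log(psi). It is not the linked R code; the function names and search limits are assumptions, and the counts are the commonly quoted version of Fisher's twins table (monozygotic: 10 convicted, 3 not; dizygotic: 2 convicted, 15 not), which should be checked against the linked data.

```python
from math import comb, exp


def conditional_mean(psi, n1, n2, m1):
    """Mean of the non-central hypergeometric distribution.

    n1, n2: row totals; m1: first column total; psi: odds ratio.
    """
    lo, hi = max(0, m1 - n2), min(n1, m1)
    support = range(lo, hi + 1)
    weights = [comb(n1, x) * comb(n2, m1 - x) * psi ** x for x in support]
    return sum(x * w for x, w in zip(support, weights)) / sum(weights)


def conditional_mle(x_obs, n1, n2, m1, log_lo=-20.0, log_hi=20.0):
    """Solve E[X | psi] = x_obs for psi by bisection on log(psi).

    Assumes x_obs lies strictly inside the support; at either boundary
    the conditional MLE is 0 or infinite and this search does not apply.
    """
    for _ in range(200):
        mid = (log_lo + log_hi) / 2
        # The conditional mean increases with psi.
        if conditional_mean(exp(mid), n1, n2, m1) < x_obs:
            log_lo = mid
        else:
            log_hi = mid
    return exp((log_lo + log_hi) / 2)


# Commonly quoted counts for Fisher's twins data (to be verified).
x_obs, n1, n2, m1 = 10, 13, 17, 12  # 10 of 13 monozygotic; 12 convicted in all

psi_hat = conditional_mle(x_obs, n1, n2, m1)
print(f"conditional MLE of the odds ratio:  {psi_hat:.2f}")
print(f"unconditional cross-product ad/bc:  {(10 * 15) / (3 * 2):.2f}")
```

    The conditional estimate is pulled toward 1 relative to the simple cross-product ratio, a difference that matters mainly in small tables such as this one.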