Resources for Case-Control Studies (C&H Ch 16, 17, 18, 19)
(BIOS601/602)
Updated: Feb 19, 2015
Book chapters, articles, notes, etc.
 In Case-control Studies,
(ratios of) rates are based on ESTIMATED denominators.
There is extra variance to pay for estimating the denominators, but otherwise the Rate Ratio estimator has the
same form as the one where the denominators are known:
$$\frac{\text{Number of exposed cases}}{\text{Proxy for exposed denominator}} \Big/ \frac{\text{Number of unexposed cases}}{\text{Proxy for unexposed denominator}}$$
The arithmetic can be shortened to a ratio of two cross-products. Let 'a' and 'b' denote the numbers of exposed and unexposed cases,
'c' the Proxy for exposed denominator, and 'd' the Proxy for unexposed denominator, both proxies obtained from the
sample of the base (the denominator sample, i.e., the 'control' series) that generated the exposed and unexposed cases:
$$\frac{(\text{Number of exposed cases}) \times (\text{Proxy for unexposed denominator})}{(\text{Number of unexposed cases}) \times (\text{Proxy for exposed denominator})} = \frac{a\,d}{b\,c}$$
BUT that doesn't mean it has to be called an 'odds' ratio.
Call it what it really is: an estimate of the Rate Ratio.
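The cross-product arithmetic can be sketched in a few lines of Python. The function names and the counts in the usage lines are hypothetical, chosen only for illustration. The log-scale variance formulas are the usual large-sample approximations, offered here as an assumption rather than as something derived in these notes: they show where the 'extra variance' for estimated denominators enters.

```python
from math import exp, log, sqrt

def rate_ratio(a, b, c, d):
    """Cross-product estimate of the rate ratio.

    a, b: numbers of exposed and unexposed cases.
    c, d: exposed and unexposed counts in the denominator ('control') series.
    """
    return (a * d) / (b * c)

def var_log_rate_ratio(a, b, c=None, d=None):
    """Approximate variance of log(rate ratio).

    With known denominators only the case counts contribute (1/a + 1/b).
    With estimated denominators the series adds 1/c + 1/d.
    """
    var = 1 / a + 1 / b
    if c is not None and d is not None:
        var += 1 / c + 1 / d
    return var

# Hypothetical counts, for illustration only.
a, b, c, d = 60, 40, 30, 70
rr = rate_ratio(a, b, c, d)
se = sqrt(var_log_rate_ratio(a, b, c, d))
ci_95 = (exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se))
print(rr, ci_95)
```

Calling var_log_rate_ratio(a, b) without c and d gives the smaller variance that would apply if the denominators were known rather than sampled.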
Examples of denominator ('control') series
 start with the
'to beat hockey goalie Patrick Roy, shoot low!' example.
Then progress to the 'Rubella in pregnancy and congenital malformations' and
'Diethylstilbestrol (DES) in pregnancy' examples. See also the 2 John Snow examples:
South London, and the Soho (Broad Street) district.
In some instances (e.g., the rubella example) there is no explicit denominator series.
Rather, the authors invited readers to imagine their own, from general knowledge.
Thus, knowledge about menstruation raised physicians' suspicions when
confronted with cases of toxic shock:
"Of 40 patients in whom a menstrual history was obtained,
38 (95%) had onset of illness within the 5-day period following onset of menses.
Two others had onset of illness 10 days after onset of menses.
Moreover, 13 patients have had recurrence of symptoms with a subsequent menstrual period."
That knowledge led them to report the cases to the Centers for Disease Control (CDC).
See their Morbidity and Mortality Weekly Report, Vol. 29, No. 20, May 23, 1980.
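One way to make the 'imagined' denominator series concrete is sketched below. The 28-day cycle length is an assumption introduced purely for illustration (it is not in the report), as is the simplification that, absent any relation to menses, onset would be equally likely on any day of the cycle. The function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

window_days = 5    # from the report: the 5-day period following onset of menses
cycle_days = 28    # ASSUMPTION: a typical cycle length, not stated in the report
p_imagined = window_days / cycle_days   # imagined share of 'person-days' in the window

n_cases = 40       # patients with a menstrual history
observed = 38      # onsets inside the 5-day window

tail = binomial_upper_tail(observed, n_cases, p_imagined)
print(f"imagined proportion in window: {p_imagined:.3f}")
print(f"P(at least {observed} of {n_cases} in window): {tail:.3g}")
```

The imagined proportion plays the role that a sampled denominator series would play in a formal study: it says how the cases would be distributed if the rate were the same inside and outside the window.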
In the Broad Street pump study, Snow had a 'case series' but did not have a (formal) denominator series,
and indeed his critics faulted him for that. The Reverend Henry Whitehead, who was to become his colleague,
and who found the 'index case' (the water from her cholera-soiled diapers found its way into the pump water),
did use a 'denominator series'.
In the South London study, Snow had a 'case series' of 300 who died of cholera in the first 4 weeks
of the epidemic: D_1 = 286 had been customers of the Southwark and Vauxhall Company (which took its water
from the Thames downtown), and D_0 = 14 had been customers of the Lambeth Company (which took its water
from the Thames upriver). By good fortune, he was able to use
Government-obtained denominators of DENOM_1 = 26,107 and DENOM_0 = 40,046 homes.
So his shoe-leather epidemiology was limited to classifying the case series of D = 300 into the 2 numerators
286 and 14. Had the denominators not been available, he would have been forced
to take a sample (denominator series) of the 66,153 homes and classify them so as
to provide an estimate of the DENOM_1:DENOM_0 ratio. Even for a series of, say, 600 randomly sampled
homes, this would have been quite a lot of work, as it was not easy to determine the water provider.
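The two routes to the rate ratio can be compared in a short sketch: one uses the full denominators Snow actually had, the other mimics the hypothetical series of 600 sampled homes. The sampling function, its name, and the random seed are illustrative assumptions. The simulated split of the 600 homes is not historical data.

```python
import random

# Numerators and denominators quoted in the text.
D1, D0 = 286, 14                  # deaths: Southwark & Vauxhall, Lambeth
DENOM1, DENOM0 = 26107, 40046     # homes supplied by each company

# Rate ratio with the known denominators.
rr_known = (D1 / DENOM1) / (D0 / DENOM0)

def sample_denominator_series(n, denom1, denom0, rng):
    """Draw n homes at random (without replacement) and classify them by supplier.

    Homes are indexed 0..total-1; indices below denom1 stand for
    Southwark & Vauxhall homes, the rest for Lambeth homes.
    """
    total = denom1 + denom0
    sampled = rng.sample(range(total), n)
    c = sum(1 for i in sampled if i < denom1)
    return c, n - c

rng = random.Random(1854)         # arbitrary seed, for reproducibility only
c, d = sample_denominator_series(600, DENOM1, DENOM0, rng)

# Same form of estimator, with the sampled series as proxy denominators.
rr_estimated = (D1 * d) / (D0 * c)
print(rr_known, c, d, rr_estimated)
```

Repeating the draw with different seeds shows how much the estimate moves when the DENOM_1:DENOM_0 ratio has to be estimated from 600 homes instead of being known.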
 A difficult-to-estimate denominator
Marriages between first cousins in England and their effects.
George H Darwin. J Royal Statistical Society, Vol 38, June 1875.
link
(A small follow-up study, reported in the JRSS later that year, is appended)
A slightly different version of the main article was published in the Fortnightly Review the same year. It
was reprinted in 2009 in the
International J of Epidemiology, along with several commentaries.
 Friedman's PRIMER of EPIDEMIOLOGY
link
Chapter 7, example 2, page 98, has a particularly engaging example of 'shoe-leather'
epidemiology and an unusual 1959 'case-control' study, "Pedestrians Fatally Injured by Motor Vehicles", in New York City.
 Woolf B. On Estimating the Relation Between Blood Group and Disease.
Annals of Human Genetics 19:251-253, 1955.
link
This is an early, and very enlightened, example of how to properly think about
the real purpose of 'controls' in the so-called 'case-control' study. The true
purpose is to derive estimates of the relative sizes of DENOMINATORS for the numbers of cases
in the index and reference categories of 'exposure', just as our sampling of shots
on goal does when we do not have time to classify ALL of them. Indeed,
the words
"This ... is avoided if one works with [COMPARES!] incidence rates in the various
blood [EXPOSURE] groups. The data usually do not permit calculation
of absolute rates, nor are they needed [!!!]. What is wanted and readily obtained is an estimate of
the ratio of one rate to another [THAT WAS 1955!! 60 YEARS LATER, PEOPLE PERSIST IN COMPARING CASES WITH CONTROLS,
something Miettinen says 'is not done in better families']. The incidence in
group A will be (h/H) x some constant,
and that in group B will be (k/K) x the same constant. If the ratio is taken as [theta.hat] to 1,
an estimate of [theta] will be hK/Hk, and it may readily be shown that this is
the maximum-likelihood estimate."
anticipate by 2 decades the approach promoted by Miettinen in his
seminal 1976 paper that tried to get rid of the 'rare-disease' assumption. Woolf says we should always
compare rates in the exposed and the unexposed. Miettinen, in the
2011 interview (Video, Audio) with
JH, is frustrated by the archaic, but still common, approach of COMPARING (exposure histories) in
CASES vs. CONTROLS, instead of COMPARING event RATES in the
EXPOSED person-time with those in the UNEXPOSED person-time.
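Woolf's step from the two incidences to hK/Hk can be written out as follows. The symbol \kappa for his 'some constant' is introduced here for illustration. Reading h and k as the case counts in groups A and B, and H and K as the corresponding counts in the denominator series, is an interpretation consistent with the quotation rather than something it states outright.

```latex
\[
\hat{\theta}
  = \frac{(h/H)\,\kappa}{(k/K)\,\kappa}
  = \frac{h/H}{k/K}
  = \frac{hK}{Hk}
\]
```

The unknown constant cancels, which is why absolute rates are not needed. With a = h, b = k, c = H and d = K, this is the same cross-product ratio ad/bc.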
 Chapters 16, 17, 18 from Clayton and Hills.

Protective efficacy of BCG against leprosy in Northern Malawi
Link
 Mantel N, Haenszel W.
Statistical Aspects of the Analysis of Data From Retrospective
Studies of Disease.
J. Nat. Cancer Inst. 22: 719-748, 1959.
part1, part2, part3
 Origins and early development of the case-control study:
part 1, Early evolution.
Link
 Latter half of 20th century:
Link
 Woolf55, Mantel73, Miettinen76 and Miettinen2011Interview(audio).
link
This link (to a course for PhD students in epidemiology) also has other material on epidemiologic
concepts, case-control studies, and other methodological issues. The 3 Wacholder papers are valued by epidemiology students preparing for their comprehensive exam.
 Liddell, McDonald, Thomas. JRSSA, 1977
link

Induced Abortions > Ectopic Pregnancy [data in Table 1, page 350]

Induced Abortions > Secondary Infertility
 Induced Abortions > Breast Cancer
Dutch Case Control Study
Danish Study Based on Vital and Health Care Databases
 MI and Vasectomy
Documentation, Data, R code
 Redelmeier. Road Trauma in Teenage Male Youth with Childhood Disruptive Behavior Disorders
link
Software and Data
 R code
to count BirthMonths of hockey players
{if you have a more elegant way
to match ONE of the provinces in the list, JH would like to use and disseminate it}
Annual numbers of live births, by month, in Canadian provinces and territories, 1991-2007
(.csv)

R code for
(a) analyses of data in Table 1 of Woolf(1955) and
(b) calculation of statistical efficiency of 'case-control' studies with various ratios
of the size of the denominator-series to the size of the numerator-series.
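The idea behind item (b) can be sketched as follows. This is not JH's R code. It is a minimal Python illustration that assumes the common large-sample approximation, valid near a rate ratio of 1, that a denominator series k times the size of the numerator series retains a fraction k/(k+1) of the information available with an infinitely large series. The function name is hypothetical.

```python
def relative_efficiency(k):
    """Approximate efficiency, relative to an infinitely large denominator series,
    when the denominator series is k times the size of the numerator series.

    ASSUMPTION: rate ratio near 1 and large samples, so that the variance of the
    log rate ratio is proportional to (1 + 1/k).
    """
    return k / (k + 1)

for k in (1, 2, 3, 4, 5, 10):
    print(f"ratio {k}:1 -> relative efficiency {relative_efficiency(k):.3f}")
```

Under this approximation the gain from each additional unit of k shrinks quickly, which is the usual argument for not going much beyond a handful of denominator-series members per case.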

R code for
analyses of bladder cancer-smoking data in Table 1 of
Miettinen(1976)
Link to
Mantel & Haenszel 1959 and other classics

R code (and numerators) for
study of rates of reaching NHL
Data: Numbers of Births by Month, Canada (.csv)
Data on current Canadian MPs (.csv)
Senators (.csv)

Article
Are There Two Logistic Regressions for Retrospective Studies?
N. Breslow and W. Powers, Biometrics, Vol. 34, No. 1 (Mar., 1978), pp. 100-105
Dataset Oxford Childhood Cancer Survey
(plus other datasets used in, and pdfs of, Breslow & Day Vols I & II.).
 R Code
for conditional
MLE of odds ratio, based on noncentral hypergeometric model [Fisher's data on twins and criminality]
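The conditional approach can be sketched in Python (this is not the linked R code). With all margins of the 2x2 table fixed, the count in one cell follows a noncentral hypergeometric distribution indexed by the odds ratio psi, and the conditional MLE is the psi at which the expected count equals the observed count. The function names are hypothetical. The counts are Fisher's twin data as commonly tabulated (dizygotic: 2 convicted, 15 not; monozygotic: 10 convicted, 3 not), entered here from memory and worth checking against the linked dataset.

```python
from math import comb, exp

def conditional_mean(psi, n1, n2, m):
    """E[X] under the noncentral hypergeometric model.

    X = number of 'events' in row 1; n1, n2 = row totals; m = total events.
    P(X = x) is proportional to C(n1, x) * C(n2, m - x) * psi**x.
    """
    support = range(max(0, m - n2), min(n1, m) + 1)
    weights = [comb(n1, x) * comb(n2, m - x) * psi**x for x in support]
    return sum(x * w for x, w in zip(support, weights)) / sum(weights)

def conditional_mle(x_obs, n1, n2, m, tol=1e-10):
    """Solve conditional_mean(psi) = x_obs by bisection on log(psi).

    Assumes x_obs is strictly inside the support; at a boundary the
    conditional MLE is 0 or infinite.
    """
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if conditional_mean(exp(mid), n1, n2, m) < x_obs:
            lo = mid      # expected count too small: psi must be larger
        else:
            hi = mid
    return exp((lo + hi) / 2)

# Row 1 = dizygotic (2 convicted of 17), row 2 = monozygotic (10 convicted of 13).
x_obs, n1, n2 = 2, 17, 13
m = 2 + 10                                # total convicted
psi_hat = conditional_mle(x_obs, n1, n2, m)
cross_product = (2 * 3) / (15 * 10)       # unconditional (sample) odds ratio
print(psi_hat, cross_product)
```

The conditional estimate generally differs a little from the simple cross-product ratio, sitting somewhat closer to 1 in small tables such as this one.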
