November lectures

Dear Dr. Hanley,

Just a few quick questions regarding your epidemiology lecture to our med2 class.

Re: Standard Error. As I understand it, the standard error is inversely related to the square of the n


inversely related to the square ROOT of the n

see eqns. on page 2 of notes, left column ...

Thus, if you increase the n, for example, from 4 to 8,
does this mean the SE is reduced by the square of 4?


it is reduced by a factor of sqrt[2] = 1.41 !

It goes from proportional to 1/sqrt[4] to proportional to 1/sqrt[8]

that is, from 1/sqrt[4] = 0.5 to 1/sqrt[8] = 0.354

that is NOT reduced by the square of 4 !

it IS reduced by the square ROOT of 2 !

If you went from 4 to 16 (quadruple), you would reduce it by a factor of 2

ie from 1/sqrt[4] to 1/sqrt[16]

ie from 0.5 to 0.25! that's a factor of 2 !
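The same arithmetic as a small sketch, using only the proportionality SE ∝ 1/sqrt[n] (the function name `se_ratio` is made up for illustration; the constant of proportionality cancels, so it is left out):

```python
import math

def se_ratio(n_old: int, n_new: int) -> float:
    """Factor by which the SE shrinks when n goes from n_old to n_new,
    assuming only that SE is proportional to 1/sqrt(n)."""
    return (1 / math.sqrt(n_old)) / (1 / math.sqrt(n_new))

for n_old, n_new in [(4, 8), (4, 16)]:
    print(f"n {n_old} -> {n_new}: 1/sqrt(n) goes from "
          f"{1 / math.sqrt(n_old):.3f} to {1 / math.sqrt(n_new):.3f}, "
          f"a reduction by a factor of {se_ratio(n_old, n_new):.2f}")
```

Doubling n (4 to 8) gives a factor of 1.41; quadrupling it (4 to 16) gives a factor of 2.00.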

Do I have this right 'As you increase the n, you decrease the SE (by the square of n?) ',

YES, you decrease the SE, BUT not in the way you say it..

SE if you use n
is proportional to 1/sqrt[n]    (1)

SE if you use k times n
is proportional to 1/sqrt[k times n]
ie proportional to a value which is sqrt[k] times smaller than (1)

which decreases the margin of error,


which decreases the span of the CI.
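Written out for the SE of a sample mean, with σ standing for the SD of the individual values (σ is notation assumed here; the notes may write it differently):

```latex
\mathrm{SE}_{n} = \frac{\sigma}{\sqrt{n}}, \qquad
\mathrm{SE}_{kn} = \frac{\sigma}{\sqrt{kn}}
  = \frac{1}{\sqrt{k}} \cdot \frac{\sigma}{\sqrt{n}}
  = \frac{\mathrm{SE}_{n}}{\sqrt{k}}
```

With k = 2 the SE shrinks by a factor of sqrt[2] = 1.41; with k = 4 it is halved.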


Remember my example on the last page of my handout:

n = 1000 (all Canada): margin of error = 3 percentage points

n = 250 (Quebec only): margin of error = 6 percentage points

the Quebec sample size is 4 times smaller,
so the SE (and thus the margin of error in a 95% CI, namely 1.96 x SE)
is sqrt[4] = 2 times larger
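A rough check of those two margins, as a sketch. It assumes the poll estimated a proportion and uses the worst case p = 0.5 with SE = sqrt[p(1 - p)/n]; the handout may have used a different p, so only the rounded 3 and 6 points and the factor of 2 are expected to match:

```python
import math

Z_95 = 1.96  # multiplier for a 95% CI

def margin_of_error_points(n: int, p: float = 0.5) -> float:
    """95% margin of error, in percentage points, for a sample proportion.
    Assumes SE = sqrt(p(1-p)/n); p = 0.5 is an assumed worst case."""
    se = math.sqrt(p * (1 - p) / n)
    return 100 * Z_95 * se

canada = margin_of_error_points(1000)
quebec = margin_of_error_points(250)
print(f"Canada (n=1000): +/- {canada:.1f} points")
print(f"Quebec (n=250):  +/- {quebec:.1f} points")
print(f"ratio: {quebec / canada:.2f}")
```

This prints roughly 3.1 and 6.2 points, and the ratio is 2.00 whatever p is, because p cancels.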

Just a quick follow up: If the span of the CI is smaller, does this mean
that the confidence in this CI is less?

YES (IF..)

IF you had to use a smaller multiple
(eg 1.645 rather than 1.96) of the SAME SE to get
a smaller margin of error, then you CAN'T have as much
confidence in it

How about if I ask you to estimate my age to within +/- 10 years?

HOW much, with same info, would you be willing to bet that your interval is correct?

What if I said you must estimate it to within +/- 1 year?

Would you be willing to bet as much?

if I ask you for a CI for the average salary of all Quebec MDs,
and I ask you to give a number +/- $1000,
you (or I) won't have as much confidence in that interval
as if you could use a margin of error of, say, +/- $10000.

BUT IF you increase your sample size (n) so as to DECREASE the SE,
BUT keep the multiplier (eg 1.96) the SAME,
then your confidence (95%) is the same..
but you now have a smaller margin of error...
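The two ways of getting a narrower interval can be compared in a short sketch. The SE values are arbitrary units chosen for illustration, and the confidence attached to a multiplier z is taken as the share of a Gaussian within +/- z SEs of the centre, computed with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

def confidence_for_multiplier(z: float) -> float:
    """Share of a Gaussian lying within +/- z SDs (here, SEs) of its centre."""
    return 2 * NormalDist().cdf(z) - 1

def margin(z: float, se: float) -> float:
    """Margin of error = multiplier x SE."""
    return z * se

se_original = 1.0          # SE with the original n (arbitrary, assumed units)
se_4x_n = se_original / 2  # quadrupling n halves the SE

# Narrower interval by shrinking the multiplier: same SE, less confidence.
for z in (1.96, 1.645):
    print(f"z={z}, SE={se_original}: margin {margin(z, se_original):.2f}, "
          f"confidence {confidence_for_multiplier(z):.1%}")

# Narrower interval by shrinking the SE (bigger n): same multiplier, same confidence.
print(f"z=1.96, SE={se_4x_n}: margin {margin(1.96, se_4x_n):.2f}, "
      f"confidence {confidence_for_multiplier(1.96):.1%}")
```

Dropping from 1.96 to 1.645 narrows the margin but drops the confidence from about 95% to about 90%; halving the SE narrows it while the confidence stays at 95%.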

that is a bit like asking me for some more info,
like when I got my PhD, what my BP is,
and the age of one of my sibs! More info
lets you narrow the interval
while keeping the same degree of confidence,
ie you would now bet the same amount on a narrower interval

or keep the same interval but have more confidence (bet more)

the degree of confidence is something you set to suit
the demands of the situation and unless you can change
one of the basic inputs (such as the amount of info)
you must settle for less confidence in narrower intervals
or more confidence in wider ones

Next, how's this for my definition of a 95% CI:

CIs constructed in this way contain the true value
95% of the time, and don't 5% of the time. We just don't
know which ones do!
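That "95% of the time" can be checked by simulation. This is a minimal sketch under assumed conditions: Gaussian individual values with a known SD (so the SE is sigma/sqrt[n]), and a hypothetical true mean of 170, SD of 10, and n = 25; none of these numbers come from the lecture:

```python
import math
import random
from statistics import mean

def ci_covers(true_mean: float, sigma: float, n: int, rng: random.Random) -> bool:
    """Draw one sample, build mean +/- 1.96 x SE, and report whether it
    contains true_mean. SE uses the known sigma to keep the sketch simple."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    se = sigma / math.sqrt(n)
    centre = mean(sample)
    return centre - 1.96 * se <= true_mean <= centre + 1.96 * se

rng = random.Random(2024)  # fixed seed so the run is repeatable
trials = 10_000
hits = sum(ci_covers(true_mean=170.0, sigma=10.0, n=25, rng=rng) for _ in range(trials))
print(f"{hits / trials:.1%} of {trials} intervals contained the true mean")
```

Close to 95% of the intervals cover the true mean, yet any single interval either does or does not, and in a real study we cannot tell which.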



How do you get 1.96 for a 95% CI? I.e., what is it for 90%?

In my notes I said it was 1.645 for 90%

These values come from the Gaussian (Normal) distribution

we know that 95% of the values in a Gaussian (Normal) distribution fall within 1.96 SDs of the mean

we know that 90% of the values in a Gaussian (Normal) distribution fall within 1.645 SDs of the mean

we know that 68% of the values in a Gaussian (Normal) distribution fall within 1.0 SD of the mean


these are tabulated and available on calculators
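They are also available in software. A sketch using Python's standard-library `statistics.NormalDist`: for a given confidence level, split the leftover area equally between the two tails and look up the matching point on the standard Gaussian (the helper name `multiplier` is made up for illustration):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def multiplier(confidence: float) -> float:
    """z such that `confidence` of a Gaussian lies within +/- z SDs of the mean."""
    tail = (1 - confidence) / 2          # area left over in each tail
    return std_normal.inv_cdf(1 - tail)

for level in (0.95, 0.90):
    print(f"{level:.0%}: +/- {multiplier(level):.3f} SDs")

within_1_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(f"within +/- 1 SD: {within_1_sd:.1%}")
```

This gives 1.960 for 95%, 1.645 for 90%, and about 68.3% within 1 SD.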

and before the euro, the equation of the normal curve used to be on the German 10 mark note along with a portrait of Gauss

this distribution is usually used for individual values (eg heights)

BUT it also applies to STATISTICS (ie numbers calculated from aggregates,
ie samples)

so it is the same business.. except we use SE to describe the
variability of a statistic.
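A small simulation sketch of "the same business": the statistic here is the mean of a sample of size n, and its spread across many repeated samples matches sigma/sqrt[n], with about 95% of the sample means falling within 1.96 SEs of the true mean. The Gaussian population and the values 170, 10, 25 and 5000 are hypothetical, chosen only for illustration:

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(7)                        # fixed seed for repeatability
true_mean, sigma, n, n_samples = 170.0, 10.0, 25, 5_000   # hypothetical values

# Each entry is one STATISTIC: the mean of a fresh sample of n individual values.
sample_means = [mean([rng.gauss(true_mean, sigma) for _ in range(n)])
                for _ in range(n_samples)]

se = sigma / math.sqrt(n)
share = sum(abs(m - true_mean) <= 1.96 * se for m in sample_means) / n_samples

print(f"SD of the {n_samples} sample means: {pstdev(sample_means):.2f}")
print(f"sigma / sqrt(n):                 {se:.2f}")
print(f"share within 1.96 SE of the true mean: {share:.1%}")
```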

I would be grateful for any feedback on this or the lectures..

J Hanley (James.Hanley@McGill.CA)


September lectures...

Dear Dr. Hanley,

I am a student in the med/dent 2 class. I have two questions I would like to ask you.

At the beginning of today's lecture (Friday), you gave us material that was not addressed in the lecture. As you know, the material was about statistics. I read it, and had great difficulty understanding the concepts. My question is, will you explain this material in subsequent lectures, or would you like us to understand it now for the upcoming exam?

I agree that it is difficult without me there to explain it in person; it is not suitable for review on your own..

I will indeed have to go over it again in November, and so it won't be on the upcoming exam

[there just wasn't enough time to get through all this in 5 hrs]

Am I correct with the following terminology/concepts:

-experimental: investigator selects and allocates experiment groups.


ie it can't be just in the course of usual clinical activities
(where the investigator could also select and allocate..)

-non-experimental: investigator just "watches" groups and has no ability to allocate groups.

CORRECT .. subjects select their own lifestyle, environment, etc.. (or luck and their parents pass on certain genes or blood groups etc to them!)

that's why some people call them "observational" studies [it would be better to call them "observation only" studies, since all studies (expt'l and non-) involve observation!]

Thus, not as great as comparing "like with like"

correct.. there can be many factors that differ b/w the group "exposed" to the "agent" of interest and that
could distort the comparison

(unless investigator is VERY FORTUNATE -- as was John Snow)

-cohort experiment:

People don't say "cohort EXPERIMENT"

cohort is a group that is followed up (a clin trial has at least 2 such groups, and IS experimental...)

BUT there are many cohorts whose "exposure" is NOT allocated by the investigator
(as you correctly state above, to be an experiment, it is the investigator who allocates the Rx)

ie most cohort studies are NON-EXPERIMENTAL, and indeed some see cohorts as a subtype of
NON-EXPERIMENTAL studies. If you read the excerpt from Rothman and Greenland, you will see that they put cohort studies as a subtype of NON-EXPERIMENTAL studies.

the main thing that defines a cohort is the "start with the denominator" attitude. There is nothing in the definition that says who put which persons in the "exposed" subgroup and which in the "unexposed" subgroup

-- it could be the subjects themselves (non-exptl) OR the investigator (experimental eg clin or field or community trial)

Comparison between groups. Denominator selected first. It is non-experimental, meaning that the investigator does not allocate people to groups, but the efficiency can be increased by the investigator comparing appropriate groups (e.g. residence vs. spouces of residence)

YES, but rather than efficiency I would say the distortions can be reduced by selecting and comparing "otherwise similar" groups..

PS: we were talking of medical resideNTS and spouSes of residenTS (see abstract on web page)

-case control:

Comparison between groups.

careful here.. CONCEPTUALLY we always compare rates in the "exposed" and the "not exposed"

even if we have to use quasi-denominators to do so (and limit ourselves to RATIOS of rates)

Numerator selected first.

YES (ie start with the cases of the disease / outcome of interest)

then sample from the "base" that generated the cases in order to estimate the relative sizes of the "exposed" denominator and the "unexposed denominator"

[denominator series is usually called "control series" but as I and others point out, the word "control" can be quite confusing here ]

it is much more descriptive to refer to the sample used to estimate the relative sizes of the 2 denominators as the "denominator series"
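A sketch of why a denominator series is enough to estimate a RATIO of rates. Everything here is hypothetical: an invented base with known person-years and case counts (so the full-denominator rate ratio is known, 3.0), and a denominator series drawn as simple random person-moments from that base, which is a simplifying assumption about how the base is sampled:

```python
import random

# Hypothetical base (all numbers invented for illustration).
exposed_py, unexposed_py = 20_000, 80_000   # person-years in the base
exposed_cases, unexposed_cases = 60, 80     # cases the base generated

# "Start with the denominators" (cohort view): full rates are available.
full_rate_ratio = (exposed_cases / exposed_py) / (unexposed_cases / unexposed_py)

# "Start with the numerators" (case-control view): take all the cases, then
# sample the base to estimate the RELATIVE sizes of the two denominators.
rng = random.Random(11)                     # fixed seed for repeatability
n_denominator_series = 2_000
share_exposed = exposed_py / (exposed_py + unexposed_py)
series_exposed = sum(rng.random() < share_exposed for _ in range(n_denominator_series))
series_unexposed = n_denominator_series - series_exposed

# Quasi-denominators: only their ratio carries information,
# so only a ratio of rates can be estimated.
estimated_rate_ratio = (exposed_cases / series_exposed) / (unexposed_cases / series_unexposed)

print(f"rate ratio from full denominators:  {full_rate_ratio:.2f}")
print(f"rate ratio from denominator series: {estimated_rate_ratio:.2f}")
```

The absolute rates cannot be recovered from the denominator series, but their ratio is estimated close to 3, with sampling error that shrinks as the series grows.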

It is non-experimental.

YES case-ctl studies are ALWAYS NON-EXPTL

if one STARTS with numerators (ie cases), then by definition it is already after the fact, and (presumably) the first time that the investigator has "come on the scene", so the investigator could not even have been present earlier on when the subjects chose their "exposure".