November lectures
Dear Dr. Hanley.
Just a few quick questions regarding your epidemiology lecture to our med2 class.
Re: Standard Error. As I understand it, the standard error is inversely related
to the square of the n
NOT CORRECT
inversely related to the square ROOT of the n
see eqns. on page 2 of notes, left column ...
Thus, if you increase the n, for example, from 4 to 8,
does this mean the SE is reduced by the square of 4?
NO,
it is reduced by a factor of sqrt[2] = 1.41 !
It goes from proportional to 1/sqrt[4] to proportional to 1/sqrt[8]
that is from 1/2 = 0.5 to 1/sqrt[8] = 0.354
that is NOT reduced by the square of 4 !
it IS reduced by the square ROOT of 2 !
If you went from 4 to 16 (quadruple), you would reduce it by a factor of 2
ie from 1/sqrt[4] to 1/sqrt[16]
ie from 0.5 to 0.25! that's a factor of 2 !
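
A quick numerical check of the figures above, as a minimal Python sketch. The function name relative_se is made up for illustration, and the SE is computed only up to the constant factor (the SD), which is all the argument needs.

```python
import math

def relative_se(n):
    """SE up to a constant factor (the SD): proportional to 1/sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (4, 8, 16):
    # Last column: by what factor the SE at n = 4 has been reduced at this n.
    reduction = relative_se(4) / relative_se(n)
    print(n, round(relative_se(n), 3), round(reduction, 2))
```

Doubling n shrinks the SE by sqrt[2], and quadrupling it by sqrt[4] = 2; in neither case is n squared.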
Also,
Do I have this right 'As you increase the n, you decrease the SE (by the square of
n?) ',
YES you decrease the SE , BUT not in the way you say it..
SE if use n
is proportional to 1/sqrt[n] (1)
SE if use k times n
is proportional to 1/sqrt[k times n]
ie proportional to a value which is sqrt[k] times smaller than (1)
which decreases the margin of error,
INDEED!
which decreases the span of the CI.
INDEED!
Remember my example on last page of my handout
n=1000 (all Canada) margin of error = 3 percentage points
n= 250 (Quebec only) margin of error = 6 percentage points
Quebec sample size is 4 times smaller
so SE (and thus the margin of error in a 95% CI, namely 1.96 x SE)
is sqrt[4] = 2 times larger
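
The poll figures can be roughly reproduced with the usual formula for a proportion. This is a sketch under assumptions not stated in the handout excerpt: a simple random sample, and a proportion near 0.5 (the worst case). The function name margin_of_error is illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error (z x SE) for a sample proportion.

    Assumes simple random sampling; p = 0.5 gives the largest possible SE.
    """
    se = math.sqrt(p * (1 - p) / n)
    return z * se

for n in (1000, 250):
    # Expressed in percentage points, rounded to one decimal.
    print(n, round(100 * margin_of_error(n), 1))
```

Under these assumptions the two margins come out close to 3 and 6 percentage points, and their ratio is sqrt[4] = 2 whatever value of p is used.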
Just a quick follow up: If the span of the CI is smaller, does this mean
that the confidence in this CI is less?
YES (IF..)
IF you had to use a smaller multiple
(eg 1.645 rather than 1.96) of the SAME SE to get
a smaller margin of error, then you CAN'T have as much
confidence in it
how about i ask you to estimate my age to within +/- 10 years
HOW much, with same info, would you be willing to bet that your interval is correct?
What if I said you must estimate it to within +/- 1 year?
Would you be willing to bet as much?
if i ask you for a CI for the average salary of all Quebec md's
and I ask you to give a number +/- $1000
you (or i ) won't have as much confidence in that interval
as if you could use a margin of error of say +/- $10000.
BUT IF you increase your sample size (n) so as
to DECREASE the SE
BUT kept the multiplier (eg 1.96) the SAME
then your confidence (95%) is the same..
but you now have a smaller margin of error...
that is a bit like asking me some more info
like when i got my PhD and what my BP is
and the age of one of my sibs ! more info
lets you narrow the interval,
while keeping the same degree of confidence
ie you would now bet same amount on narrower interval
or keep the same interval but have more confidence (bet more)
the degree of confidence is something you set to suit
the demands of the situation and unless you can change
one of the basic inputs (such as the amount of info)
you must settle for less confidence in narrower intervals
or more confidence in wider ones
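
The trade-off can be put in numbers. In the sketch below, the SE of $5000 for the average salary is a made-up value for illustration, the function name confidence_for_margin is hypothetical, and the Gaussian approximation is assumed to apply.

```python
from statistics import NormalDist

def confidence_for_margin(margin, se):
    """Confidence level of the interval estimate +/- margin, given the SE.

    Assumes the estimate is approximately Gaussian around the true value.
    """
    multiplier = margin / se          # how many SEs the margin spans
    return 2 * NormalDist().cdf(multiplier) - 1

se = 5000  # hypothetical SE (dollars) of the estimated average salary

print(round(confidence_for_margin(10000, se), 3))  # wide interval, same info
print(round(confidence_for_margin(1000, se), 3))   # narrow interval, same info

# 100 times the sample size cuts the SE by sqrt[100] = 10, so the narrow
# interval now carries the confidence the wide one had before.
print(round(confidence_for_margin(1000, se / 10), 3))
```

With the same information, the +/- $1000 interval deserves far less confidence than the +/- $10000 one; only more information (a smaller SE) gives the narrow interval and the high confidence together.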
Next, how's this for my definition of 95% CI:
CIs, constructed in this way, contain the true value
95% of the time, and don't 5% of the time. We don't
know which one though!
EXCELLENT!
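
That definition can be checked by simulation. The sketch below makes several simplifying assumptions that are not from the lecture: Gaussian data, a known SD, and arbitrary values for the true mean, SD and sample size. The function name coverage is hypothetical.

```python
import math
import random

def coverage(reps=2000, n=25, mu=100.0, sigma=15.0, z=1.96, seed=1):
    """Fraction of simulated CIs (mean +/- z x SE) that contain the true mean.

    Simplification: sigma is treated as known, so SE = sigma / sqrt(n).
    """
    rng = random.Random(seed)         # fixed seed so the run is repeatable
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - z * se <= mu <= sample_mean + z * se:
            hits += 1
    return hits / reps

print(coverage())          # should land near 0.95
print(coverage(z=1.645))   # should land near 0.90
```

In the simulation we can see which intervals missed, because we chose the true mean ourselves. In a real study we have one interval and no way of telling which kind it is.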
Finally,
How do you get 1.96 for 95%CI. Ie. What is it for 90%?
In my notes I said it was 1.645 for 90%
These values come from the Gaussian (Normal) distribution
we know that 95% of the values in a Gaussian (Normal) distrn. fall within 1.96 SD's
of the mean
we know that 90% of the values in a Gaussian (Normal) distrn. fall within 1.645 SD's of the mean
we know that 68% of the values in a Gaussian (Normal) distrn. fall within 1.0
SD's of the mean
etc
these are tabulated and available on calculators
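
For anyone who prefers to compute rather than look up the table, here is a short sketch using Python's standard library (statistics.NormalDist, available from Python 3.8). The helper names multiplier and central_fraction are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Gaussian with mean 0 and SD 1

def multiplier(confidence):
    """Number of SDs (or SEs) either side of the mean that covers the
    central `confidence` fraction of a Gaussian distribution."""
    tail = (1 - confidence) / 2       # area left over in each tail
    return STD_NORMAL.inv_cdf(1 - tail)

def central_fraction(z):
    """Fraction of a Gaussian distribution within z SDs of the mean."""
    return 2 * STD_NORMAL.cdf(z) - 1

for level in (0.90, 0.95):
    print(level, round(multiplier(level), 3))
print(round(central_fraction(1.0), 3))
```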
and before the euro, the equation of the normal curve used to be on the German 10 mark note along with a portrait of Gauss
this distribution is usually used for individual values (eg heights)
BUT it also applies to STATISTICS (ie numbers calculated from aggregates,
ie samples)
so it is the same business.. except we use SE to describe the
variability of a statistic.
I would be grateful for any feedback on this or the lectures..
J Hanley (James.Hanley@McGill.CA)
=====================
September lectures...
Dear Dr. Hanley.
I am a student in the med/dent 2 class. I have two questions I would like to ask
you.
1.
At the beginning of today's lecture (Friday), you gave us material that was not addressed
in the lecture. As you know, the material was about statistics. I read it, and had
great difficulty understanding the concepts. My question is, will you explain this
material in subsequent lectures, or would you like us to understand it now for the
upcoming exam?
agree that difficult without me to explain it in person -- not suitable for review
by yourself..
I will indeed have to go over it again in nov and so it won't be on the upcoming
exam
[there just wasn't enough time to get thru all this in 5 hrs]
2.
Am I correct with the following terminology/concepts:
-experimental: investigator selects and allocates experiment groups.
YES.. AND WITH A VIEW TO LEARNING SOMETHING..
ie it can't be just in course of usual clinical activities
(where investigator could also select and allocate.. )
-non-experimental: investigator just "watches" groups and has no ability
to allocate groups.
CORRECT .. subjects select their own lifestyle, environment, etc.. (or luck and their
parents pass on certain genes or blood groups etc to them!)
that's why some people call them "observational" studies [they would be
better to say "observation only" studies since all studies (expt'l
and non-) involve observation ! ]
Thus, not as great as comparing "like with like"
correct.. there can be many factors that differ b/w the group "exposed"
to the "agent" of interest and that
could distort the comparison
(unless investigator is VERY FORTUNATE -- as was John Snow)
-cohort experiment:
People don't say "cohort EXPERIMENT"
cohort is a group that is followed up (a clin trial has at least 2 such groups, and
IS experimental...)
BUT there are many cohorts whose "exposure" is NOT allocated by the investigator
(as you correctly state above, to be an experiment, it is the investigator who allocates
the Rx)
ie most cohort studies are NON-EXPERIMENTAL, and indeed some see cohorts as a subtype
of NON-EXPERIMENTAL studies. If you read the excerpt from Rothman and Greenland, you will
see that this is how they classify them.
the main thing that does define the cohort is the "start with denominator..
attitude". There is nothing about the definition that says who put which persons
in the "exposed" subgroup, and which in the "unexposed" subgroup
-- it could be the subjects themselves (non-exptl) OR the investigator (experimental
eg clin or field or community trial)
Comparison between groups. Denominator selected first. It is non-experimental
meaning that the investigator does not allocate people to groups, but, the efficiency
can be increased by the investigator comparing appropriate groups (e.g. residence
vs. spouces of residence)
YES, but rather than efficiency I would say the distortions can be reduced
by selecting and comparing "otherwise similar" groups..
PS: we were talking of medical resideNTS and spouSes of residenTS
(see abstract on web page)
-case control:
Comparison between groups.
careful here.. CONCEPTUALLY we always compare rates in the "exposed" and
the "not exposed"
even if we have to use quasi-denominators to do so ( and limit ourselves to RATIOS
of rates)
Numerator selected first.
YES (ie start with the cases of the disease / outcome of interest)
then sample from the "base" that generated the cases in order to estimate
the relative sizes of the "exposed" denominator and the "unexposed"
denominator
[denominator series is usually called "control series" but as I and others
point out, the word "control" can be quite confusing here ]
it is much more descriptive to refer to the sample used to estimate the relative
sizes of the 2 denominators as the "denominator series"
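
A small worked example of why the denominator series is enough to get a RATIO of rates. All the numbers are hypothetical, chosen only to make the arithmetic easy, and the function name rate_ratio is illustrative.

```python
def rate_ratio(cases_exposed, cases_unexposed, denom_exposed, denom_unexposed):
    """Ratio of the rate in the exposed to the rate in the unexposed."""
    return (cases_exposed / denom_exposed) / (cases_unexposed / denom_unexposed)

# Hypothetical numerators: the cases that arose in the base.
cases_exposed, cases_unexposed = 40, 80

# Full denominators (person-years) in the base; a cohort study would have these.
full = rate_ratio(cases_exposed, cases_unexposed, 20_000, 80_000)

# Denominator series: a sample of 500 from the same base, assumed here to
# split 100 : 400, ie in the same 1 : 4 proportion as the full denominators.
quasi = rate_ratio(cases_exposed, cases_unexposed, 100, 400)

print(full, quasi)
```

The two quasi-rates (40/100 and 80/400) mean nothing on their own, but their ratio matches the ratio of the true rates. In practice the sample splits only approximately in the true proportion, so the ratio is an estimate.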
It is nonexperimental.
YES case-ctl studies are ALWAYS NON-EXPTL
if one STARTS with numerators (ie cases) then by definition it is already after the
fact, and (presumably) the first time that the investigator has "come on the
scene" .. so investigator could not even have been present earlier on when the
subjects chose their "exposure".
JH