BIOS601 AGENDA: Monday September 25 and Wednesday 27, 2017
[updated Sept 22, 2017]
- Discussion of issues
- Notes and assignment on intensity rates: models
Answers to be handed in for:
Exercises 0.1, 0.2, 0.6, 0.7, 0.9, 0.10, 0.12, 0.13
+ (if in PhD program) 0.3, 0.5, 0.11
Remarks on Notes:
These notes are based on those JH developed for
the course EPIB626: Risks and Hazards,
which he taught in the epidemiology graduate program. As such,
they emphasize the 'end-product' rather than how the product was arrived at.
In bios601, we will emphasize both.
The Poisson distribution, one of JH's favourite distributions,
is at the core of this very epidemiologic topic.
It is of course widely used in physics (e.g., in counting particles);
see the "Rolling in the Higgs" song (an Adele parody), with its
"new data-peak at 125 GeV" (cf. the Figure on page 4).
Given that it is so widely used in biostatistics and epidemiology, it is surprising
that no biostatistician or epidemiologist has yet composed a song about it.
The recent science news from Iceland
("Older dads linked to rise in mental illness";
"Fathers bequeath more mutations as they age: Genome study may explain
links between paternal age and conditions such as autism"),
reported in the Nature article
"Rate of de novo mutations and the importance of father's age to disease risk",
is just the most recent and striking epidemiologic/biologic example of the
Poisson distribution. We will revisit it soon.
Notation: JH will use mu as the expected value, just like he uses mu for
the expected value of a Gaussian r.v.
Keep lambda for the rate per unit of experience. The product
of lambda and the amount of experience is mu.
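As a small illustration of this notation, here is a Python sketch assuming a hypothetical rate of 2 events per 1,000 person-years observed over 5,000 person-years; the helper names `expected_count` and `poisson_pmf` are illustrative only, not taken from the notes.

```python
import math

def expected_count(rate_per_unit, amount_of_experience):
    """mu = lambda * (amount of experience)."""
    return rate_per_unit * amount_of_experience

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), mu > 0, computed on the log scale for stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical example: lambda = 2 per 1,000 person-years, 5,000 person-years observed.
lam = 2 / 1000
person_years = 5000
mu = expected_count(lam, person_years)   # expected number of events
print(mu)
print(round(poisson_pmf(10, mu), 4))     # probability of observing exactly 10 events
```

Note that the pmf depends on lambda and the amount of experience only through their product mu.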
Make sure to look at some of the Examples, and run some of the
Simulations, to be found under Resources.
Section 1.1 (When it applies) is important: there
are many misunderstandings about this.
Section 2. Inference
We will leave this until we have a
common way of approaching point and interval estimation.
But you will again see the CLT playing a central role.
Notice also the use of the log(mu) scale
when making inferences
and, in Section 2.2.2, the use of the delta method to obtain an approximate variance for
the log of a count (assuming the count is not zero)
[useful in exercise 0.1!].
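The delta-method step can be sketched numerically. A minimal Python sketch, assuming a 95% interval (z = 1.96) and a hypothetical observed count of 16: since Var[log Y] is approximately (1/mu)^2 Var[Y] = 1/mu, estimated by 1/y, one builds the interval on the log scale and exponentiates. This is one common approximate interval, not necessarily the one the notes recommend in every situation; `log_scale_ci` is a hypothetical helper name.

```python
import math

def log_scale_ci(y, z=1.96):
    """Approximate CI for mu from a single Poisson count y (y > 0).

    Delta method: Var[log Y] ~ (1/mu)^2 * Var[Y] = 1/mu, estimated by 1/y.
    The interval is symmetric on the log scale, then exponentiated.
    """
    if y <= 0:
        raise ValueError("the log-scale interval needs y > 0")
    se_log = 1 / math.sqrt(y)
    return math.exp(math.log(y) - z * se_log), math.exp(math.log(y) + z * se_log)

lo, hi = log_scale_ci(16)
print(round(lo, 2), round(hi, 2))
```

Back on the count scale the interval is asymmetric (16 is its geometric, not arithmetic, midpoint), which is one reason the log scale is preferred.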
Section 2.1.3 and Fig. 4 again show the role of the CLT: if one accumulates a
big enough count
(by accumulating enough observation time,
or 'volume of experience'),
and the contributions are independent, the CLT kicks in. The
'rules of thumb' (an expected count of at least 5 or 10; this time only the lower
edge, zero, is a concern) are reminiscent of those for the Normal approximation to the binomial.
Here, we work directly with the expected counts, rather than with
n and pi separately.
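To see the CLT "kick in" as the expected count grows, the sketch below compares the exact Poisson cdf with a Normal(mu, mu) approximation. It assumes a continuity correction of 0.5 and summarizes the fit by the largest absolute gap over all k; both are illustrative choices, and the mu values are arbitrary.

```python
import math

def normal_cdf(x):
    """Standard Normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def poisson_cdf_table(mu, k_max):
    """Exact P(Y <= k) for k = 0..k_max, via the recursion p(k+1) = p(k) * mu / (k+1)."""
    p = math.exp(-mu)   # P(Y = 0); no underflow for mu up to several hundred
    cdf, total = [], 0.0
    for k in range(k_max + 1):
        total += p
        cdf.append(total)
        p *= mu / (k + 1)
    return cdf

def max_abs_error(mu):
    """Largest gap between the exact Poisson cdf and the continuity-corrected
    Normal(mu, mu) approximation."""
    k_max = int(mu + 10 * math.sqrt(mu) + 10)
    exact = poisson_cdf_table(mu, k_max)
    return max(abs(exact[k] - normal_cdf((k + 0.5 - mu) / math.sqrt(mu)))
               for k in range(k_max + 1))

for mu in (1, 5, 10, 50, 500):
    print(mu, round(max_abs_error(mu), 4))
```

The gap shrinks roughly in proportion to 1/sqrt(mu), i.e., with the size of the expected count rather than with any 'n'.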
Section 3. Applications, and Notes
The "How many must I count?" section [3.1] shows an important
point about rates (or 'concentrations') made from counts:
it is not the amount of experience that determines statistical (in)stability;
it is the size of the numerator, i.e., the size of the count.
Look at the widths of the CIs in the needle-stick injuries
study in Table 1 of Section 3.3.
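A small numerical illustration of this point, as a Python sketch: it assumes the log-scale interval rate × exp(±z/sqrt(y)) with z = 1.96, and uses a hypothetical count of 25 events observed over either 100 or 100,000 units of experience. The helper names are hypothetical.

```python
import math

def rate_ci_log_scale(y, person_time, z=1.96):
    """Approximate CI for a rate from y (> 0) events in `person_time` units of experience."""
    rate = y / person_time
    factor = math.exp(z / math.sqrt(y))   # multiplicative margin: depends only on y
    return rate / factor, rate * factor

def count_needed(rel_margin, z=1.96):
    """Smallest count y whose log-scale half-width z / sqrt(y) is <= rel_margin."""
    return math.ceil((z / rel_margin) ** 2)

# Same count, very different amounts of experience (hypothetical numbers).
ratios = []
for T in (100, 100_000):
    lo, hi = rate_ci_log_scale(25, T)
    ratios.append(hi / lo)
    print(T, round(lo, 5), round(hi, 5), round(hi / lo, 3))

print(count_needed(0.10))   # count giving a log-scale half-width of about 0.10
```

The ratio of upper to lower limit is identical for both amounts of experience; only the count moves it. A log-scale half-width of 0.10 corresponds to roughly ±10% on the rate scale.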
The "divisibility" of the experience
that underlies a Poisson count is important. The same
does not apply to the binomial, where the denominator is typically
the number (i.e., the amount) of persons.
In Poisson counts, the equivalent is the amount of experience.
Just as in the story of Solomon, who settled the 'child-ownership'
dispute between two women who each claimed to be the mother,
persons are not divisible; but the amount of time
we observe them is. However, whether the denominator
is persons or person-time, the numerator (the count)
is not infinitely divisible.
Section 3.6 (CI for an incidence density or rate):
Epidemiologists and (applied)
biostatisticians are students of rates, not of the Poisson-distributed
numerators that serve as inputs to the theoretical and empirical rates.
You can think of an empirical rate as a transformed (or scaled)
realization of a Poisson r.v. The scaling is so simple
that the "delta" method is obvious and immediate.
And think of 'incidence density' as the epidemiologist's
term for a rate or intensity.
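Because the empirical rate is just Y/T, the "delta method" here is exact: Var[Y/T] = Var[Y]/T^2 = mu/T^2 = lambda/T, estimated by y/T^2. A Python sketch, assuming hypothetical values (30 events in 1,500 person-years; lambda = 0.02 per person-year for the check), that computes the SE this way and confirms the variance directly from the Poisson pmf:

```python
import math

def rate_and_se(y, person_time):
    """Empirical rate y/T and its standard error sqrt(y)/T."""
    return y / person_time, math.sqrt(y) / person_time

def exact_rate_variance(lam, person_time):
    """Var[Y/T] computed directly from the Poisson(mu = lam * T) pmf, as a check."""
    mu = lam * person_time
    y_max = int(mu + 20 * math.sqrt(mu) + 20)   # tail beyond this is negligible
    p, mean, second = math.exp(-mu), 0.0, 0.0
    for y in range(y_max + 1):
        r = y / person_time
        mean += p * r
        second += p * r * r
        p *= mu / (y + 1)
    return second - mean ** 2

rate, se = rate_and_se(30, 1500)    # hypothetical: 30 events in 1,500 person-years
print(rate, se)
print(exact_rate_variance(0.02, 1500), 0.02 / 1500)
```

A CI for lambda can likewise be obtained by dividing the limits of any CI for mu by T.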
Remarks on assigned exercises:
0.1 (m-s) Working with logs of counts and logs of rates
The log(rate) is absolutely central to epidemiologic data analysis,
and so you need to be quite comfortable working on this scale
and then going back to the rate scale.
0.2 (m-s) The Poisson Family as a 'Closed under Addition' Family
This is a very important (but often overlooked) property.
It is what allows epidemiologists to add the expected numbers of new cancers
diagnosed in different age groups and compare this sum with the total number of cancers
observed in these age groups.
The age structure of the source population is determined
by many factors, and cancer incidence rates are usually
a strong function of age. Thus, the products of person-time and rate (the mu's),
and therefore the observed counts, are likely to be very different
in different age groups. But their sum
is still a Poisson r.v. You will notice that in the study
of childhood leukemia near nuclear plants in Ontario (Section 3.2),
the authors aggregated the numbers of cancers over the different age bins,
as the age-specific numbers would be tiny and uninterpretable.
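The property can be checked numerically. The Python sketch below assumes three hypothetical age groups (the rates and person-years are invented for illustration, not taken from the Ontario study): it convolves the age-specific Poisson pmfs, compares the result with the Poisson pmf at the summed mu (E), and then computes a one-sided P-value for a hypothetical observed total O.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def convolve(p, q):
    """Distribution of the sum of two independent counts with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# Hypothetical age-specific reference rates (per person-year) and person-years.
age_groups = {
    "0-4": (5e-5, 40_000),
    "5-9": (3e-5, 45_000),
    "10-14": (2e-5, 50_000),
}
mus = [rate * py for rate, py in age_groups.values()]   # age-specific mu's
E = sum(mus)                                             # total expected count

# Convolve the age-specific pmfs (each truncated at N - 1; the result is still
# exact for totals below N) and compare with the Poisson(E) pmf.
N = 30
dist = [1.0]
for mu in mus:
    dist = convolve(dist, [poisson_pmf(y, mu) for y in range(N)])
max_gap = max(abs(dist[y] - poisson_pmf(y, E)) for y in range(N))
print(round(E, 2), max_gap)

# One-sided P-value P(Y >= O) under Poisson(E), for a hypothetical observed total O.
O = 9
p_upper = 1 - sum(poisson_pmf(y, E) for y in range(O))
print(round(p_upper, 4))
```

The maximum gap is at the level of floating-point rounding, which is what 'closed under addition' predicts.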
0.3 (m-s) Link between Poisson and
Many Web articles and textbooks cover this topic. Give a
reference for (or link to) your favourite clearly described derivation.
There is no need to repeat all the
algebra; instead, briefly describe the derivation in your own WORDS.
0.4 (m-s) Link between tail areas of Poisson and Chi-sq Distributions
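A numerical check you could use alongside your own derivation: the well-known identity P(Y >= y | mu) = P(chi-square with 2y df <= 2 mu), for y >= 1. The Python sketch below assumes nothing beyond the standard library. It evaluates the chi-square cdf by Simpson's rule (an illustrative choice, with n = 2000 panels) rather than by any closed form, so the comparison is not circular. The (y, mu) pairs are arbitrary.

```python
import math

def poisson_upper_tail(y, mu):
    """P(Y >= y) for Y ~ Poisson(mu)."""
    p, below = math.exp(-mu), 0.0
    for k in range(y):
        below += p
        p *= mu / (k + 1)
    return 1.0 - below

def chisq_pdf(x, df):
    """Chi-square density; only even df >= 2 are used here."""
    if x == 0:
        return 0.5 if df == 2 else 0.0   # value at 0 for df = 2, and for even df > 2
    half = df / 2
    return math.exp((half - 1) * math.log(x) - x / 2
                    - half * math.log(2) - math.lgamma(half))

def chisq_cdf(q, df, n=2000):
    """P(X <= q) by Simpson's rule on [0, q] (n must be even)."""
    h = q / n
    s = chisq_pdf(0, df) + chisq_pdf(q, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * chisq_pdf(i * h, df)
    return s * h / 3

pairs = [(1, 0.7), (3, 2.5), (8, 5.0)]
for y, mu in pairs:
    print(y, mu, round(poisson_upper_tail(y, mu), 6), round(chisq_cdf(2 * mu, 2 * y), 6))
```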
0.5 (m-s) Fisher Information
The same random variable provides different amounts of information about
different parameters (e.g., about mu versus about log(mu)). This is merely because
the parameters are on different scales.
You can get there from first principles, or by the delta method.
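A numerical check you could compare against your own derivation: compute Fisher information as the expected squared score, summing over the Poisson pmf. The Python sketch assumes a hypothetical mu = 12 and uses the scores for mu (y/mu - 1) and for theta = log(mu) (y - mu). The two results should agree with I(theta) = I(mu) (d mu / d theta)^2.

```python
import math

def poisson_pmf_list(mu, y_max):
    """[P(Y = 0), ..., P(Y = y_max)] for Y ~ Poisson(mu)."""
    p, out = math.exp(-mu), []
    for y in range(y_max + 1):
        out.append(p)
        p *= mu / (y + 1)
    return out

def expected_squared_score(score, mu):
    """Fisher information as E[score(Y)^2] under Y ~ Poisson(mu)."""
    y_max = int(mu + 20 * math.sqrt(mu) + 20)   # tail beyond this is negligible
    return sum(p * score(y) ** 2 for y, p in enumerate(poisson_pmf_list(mu, y_max)))

mu = 12.0   # hypothetical value
info_mu = expected_squared_score(lambda y: y / mu - 1, mu)   # score for mu
info_log_mu = expected_squared_score(lambda y: y - mu, mu)   # score for theta = log(mu)
print(info_mu, 1 / mu)
print(info_log_mu, mu)
```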
0.6 (m-s) the sixth decimal place
New this year. Stigler's father, mentioned in The American Statistician article,
won the Nobel Prize in Economics.
0.11 Enough Coins?
The purpose here is to introduce the idea of a mixture of latent (or unrecognized) classes or subgroups.
Here we had seven classes, each with its own mu. The average of the 7 probability distributions
is not, in general, the Poisson distribution evaluated at the average of the 7 mu's; it is more spread out.
It is easy to imagine a generalization to a larger set of mu's, affected (in traffic accident epidemiology)
by many variables, such as weather, unusual local and broader patterns, and circumstances we are not aware of.
And just as we have 'extra-binomial' variation, we often have 'extra-Poisson' variation. Sometimes
(as in the births example) we know what causes it (and how to remove this 'noise'); sometimes we do not.
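To make the mixture idea concrete, here is a Python sketch assuming seven equally common classes with hypothetical mu's (not the values in the exercise). It shows that the mixture's variance equals the average mu plus the variance of the mu's, hence exceeds its mean ('extra-Poisson' variation). It also shows that P(Y = 0) under the mixture differs from the Poisson probability at the average mu.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), mu > 0."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Seven hypothetical latent classes, assumed equally common, each with its own mu.
mus = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
w = 1 / len(mus)
mu_bar = sum(mus) / len(mus)

Y_MAX = 60   # P(Y > 60) is negligible for these mu's
mixture = [sum(w * poisson_pmf(y, m) for m in mus) for y in range(Y_MAX + 1)]

mean = sum(y * p for y, p in enumerate(mixture))
var = sum((y - mean) ** 2 * p for y, p in enumerate(mixture))
between = sum((m - mu_bar) ** 2 for m in mus) / len(mus)   # variance of the mu's

print(round(mean, 4), round(var, 4), round(mu_bar + between, 4))
print(round(mixture[0], 4), round(poisson_pmf(0, mu_bar), 4))
```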
0.12 and 0.13
Check back later ... JH plans to add some remarks.