BIOS601 AGENDA: Tuesday October 02 and Thursday October 04, 2012
[updated Oct 01, 2012]
- Discussion of issues in C&H's Chapter 04 (Consecutive Follow-up Intervals), and JH's Notes and Assignment on this chapter.
Answers to be handed in for: (Supplementary) Exercises 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7.
Remarks on Notes:
These notes were developed to supplement the Clayton and Hills chapter,
which was aimed at epidemiologists, and which does not give the
derivations (the 'wiring' and 'theory') behind the results (the user's view of the car).
It is important to read C&H first, before JH's notes.
The core topics in this chapter are non-parametric
(or more precisely,
distribution-free) approaches to estimating survival curves, and the
associated functions (e.g., pdf and hazard function) that can be derived from them.
Last week, in the orientation to ML estimation, several of our examples
involved specific candidate distributions for random variables that take
values on the (0, Inf) scale, such as the exponential, gamma, and log-normal.
But, probably to your surprise, you will learn in one of the exercises that
whereas the Kaplan-Meier estimator is usually described as a non-parametric
estimator, it can also be shown to be the survival curve, among
ALL POSSIBLE survival curves, that makes the observed data most likely; and so it
is sometimes referred to as a non-parametric MLE -- almost a contradiction
in terms, especially when we emphasize that for ML one needs to specify
a distribution with a full (parametric) form.
C&H develop the K-M estimator very 'naturally' by slicing time
finer and finer, so that most conditional survival probabilities
in the product are unity, and can be omitted, leaving just the
(less than unity) conditional survival probabilities for the time-bands that contain
>= 1 event.
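To make this concrete, here is a minimal R sketch (the data and variable names are invented for illustration) that builds the product-limit estimate 'by hand':

    # Invented data: 10 survival times; status = 1 for an event
    # (transition), 0 for a censored observation.
    time   <- c(2, 3, 3, 5, 6, 7, 9, 11, 12, 14)
    status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)

    # Distinct event times; at each, d = no. of events and
    # n = no. still at risk just before that time.
    ev <- sort(unique(time[status == 1]))
    d  <- sapply(ev, function(t) sum(time == t & status == 1))
    n  <- sapply(ev, function(t) sum(time >= t))

    # Product-limit (K-M) estimate: only the time-bands containing
    # >= 1 event contribute a less-than-unity factor to the product.
    S.km <- cumprod((n - d) / n)
    cbind(ev, n, d, S.km)

    # Cross-check against the survival package:
    # library(survival); summary(survfit(Surv(time, status) ~ 1))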
One could begin even further back, and consider what the empirical cdf(t)
and thus its complement, the empirical S(t), would look like if there were no
censoring. In this case, when we get to the t by which a cumulative total of
k subjects have made the transition from the initial state,
the empirical S(t) would be
[(n-1)/n] x [(n-2)/(n-1)] x [(n-3)/(n-2)] x ... x [(n-(k-1))/(n-(k-2))] x [(n-k)/(n-(k-1))]
and this simplifies, because k-1 terms on the top would cancel the same
k-1 terms in the bottom, to (n-k)/n.
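A one-line numerical check of this telescoping in R (the values of n and k are invented):

    n <- 20; k <- 7
    prod((n - (1:k)) / (n - (0:(k - 1))))  # the k-term product
    (n - k) / n                            # the simplified form; they agree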
In the K-M version, it's the same structure, BUT (because of censoring)
not all of the 'survivors' of one time-band experience the next time-band.
The no. at risk (the risk set) gets progressively smaller, not
just because of the transitions, but also because of 'staggered' entries
and losses to follow-up.
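In R, a survfit summary makes this visible; in the invented data below, n.risk drops between event rows by more than the number of events, because censored subjects also leave the risk set:

    library(survival)
    time   <- c(1, 2, 4, 4, 6, 8, 9, 12)   # invented
    status <- c(1, 0, 1, 0, 0, 1, 0, 1)
    summary(survfit(Surv(time, status) ~ 1))
    # e.g., between the rows for t=1 and t=4, n.risk falls from 8 to 6,
    # although only one event occurred: one subject was censored at t=2.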
-----
Since the C&H book was written, the Nelson-Aalen estimator
has become
more popular, and it is now found in all good survival analysis packages.
So it deserves some study, and to be properly understood.
As JH notes at the end of part II of his expository article, there is some
confusion as to what a N-A curve means. It is often taken to mean the
integral of the estimated hazard function (JH thinks this is the more common
meaning), but it is sometimes used to refer to the survival curve,
exp[-integral of the estimated hazard function], that
one can derive from the integrated hazard function. If we want to think
of K-M and N-A curves 'in parallel', then it is this latter version, a
downwards-travelling N-A step-function taking on values from 1 to 0,
that makes the N-A step-function and the K-M step-function very close cousins.
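A minimal R sketch of the two meanings, on invented data; the cumulative hazard is assembled by hand from the event counts and risk sets stored in the survfit object:

    library(survival)
    time   <- c(2, 3, 3, 5, 6, 7, 9, 11, 12, 14)   # invented
    status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
    fit <- survfit(Surv(time, status) ~ 1)

    # Meaning 1: N-A as the integrated (cumulative) hazard.
    H.na <- cumsum(fit$n.event / fit$n.risk)

    # Meaning 2: the survival curve derived from it.
    S.na <- exp(-H.na)

    # The two downwards-travelling step-functions are close cousins:
    cbind(time = fit$time, S.km = fit$surv, S.na = S.na)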
Remarks on assigned exercises:
The exercises are also designed to (i) get you familiar with
the Greenwood formula, and with how to obtain K-M and N-A
'curves' via R; (ii) appreciate why, by how much, and when they differ;
and (iii) see some live examples of survival analysis
and infection-rate analysis, and see how interval-censored
observations (such as those from HIV testing) are sometimes simplified
in actual analyses, especially if, as in the Kenya and Uganda examples,
simplifying the data doesn't change the estimates very much.
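For (i), here is a sketch of Greenwood's formula computed 'by hand' on invented data; we believe summary(survfit(...)) reports the same standard errors, so it serves as a cross-check:

    library(survival)
    time   <- c(2, 3, 3, 5, 6, 7, 9, 11, 12, 14)   # invented
    status <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
    fit <- survfit(Surv(time, status) ~ 1)

    # Greenwood: Var[S-hat(t)] = S-hat(t)^2 x sum, over event times <= t,
    #            of d / (n x (n - d))
    se.gw <- fit$surv *
      sqrt(cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event))))
    cbind(t = fit$time, S = fit$surv, se.greenwood = se.gw)
    # compare with summary(fit)$std.err at the event times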
4.1. As we remarked above, this aspect of the K-M estimator
is unusual. But why not think of it this way: imagine you can choose ANY
distribution you wish (as long as it has a legitimate cdf), and that its cdf is
simply called a 'no-name' cdf (it could have vertical jumps, and not be
a smooth function such as we have entertained so far).
Then in this example, what would the Likelihood be?
Wouldn't it be, no matter what cdf or S(t) we choose,
prob[1st observation | this cdf or this S(t) function]
x prob[2nd observation | this cdf or this S(t) function]
x prob[3rd observation | this cdf or this S(t) function] ?
Since the 2nd observation is that the transition (event) will occur
at some time point after t=7, i.e., it is right-censored at 7,
prob[2nd observation | this cdf or this S(t) function] is S(7),
or 1-CDF(7). So you read this off from the candidate S(t) function
you are 'trying on' for size.
For the likelihood contribution from the 1st observation,
we note that this is an uncensored observation,
or if you like, 'interval-censored' within a narrow interval
that contains the value 5. We need the probability of observing this.
Shouldn't we, by analogy with when we are constructing an empirical cdf
for n uncensored values, put a probability 'spike' or 'point mass'
at t=5? The question is how much mass to put there. If all n observations
were uncensored, we would put a mass of 1/n at each value.
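(A quick R check of that analogy, on invented and fully uncensored values: with no censoring, K-M puts mass 1/n at each observation and so coincides with the complement of the empirical cdf.)

    library(survival)
    y   <- c(4, 7, 7, 10, 13)                        # invented, no censoring
    fit <- survfit(Surv(y, rep(1, length(y))) ~ 1)
    cbind(fit$time, fit$surv, 1 - ecdf(y)(fit$time)) # last two columns agree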
Likewise, we would need to put some probability mass at t=10.
The question is where else (if anywhere) should we put some mass?
How about 1/3 at t=5, 1/3 at t=10, and the other 1/3 spread out
uniformly over the interval t=7 to t=9, say. If we did this,
the S(t) curve would equal 1 until t=5,
take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
until t=7, then head downwards from t=7, until it reaches
S(9)=1/3, then head straight across to S(10) =1/3, then down to S(10+)=0.
We can now calculate the L under this 'candidate' S(t) function:
1/3 x 2/3 x 1/3 = 2/27.
How about 1/3 mass at t=5, 1/2 mass at t=10, and the other 1/6 mass spread out
uniformly over the interval t=7 to t=9, say. If we did this,
the S(t) curve would equal 1 until t=5,
take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
until t=7, then head downwards until it reaches
S(9)=1/2, then straight across to S(10)=1/2,
then head straight down to S(10+)=0.
The L under this 'candidate' S(t) function is:
1/3 x 2/3 x 1/2 = 2/18, better than before.
If you keep reducing the mass between 7 and 9, and instead placing it
at t=10, until you get to the S(t) function described in the question,
you get the L under this 'candidate' S(t) function as:
1/3 x 2/3 x 2/3 = 4/27, better than either of the others.
This suggests that to maximize L, we should only put probability mass
at the times of the events (the so-called 'failure' times), and NONE
at the CENSORED times.
The question then is how much at each 'failure' time.
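The whole argument can be restated in a few lines of R; the function L(m) below is just the product worked out above, with m the mass placed at t=10 and the remaining (2/3 - m) spread over the interval (7, 9):

    # L(m) = P[event at 5] x S(7) x P[event at 10] = (1/3) x (2/3) x m,
    # since S(7) = 2/3 no matter how the leftover mass is split.
    L <- function(m) (1/3) * (2/3) * m
    L(1/3)  # 2/27: first candidate
    L(1/2)  # 2/18: second candidate
    L(2/3)  # 4/27: all leftover mass at the failure time t=10 -- the maximum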
4.2. To come.
...