BIOS601 AGENDA: Thu Oct 01, Tue Oct 06, Thu Oct 08, Tue Oct 13, 2015
[updated Sept 28, 2015]
Agenda for Thu Oct 01, Tue Oct 06, Thu Oct 08, Tue Oct 13, 2015
- Discussion of issues in JH's Notes and assignment on Likelihood estimation
Answers to be handed in for:
(Supplementary) Exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8, 3.9
Remarks on Notes:
These notes were developed to supplement the Clayton and Hills chapter,
which was aimed at epidemiologists, and which does not give the
derivations (the 'wiring' and 'theory') underneath the results (the user's view of the car).
Even if you have, in your previous mathematical-statistics courses,
covered Likelihood, Maximum Likelihood (ML) and
Maximum Likelihood estimators (MLEs), it is unlikely
that they were presented to you in the way JH introduces them
here, through this series of exercises. These exercises are designed to
reinforce the point that a likelihood is (proportional to) the probability,
viewed as a function of theta, of observing the data or outcomes we did observe,
and that the component probabilities in the product (we sum their logs
to obtain an overall log-likelihood) are not always of the simple form
pdf(y). Many of JH's examples involve 'binned' data, where the corresponding
probabilities cannot be accurately approximated by rectangles, i.e., by
pdfs at the data-values (the y's) multiplied by a delta-y. Instead, they
are calculated as integrals. And in some cases, as when we have right- or
left-censored data, the integral is over an open-ended interval.
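For instance (a minimal sketch in R, with made-up numbers rather than data
from the exercises), under a N(mu, sigma) model the contribution of a 'binned'
observation is an integral of the pdf, and that of a right-censored one is a
tail area:

  ## likelihood contributions under a N(mu, sigma) model; values hypothetical
  mu <- 10; sigma <- 2
  ## an observation known only to lie in the bin (9, 11]:
  p.bin    <- pnorm(11, mu, sigma) - pnorm(9, mu, sigma)  # exact integral
  p.approx <- dnorm(10, mu, sigma) * (11 - 9)             # rectangle approximation
  ## an observation right-censored at 12 (all we know is y > 12):
  p.cens   <- 1 - pnorm(12, mu, sigma)                    # open-ended integral
  c(p.bin, p.approx, p.cens)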
A big attraction of the Likelihood approach is its ability
to obtain information on a parameter by
combining information of different types.
The exercises are also designed to reinforce the point that
not all
MLEs have a closed form.
JH will (and you should) try to distinguish between
estimator
(a procedure, formula, etc) and
estimate, the numerical result
of applying this estimator to data. Of course, the
abbreviation MLE stands for both the estimator and the estimate.
Remarks on assigned exercises:
3.1 Daniel Bernoulli's example.
See Notes. Apart from the intent to bring in a bit of history, this is also an example
where someone has worked out the likelihood and maximized it by brute force, but you
can simply use R. From what JH gathers from others, optimize is recommended for 1-D problems, and
optim for higher-dimensional problems.
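For a generic illustration of optimize (a minimal sketch; the log-likelihood
below is a made-up binomial one, not Bernoulli's):

  ## maximize a 1-parameter log-likelihood with optimize();
  ## hypothetical data: 7 'successes' in 10 trials
  logLik <- function(p) 7 * log(p) + 3 * log(1 - p)
  fit <- optimize(logLik, interval = c(0.001, 0.999), maximum = TRUE)
  fit$maximum   # the MLE; the analytic answer is 7/10
  ## optim(start.values, negLogLik) works similarly in higher dimensions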
3.2 Fisher's example of measurement errors.
The measurement errors are grouped into classes (bins).
Usually we use bins of our own choosing, for convenience,
e.g., age bins 15-20, 20-25, etc.,
or annual income brackets $20K-$40K, $40K-$60K, etc., sometimes open-ended.
What's different here is that, because Fisher is dealing with a Normal
distribution symmetric about 0, he puts the errors between -1 and 0 in the same bin as
those between 0 and 1;
likewise he puts the errors between -2 and -1 in the same bin as
those between 1 and 2. He is cutting down on his
numerical work, using absolute errors,
since the expected proportion in a 'left of 0' 1-unit wide bin should be the same
as the proportion in its 'right of 0' counterpart.
In this example, the frequency data are multinomial,
so there is one joint probability,
involving the usual product of probabilities, each one an integral.
Its log will be a sum, though the multinomial counts themselves are not independent.
But, in other cases
(e.g., the dilution experiment, or meta-analysis),
we merge likelihoods from independent datasets or observations.
Usually, when we learn likelihood, each bin is very narrow
(e.g., someone's height or weight is recorded to the nearest cm or kg),
so the probability mass in the bin is closely
approximated by pdf(mid-bin) x the width of the bin, i.e., by
pdf(y) x dy.
But this is not the case here; while you might get a
reasonable approximation to some bin probabilities
using the area in a rectangle, it won't work very well
for others, and it won't work at all for the (effectively)
open-ended bins.
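To make the bin probabilities concrete (a sketch with a hypothetical sigma
and 1-unit-wide absolute-error bins; the actual limits and frequencies are in
the exercise):

  ## probabilities of absolute-error bins under N(0, sigma); sigma hypothetical
  sigma  <- 1.5
  k      <- 0:3                                          # bins (0,1], ..., (3,4]
  p.bins <- 2 * (pnorm((k + 1)/sigma) - pnorm(k/sigma))  # left + right halves
  p.open <- 2 * (1 - pnorm(4/sigma))                     # open-ended: |error| > 4
  sum(c(p.bins, p.open))                                 # the probabilities add to 1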
I expect you will realize that there is no closed-form expression
for the MLE of sigma-squared or sigma, so you can instead
plot the log-likelihood and find the MLE visually.
You might find the log-likelihood involving log(sigma-squared)
better behaved than the log-likelihood involving sigma-squared itself.
And, of course, you can search in the sigma or sigma-squared scale.
Later on, we will use more sophisticated numerical ways to
find the MLE, and at the same time to calculate the curvature,
and thus, the precision of the estimate. This iterative
method is especially valuable when the parameter is multi-dimensional.
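A sketch of the plot-and-search approach (the frequencies below are made up;
substitute Fisher's):

  ## multinomial log-likelihood for binned absolute errors, as a
  ## function of log(sigma); frequencies are hypothetical
  freq <- c(30, 22, 10, 5, 3)        # bins (0,1], (1,2], (2,3], (3,4], (4, Inf)
  logLik <- function(log.sigma) {
    sigma <- exp(log.sigma)
    cuts  <- c(0:4, Inf)
    p     <- 2 * (pnorm(cuts[-1]/sigma) - pnorm(cuts[-6]/sigma))
    sum(freq * log(p))
  }
  grid <- seq(log(0.5), log(5), length.out = 200)
  plot(grid, sapply(grid, logLik), type = "l",
       xlab = "log(sigma)", ylab = "log-likelihood")
  exp(optimize(logLik, c(log(0.5), log(5)), maximum = TRUE)$maximum)  # MLE of sigma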
For 3.3 (Galton's data on the speed of pigeons), most bins are 100 yards/min wide,
but the '-500' bin
extends all the way from 0 to 500 (technically, since we are dealing
with a N(mu, sigma) distribution, from -Inf to 500); in practice, all of these
racing pigeons, even the older ones, have speeds well above 0.
Even though they do not involve time (the subject of
survival analyses),
the data in exercises 3.2 and 3.3 are in fact
'interval-censored'.
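In R, an interval-censored contribution for 3.3 looks like this (a sketch
with hypothetical mu, sigma and counts):

  ## interval-censored log-likelihood terms under N(mu, sigma); values hypothetical
  mu <- 900; sigma <- 150
  ll.bin  <- 4 * log(pnorm(600, mu, sigma) - pnorm(500, mu, sigma)) # 4 birds in (500, 600]
  ll.open <- 2 * log(pnorm(500, mu, sigma))                         # 2 birds in the '-500' bin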
In 3.4 (déjà vu) the 2 datapoints are independent,
so the log-lik is a sum of two independent log(probability)'s. In this
case, you already obtained a closed-form estimator. One of the points of this
question was to emphasize that
for ML estimation, one needs a fully
specified distributional form for each observation. In contrast,
the LS estimator did not require a distributional form for
the errors about the 'line of means'; indeed, the LS
estimator is a purely numerical line-fitting
approach, and doesn't rely on any statistical assumptions. It is
only if we wish to describe the statistical behaviour of the
LS estimator that we need to specify a complete model.
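A small simulated illustration of this point (not part of the exercise): once
we do add a Normal error model, the ML fit of the line reproduces the LS fit
from lm().

  ## ML under a Normal error model reproduces the LS line; simulated data
  set.seed(1)
  x <- 1:20; y <- 2 + 0.5 * x + rnorm(20, sd = 1)
  negLogLik <- function(par) {          # par = (intercept, slope, log sigma)
    mu <- par[1] + par[2] * x
    -sum(dnorm(y, mu, exp(par[3]), log = TRUE))
  }
  optim(c(0, 0, 0), negLogLik)$par[1:2] # ~ the same as coef(lm(y ~ x))
  coef(lm(y ~ x))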
The observations in
3.5 and 3.6
are examples of censored 'survival' data.
One of the points of JH's choice of the 'tumbler' longevity data
is that it shows that there are (at least)
two equivalent ways of
deriving a likelihood (a small sketch after the two bullets illustrates their equivalence).
* One can
regard the frequencies as the realization of a single
multinomial r.v., just as in examples 3.1 and 3.2.
In this approach, all of the 'bin' probabilities
must add to 1, so we can think of it
as an 'unconditional' approach.
* By focusing on week-specific 'failure' rates
or 'hazard' rates,
one can proceed week by week, and treat each week
as a separate (conditional) binomial.
In this approach, each conditional probability
and its complement add to 1 within each week,
but across weeks, the week-specific failure probabilities
are not constrained to add
to 1.
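A small sketch of the equivalence (made-up frequencies, and, purely for
simplicity, a constant weekly hazard h; the exercise's model need not be this
simple):

  ## two routes to the same log-likelihood
  d <- c(5, 3, 2)                  # deaths in weeks 1, 2, 3
  n <- c(20, 15, 12)               # numbers at risk at the start of each week
  s <- 10                          # still alive after week 3
  ll.multinomial <- function(h) {  # 'unconditional': bin probabilities add to 1
    p.die <- (1 - h)^(0:2) * h     # P(die in week k), k = 1, 2, 3
    sum(d * log(p.die)) + s * log((1 - h)^3)
  }
  ll.conditional <- function(h)    # week-by-week (conditional) binomials
    sum(d * log(h) + (n - d) * log(1 - h))
  ll.multinomial(0.2) - ll.conditional(0.2)   # 0, for any h in (0,1)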
The data in 3.6 have a very special structure:
every
observation is either right-censored (if
the person is pulled out alive)
or left-censored (if pulled out dead).
The same kind of 'current-status' data arise
if we conduct a maternal-and-child survey, and ask
(i) the infant's age and (ii) whether the
infant is still being breast-fed, and wish
to convert the responses into a 'still-breast-fed'
curve that shows, for each week or month of age,
what proportion of infants are still being breast-fed at that age.
In the ML approach, each likelihood contribution
is the integral of the pdf from either 0 to
't' or from 't' to Inf.
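A sketch of the resulting log-likelihood (the exponential model and the data
are made up for illustration):

  ## current-status log-likelihood; model and data hypothetical
  t    <- c(0.5, 1.0, 1.5, 2.0, 3.0)  # times at extraction
  dead <- c(0,   0,   1,   1,   1)    # 1 = pulled out dead (left-censored)
  logLik <- function(rate) {
    F <- pexp(t, rate)                # P(death time <= t)
    sum(dead * log(F) + (1 - dead) * log(1 - F))
  }
  optimize(logLik, c(0.01, 10), maximum = TRUE)$maximum  # MLE of the rate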
As we will discuss further in class, it is not easy
to come up with a single sensible pdf for the avalanche
data, since there are at least 3 separate causes of death,
such as trauma, asphyxia, and hypothermia, and
the dataset does not distinguish them. But, to keep
it simple, for the purposes of this chapter, we will
adopt a single 1- or 2-parameter distribution.
In
3.7 note JH's suggestion
to re-parametrize the proportions so that they always stay within
their bounds.
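One common such re-parametrization (a sketch, with made-up data) is the
logit: maximize over an unconstrained beta and recover p = plogis(beta),
which cannot leave (0, 1):

  ## keep a proportion inside (0,1) by searching on the logit scale
  ## hypothetical data: 3 'successes' in 12 trials
  logLik <- function(beta) {
    p <- plogis(beta)   # exp(beta)/(1 + exp(beta)), always in (0,1)
    3 * log(p) + 9 * log(1 - p)
  }
  plogis(optimize(logLik, c(-10, 10), maximum = TRUE)$maximum)  # ~ 3/12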
3.8 The dilution series is a very good example
where we do not have any obvious and easy estimator, the way we often do for
LS. This was one of Fisher's very first (and very compelling) uses of ML estimation.
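In the classical formulation (a sketch; the volumes and counts below are made
up, and the exercise's setup may differ in its details), the number of
organisms in a sample is Poisson, so a plate is sterile with probability
exp(-theta * volume), and the sterile/fertile counts give a binomial
likelihood in theta:

  ## dilution-series log-likelihood; dilutions and counts hypothetical
  vol     <- c(1, 0.5, 0.25, 0.125)  # relative volumes in the dilution series
  n       <- c(5, 5, 5, 5)           # plates per dilution
  sterile <- c(0, 1, 3, 4)           # sterile plates observed
  logLik <- function(theta) {        # theta = organisms per unit volume
    p <- exp(-theta * vol)           # P(sterile) = P(0 organisms) under Poisson
    sum(sterile * log(p) + (n - sterile) * log(1 - p))
  }
  optimize(logLik, c(0.01, 50), maximum = TRUE)$maximum  # MLE of theta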