Course 613: Assignments

McGill University, Department of Epidemiology, Biostatistics and Occupational Health

EPIB 613: Introduction to Statistical Software (Fall 2006)

Assignment 4, due November 6

Working in teams of two or three...


The following data items are from an investigation into the natural history of (untreated) prostate cancer (report by Albertsen, Hanley, Gleason and Barry, JAMA, September 1998).

Data items: id, date of birth, date of diagnosis, Gleason score, date of last contact, status (1 = dead, 0 = alive) and, if dead, cause of death (coded as in 2b below). The data file (.txt) contains a random half of the 767 patients.

1. Compute the distribution of age at diagnosis (5-year intervals) and of year of diagnosis (5-year intervals). Also compute the mean and median age at diagnosis.
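A minimal sketch of the item 1 arithmetic, written in Python rather than SAS so the logic is easy to check. It assumes each record has already been read into a date of birth and a date of diagnosis. The three records and the helper names (`years_between`, `five_year_bin`) are hypothetical. Age is taken as elapsed days / 365.25, and each value is labelled by the lower bound of its 5-year interval (65 means 65-69, 1970 means 1970-1974).

```python
from collections import Counter
from datetime import date
from statistics import mean, median

# Hypothetical records (id, date of birth, date of diagnosis); the layout
# and date format of the real .txt file are not assumed here.
patients = [
    (1, date(1905, 3, 14), date(1973, 6, 2)),
    (2, date(1910, 11, 30), date(1978, 1, 15)),
    (3, date(1899, 7, 4), date(1971, 9, 20)),
]


def years_between(start, end):
    """Elapsed time in years, approximating a year as 365.25 days."""
    return (end - start).days / 365.25


def five_year_bin(value):
    """Lower bound of the 5-year interval containing value (67.4 -> 65, 1973 -> 1970)."""
    return 5 * int(value // 5)


ages = [years_between(dob, dx) for _, dob, dx in patients]
age_dist = Counter(five_year_bin(a) for a in ages)                    # 65 means 65-69
year_dist = Counter(five_year_bin(dx.year) for _, _, dx in patients)  # 1970 means 1970-1974

print("age at dx:", sorted(age_dist.items()))
print("year of dx:", sorted(year_dist.items()))
print(f"mean age {mean(ages):.1f}, median age {median(ages):.1f}")
```

In SAS the same binning can be done in a DATA step before a frequency table; the only design choice here is to label each interval by its lower bound.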

2. For each of the 20 cells in Table 2 (5 Gleason score categories x 4 age-at-dx categories), compute the

a. number of man-years (M-Y) of observation

b. number of deaths from prostate cancer (1), other causes (2) and unknown causes (3)

c. prostate cancer (1) death rate (deaths per 100 M-Y)

d. proportion who survived at least 15 years.

For a and b you can use the SUM option in PROC MEANS, i.e.,
PROC MEANS DATA = ... SUM; VAR (the variables you want to sum);
BY (the 2 variables that form the cross-classification);
The dataset must first be sorted by those 2 variables (PROC SORT).
Also, think of a count as a sum of 0s and 1s.
For c (to avoid having to compute 20 rates by hand), you can 'pipe', i.e., redirect, the sums to a new SAS dataset with OUTPUT OUT = .... SUM = ... (names for the two sums); in a DATA step that reads this new dataset, you can then divide one sum by the other (times 100) to get the 20 rates.
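The per-cell arithmetic behind a to d can be sketched in Python as follows; this mirrors what PROC MEANS with SUM and OUTPUT OUT= produces, but it is not the SAS solution. It assumes follow-up has already been computed as years from diagnosis to death or last contact, and the records and category labels are hypothetical. For d it also assumes (an assumption, not stated in the assignment) that a man's 15-year status counts as known only if he died or was followed for at least 15 years; the Kaplan-Meier approach in item 3 handles censoring more fully.

```python
from collections import defaultdict

# Hypothetical per-man records, reduced to the fields used here:
# (Gleason category, age-at-dx category, years from dx to death or last
#  contact, status 1=dead/0=alive, cause 1=prostate cancer/2=other/3=unknown)
men = [
    ("6", "65-69", 12.5, 1, 1),
    ("6", "65-69", 16.0, 0, None),
    ("6", "65-69", 8.0, 1, 2),
    ("8-10", "70-74", 3.0, 1, 1),
]


def summarize(records):
    """Per (Gleason, age) cell: man-years, deaths by cause, prostate cancer
    death rate per 100 M-Y, and proportion surviving >= 15 years."""
    cells = defaultdict(lambda: {"my": 0.0, "d1": 0, "d2": 0, "d3": 0,
                                 "known15": 0, "surv15": 0})
    for gleason, agecat, years, status, cause in records:
        c = cells[(gleason, agecat)]
        c["my"] += years                                   # (a) man-years
        for k in (1, 2, 3):                                # (b) counts as sums of 0s and 1s
            c[f"d{k}"] += int(status == 1 and cause == k)
        # (d) assumption: 15-year status known only if he died or was followed >= 15 years
        if status == 1 or years >= 15:
            c["known15"] += 1
            c["surv15"] += int(years >= 15)
    for c in cells.values():
        c["rate1"] = 100 * c["d1"] / c["my"]               # (c) deaths per 100 M-Y
        c["prop15"] = c["surv15"] / c["known15"] if c["known15"] else None
    return dict(cells)


for cell, s in sorted(summarize(men).items()):
    print(cell, f"M-Y={s['my']:.1f}", s["d1"], s["d2"], s["d3"],
          f"rate={s['rate1']:.2f}", s["prop15"])
```

The indicator trick in (b) is the same idea as the hint above: each death-by-cause variable is 0 or 1, so summing it within a cell gives the count.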

3. On a single graph, plot the 5 Kaplan-Meier survival curves, one for each of the 5 Gleason score categories (use PROC LIFETEST, with the Gleason score category in a STRATA statement; online help is under the SAS/STAT module, or see ...; for Stata, see ...).
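PROC LIFETEST computes the product-limit (Kaplan-Meier) estimate for you. To show what each of the 5 curves contains, here is a minimal Python sketch of the estimator, computed separately for each Gleason category. It treats any death as the event (an assumption; the assignment does not say whether all-cause or prostate-cancer-specific survival is intended), and the data and function names are hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (t, S(t)) at each distinct death time.
    events[i] is 1 if times[i] is a death and 0 if it is a censoring time;
    anyone censored at t is still counted as at risk at t."""
    surv = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def curves_by_group(records):
    """records: (group, years of follow-up, status) -> {group: KM curve}."""
    groups = defaultdict(lambda: ([], []))
    for group, years, status in records:
        groups[group][0].append(years)
        groups[group][1].append(status)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in groups.items()}


# Hypothetical data: (Gleason category, years from dx, status 1=dead/0=alive)
data = [("6", 2, 1), ("6", 3, 1), ("6", 3, 0), ("6", 5, 1), ("6", 8, 0),
        ("8-10", 1, 1), ("8-10", 4, 1)]
for g, curve in sorted(curves_by_group(data).items()):
    print(g, [(t, round(s, 3)) for t, s in curve])
```

Plotting each list as a step function, one line per category, gives the single overlaid graph that the STRATA statement produces in SAS.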

4. [OPTIONAL] To compare the death rates with those of U.S. men of the same age, obtain, for each combination of calendar period (1970-1974, 1975-1979, ..., 1995-1999) and 5-year age interval (55-59, 60-64, ...):

a. the number of man-years of follow-up and

b. the number of deaths.

Do so by creating, from the record for each man, as many separate observations as there are 5-year x 5-year "squares" that his follow-up line crosses as it traverses the Lexis diagram diagonally (use the OUTPUT statement within the DATA step). Then use PROC MEANS to aggregate the M-Y and deaths in each square. If you get stuck, see the SAS code provided for this, or the algorithm given in Breslow and Day, Volume II, page ___.
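The splitting step can be sketched in Python as below. This is a cut-point approach (collect every calendar-period and age-group boundary the follow-up line crosses, then classify each piece by its midpoint); it is not a port of the course's SAS code and may differ in detail from Breslow and Day's algorithm. It assumes dates have been converted to decimal calendar years (1972.0 = 1 January 1972) and credits a death to the square in which follow-up ends; the man in the example and the function names are hypothetical.

```python
import math
from collections import defaultdict


def lexis_split(birth, start, end, died, width=5):
    """Split one man's follow-up into pieces, one per width x width square
    (calendar period x age group) crossed by his life line in the Lexis
    diagram. Times are decimal calendar years (1972.0 = 1 Jan 1972).
    Returns (period_start, age_start, man_years, death) tuples."""
    cuts = {start, end}
    k = math.floor(start / width) + 1              # calendar boundaries after start
    while k * width < end:
        cuts.add(float(k * width))
        k += 1
    k = math.floor((start - birth) / width) + 1    # age boundaries (birth + multiples of width)
    while birth + k * width < end:
        cuts.add(birth + k * width)
        k += 1
    points = sorted(c for c in cuts if start <= c <= end)
    pieces = []
    for t0, t1 in zip(points, points[1:]):
        mid = (t0 + t1) / 2                        # classify each piece by its midpoint
        period = width * math.floor(mid / width)
        age = width * math.floor((mid - birth) / width)
        death = int(bool(died) and t1 == end)      # death falls in the last square
        pieces.append((period, age, t1 - t0, death))
    return pieces


def aggregate(men):
    """Sum man-years and deaths per (period, age) square, like PROC MEANS."""
    totals = defaultdict(lambda: [0.0, 0])
    for birth, start, end, died in men:
        for period, age, man_years, death in lexis_split(birth, start, end, died):
            totals[(period, age)][0] += man_years
            totals[(period, age)][1] += death
    return dict(totals)


# Hypothetical man: born mid-1908, diagnosed 1 Jan 1972, died 1 Jan 1981
for piece in lexis_split(1908.5, 1972.0, 1981.0, died=1):
    print(piece)
```

Each tuple corresponds to one observation written by OUTPUT in the DATA step approach, and `aggregate` plays the role of PROC MEANS. Classifying pieces by their midpoint avoids floating-point trouble when a piece ends exactly on a boundary.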


Put all of the program steps and output into a single .txt file. JH will view it in a monospaced font such as Courier, so the alignment should be preserved. Interleave the DATA and PROC steps with their output and your conclusions, and place a helpful title (produced by SAS TITLE statements, but to your specifications) above each piece of output. Have SAS format the output with no more than 65 characters per line (e.g., OPTIONS LINESIZE=65;), so that lines will not wrap around even when the viewing font is enlarged. Show relevant excerpts rather than entire listings of data files. Annotate liberally. Submit the text file electronically (i.e., by email) to JH by 9 am on Monday, November 6.

(updated Oct 29, 2006)