The following data items are from an investigation
into the natural history of (untreated) prostate cancer [ report (.pdf) by
Albertsen, Hanley, Gleason, and Barry in JAMA, September 1998 ].
Each record contains id, dates of birth and diagnosis, Gleason score,
date of last contact, status (1=dead, 0=alive), and, if dead, cause of death
(see 2b below).
Data file (.txt): a random half of the 767 patients.
1. Compute the distribution of age at diagnosis (5-year intervals) and of year of diagnosis
(5-year intervals). Also compute the mean and median age at diagnosis.
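One way to approach item 1 is sketched below. It is only an illustration: the dataset name pc, the variable names (dob, dxdate, lastdate, and so on), and the records in the DATALINES block are made up for the example and are not taken from the data file. Replace the first step with an INFILE step that matches the actual layout of the .txt file.

```sas
/* Made-up records, for illustration only (not from the data file). */
/* Hypothetical variable names: dob, dxdate, lastdate, cause.       */
data pc;
  input id dob :date9. dxdate :date9. gleason lastdate :date9. status cause;
  format dob dxdate lastdate date9.;
  datalines;
1 12MAR1915 05JUN1975 6 20AUG1988 1 1
2 30OCT1908 17JAN1973 4 02FEB1980 1 2
3 08JUL1912 23SEP1978 8 15NOV1981 1 1
4 19FEB1918 11APR1980 5 01MAR1997 0 .
;
run;

data pc1;
  set pc;
  agedx  = (dxdate - dob) / 365.25;       /* age at diagnosis, in years       */
  agegrp = 5 * floor(agedx / 5);          /* lower bound of 5-year age group  */
  yrgrp  = 5 * floor(year(dxdate) / 5);   /* lower bound of 5-year period     */
run;

/* Frequency distributions of the two grouped variables */
proc freq data=pc1;
  tables agegrp yrgrp;
run;

/* Mean and median age at diagnosis */
proc means data=pc1 n mean median;
  var agedx;
run;
```

Dividing the day difference by 365.25 gives age only approximately; for grouping into 5-year intervals the approximation matters only for men diagnosed within a day or so of a birthday that is a multiple of 5.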
2. For each of the 20 cells in Table 2 (5 Gleason
score categories x 4 age-at-dx categories), compute the
a. number of man-years (M-Y) of observation
b. number of deaths from prostate cancer(1), other
causes(2), unknown causes(3)
c. prostate cancer(1) death rate [ deaths per 100
d. proportion who survived at least 15 years.
For a and b, you can use the SUM option in PROC MEANS,
i.e., PROC MEANS DATA = ... SUM; VAR the variables you want to sum;
BY the 2 variables that form the cross-classification;
(sort the data by those 2 variables first, since BY-group processing expects sorted input).
Also, think of a count as a sum of 0s and 1s.
For c (to avoid having to compute 20 rates by hand), you can 'pipe', i.e., redirect,
the sums to a new SAS dataset, where you can then divide one by the other to get the (20)
rates. Use OUTPUT OUT = ... SUM = ...names for the two sums;
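The hints for 2a to 2c can be put together roughly as follows. This is a sketch under stated assumptions, not the intended solution: the dataset and variable names and the records are made up; the Gleason grouping (2-4, 5, 6, 7, 8-10) and the 5-year age-at-diagnosis grouping are assumptions to be checked against Table 2 of the report; and the rate is expressed per 100 man-years, which is also an assumption.

```sas
/* Made-up records, for illustration only (not from the data file). */
data pc;
  input id dob :date9. dxdate :date9. gleason lastdate :date9. status cause;
  format dob dxdate lastdate date9.;
  datalines;
1 12MAR1915 05JUN1975 6 20AUG1988 1 1
2 30OCT1908 17JAN1973 4 02FEB1980 1 2
3 08JUL1912 23SEP1978 8 15NOV1981 1 1
4 19FEB1918 11APR1980 5 01MAR1997 0 .
5 03MAY1914 10OCT1976 6 25DEC1990 1 2
;
run;

data cells;
  set pc;
  agecat = 5 * floor(((dxdate - dob) / 365.25) / 5);  /* assumed age grouping */
  my = (lastdate - dxdate) / 365.25;                   /* man-years observed   */
  /* Counts as sums of 0s and 1s: one indicator per cause of death */
  d_pc    = (status = 1 and cause = 1);
  d_other = (status = 1 and cause = 2);
  d_unk   = (status = 1 and cause = 3);
  /* Assumed Gleason grouping; missing is handled first because SAS */
  /* treats a missing value as smaller than any number.             */
  if missing(gleason) then glcat = .;
  else if gleason <= 4 then glcat = 1;
  else if gleason = 5 then glcat = 2;
  else if gleason = 6 then glcat = 3;
  else if gleason = 7 then glcat = 4;
  else glcat = 5;
run;

proc sort data=cells;
  by glcat agecat;
run;

/* Sum M-Y and deaths within each cell and send the sums to a dataset */
proc means data=cells noprint;
  by glcat agecat;
  var my d_pc d_other d_unk;
  output out=sums sum=my d_pc d_other d_unk;
run;

/* One rate per cell, assumed here to be per 100 man-years */
data rates;
  set sums;
  rate_pc = 100 * d_pc / my;
run;

proc print data=rates;
  var glcat agecat my d_pc d_other d_unk rate_pc;
run;
```

In the made-up data, records 1 and 5 fall in the same cell, so that cell's row in the output shows the man-years of both men added together.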
3. On a single graph, plot the 5 Kaplan-Meier survival
curves, one for each of the 5 Gleason score categories (PROC LIFETEST; online help
is under the SAS/STAT module, or see http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/default.htm).
For Stata, see http://www.ats.ucla.edu/stat/stata/seminars/stata_survival/default.htm.
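A minimal PROC LIFETEST call for item 3 might look like the sketch below. The dataset, the variable names, and the records are made up; the Gleason grouping is the same assumed one (2-4, 5, 6, 7, 8-10); and death from any cause (status = 1) is taken as the event, which is one reading of "survival" here. PLOTS=SURVIVAL relies on ODS Graphics; older SAS releases use a different PLOTS= syntax, so check the online help for the release you have.

```sas
/* Made-up records, for illustration only (not from the data file). */
data pc;
  input id dob :date9. dxdate :date9. gleason lastdate :date9. status cause;
  format dob dxdate lastdate date9.;
  datalines;
1 12MAR1915 05JUN1975 6 20AUG1988 1 1
2 30OCT1908 17JAN1973 4 02FEB1980 1 2
3 08JUL1912 23SEP1978 8 15NOV1981 1 1
4 19FEB1918 11APR1980 5 01MAR1997 0 .
5 03MAY1914 10OCT1976 6 25DEC1990 1 2
;
run;

data km;
  set pc;
  years = (lastdate - dxdate) / 365.25;   /* follow-up time since diagnosis */
  /* Assumed Gleason grouping; check it against Table 2 of the report */
  if missing(gleason) then glcat = .;
  else if gleason <= 4 then glcat = 1;
  else if gleason = 5 then glcat = 2;
  else if gleason = 6 then glcat = 3;
  else if gleason = 7 then glcat = 4;
  else glcat = 5;
run;

ods graphics on;

/* status = 0 marks a censored (still alive) observation;           */
/* STRATA gives one Kaplan-Meier curve per Gleason category,        */
/* overlaid on a single survival plot.                              */
proc lifetest data=km plots=survival;
  time years*status(0);
  strata glcat;
run;
```

With only a handful of made-up records the curves are trivial; the point is the layout of the TIME and STRATA statements.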
4. [OPTIONAL] In order to compare the death rates
with those of U.S. men of the same age, for each combination of calendar-year period
(1970-1974, 1975-1979, ..., 1994-1999) and 5-year age interval (55-59, 60-64, ...), compute
a. the number of man-years of follow-up and
b. the number of deaths.
Do so by creating, from the record for each man,
as many separate observations as the number of 5yr x 5yr "squares" that
the man traverses diagonally through the Lexis diagram [ use the OUTPUT statement
within the DATA step]. Then use PROC MEANS to aggregate the M-Y and deaths in each
square. If you get stuck, here is some SAS code that does this, or see the algorithm
given in Breslow and Day, Volume II, page ___
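For orientation, the splitting described in item 4 can be sketched as below. This is not the SAS code or the Breslow and Day algorithm referred to above, just one possible way to do it. The dataset, the variable names, and the records are made up. Instead of stepping along each man's diagonal, the sketch loops over candidate age groups and periods and keeps only the squares that his follow-up actually overlaps.

```sas
/* Made-up records, for illustration only (not from the data file). */
data pc;
  input id dob :date9. dxdate :date9. gleason lastdate :date9. status cause;
  format dob dxdate lastdate date9.;
  datalines;
1 12MAR1915 05JUN1975 6 20AUG1988 1 1
2 30OCT1908 17JAN1973 4 02FEB1980 1 2
4 19FEB1918 11APR1980 5 01MAR1997 0 .
;
run;

data lexis;
  set pc;
  /* Generous loop limits; squares with no overlap are skipped below */
  a_lo = 5 * floor((year(dxdate) - year(dob) - 1) / 5);
  a_hi = year(lastdate) - year(dob);
  p_lo = 5 * floor(year(dxdate) / 5);
  p_hi = year(lastdate);
  do agegrp = a_lo to a_hi by 5;
    do period = p_lo to p_hi by 5;
      /* Overlap of follow-up with this age group and this period */
      seg_start = max(dxdate,
                      intnx('year', dob, agegrp, 'same'),
                      mdy(1, 1, period));
      seg_end   = min(lastdate,
                      intnx('year', dob, agegrp + 5, 'same'),
                      mdy(1, 1, period + 5));
      if seg_end > seg_start then do;
        my = (seg_end - seg_start) / 365.25;   /* man-years in this square */
        /* A death is counted only in the square where follow-up ends */
        death = (status = 1 and seg_end = lastdate);
        output;                                /* one observation per square */
      end;
    end;
  end;
  keep id agegrp period my death;
run;

proc sort data=lexis;
  by period agegrp;
run;

/* Aggregate M-Y and deaths within each square */
proc means data=lexis noprint;
  by period agegrp;
  var my death;
  output out=squares sum=my deaths;
run;

proc print data=squares;
  var period agegrp my deaths;
run;
```

For the first made-up record (born 12MAR1915, diagnosed 05JUN1975, died 20AUG1988) the DATA step writes five observations, for the squares (age 60, 1975), (60, 1980), (65, 1980), (65, 1985), and (70, 1985), with the death assigned to the last. A man whose date of last contact equals his date of diagnosis contributes no observation at all in this sketch, so such records would need separate handling.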