The following data items are from an investigation
into the natural history of (untreated) prostate cancer [ report (.pdf) by
Albertsen, Hanley, Gleason, and Barry in JAMA, September 1998 ].
The items are: id, dates of birth and diagnosis, Gleason score,
date of last contact, status (1=dead, 0=alive), and, if dead, cause of death
(see 2b below). The data
file (.txt) contains a random half of the 767 patients.
1. Compute the distribution of age at diagnosis (5-year intervals) and year of diagnosis
(5-year intervals). Also compute the mean and median age at diagnosis.
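One possible way to set up task 1 is sketched below. It is only a sketch under assumptions: the data set name `pc` and the variable names `dob` and `dxdate` are hypothetical (they are not taken from the data file), and both dates are assumed to have been read in as SAS date values. Age is approximated as days divided by 365.25.

```sas
/* Sketch only: PC, DOB and DXDATE are assumed names, not from the data file */
data dx;
  set pc;
  age_dx = (dxdate - dob) / 365.25;       /* approximate age at diagnosis, in years */
  age5   = 5 * floor(age_dx / 5);         /* 60 stands for ages 60-64 */
  dxyr5  = 5 * floor(year(dxdate) / 5);   /* 1975 stands for 1975-1979 */
run;

/* one-way frequency distributions of the two 5-year groupings */
proc freq data=dx;
  tables age5 dxyr5;
run;

/* mean and median age at diagnosis */
proc means data=dx n mean median;
  var age_dx;
run;
```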
2. For each of the 20 cells in Table 2 (5 Gleason
score categories x 4 age-at-dx categories), compute the
a. number of man-years (MY) of observation
b. number of deaths from prostate cancer (1), other
causes (2), unknown causes (3)
c. prostate cancer (1) death rate [ deaths per 100
MY ]
d. proportion who survived at least 15 years.
For a and b you can use the SUM
option in PROC MEANS,
i.e., PROC MEANS DATA = ... SUM; VAR the variables you want to sum;
BY the 2 variables that form the cross-classification (sort the data by those 2 variables first).
Also, think of a count as a sum of 0s and 1s.
For c (to avoid having to compute 20 rates by hand), you can 'pipe', i.e. redirect,
the sums to a new SAS data set, where you can then divide one by the other to get the (20)
rates. Use OUTPUT OUT = ... SUM = ...names for the two sums;
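The hints for 2a to 2c can be put together roughly as follows. This is a minimal sketch, not a definitive solution: the data set `pc` and the variables `gleason_cat`, `age_cat`, `dxdate`, `lastdate`, `status` and `cause` are assumed names, the two category variables are assumed to exist already, and the dates are assumed to be SAS date values.

```sas
/* Sketch only: every data set and variable name below is an assumption */
data pc2;
  set pc;
  my      = (lastdate - dxdate) / 365.25;   /* man-years of observation */
  d_pc    = (status = 1 and cause = 1);     /* 1 if died of prostate cancer, else 0 */
  d_other = (status = 1 and cause = 2);     /* 1 if died of other causes */
  d_unk   = (status = 1 and cause = 3);     /* 1 if died of unknown causes */
run;

/* BY-group processing needs the data sorted by the BY variables */
proc sort data=pc2;
  by gleason_cat age_cat;
run;

/* sum MY and the 0/1 death indicators within each of the 20 cells */
proc means data=pc2 noprint;
  by gleason_cat age_cat;
  var my d_pc d_other d_unk;
  output out=cells sum=sum_my sum_pc sum_other sum_unk;
run;

/* divide one sum by the other: deaths per 100 man-years */
data rates;
  set cells;
  rate_pc = 100 * sum_pc / sum_my;
run;

proc print data=rates;
run;
```

The names after SUM= are matched to the VAR list in order, so `sum_my` holds the total of `my`, `sum_pc` the total of `d_pc`, and so on.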
3. On a single graph, plot the 5 Kaplan-Meier survival
curves, one for each of the 5 Gleason score categories (use PROC LIFETEST; online help
is under the SAS/STAT module, or see http://www.ats.ucla.edu/stat/sas/seminars/sas_survival/default.htm).
For Stata, see http://www.ats.ucla.edu/stat/stata/seminars/stata_survival/default.htm.
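A PROC LIFETEST call for task 3 might look like the sketch below. The data set `pc` and the variables `dxdate`, `lastdate`, `status` and `gleason_cat` are assumed names, and the event is taken to be death from any cause (status = 1), which is an assumption about what the curves should show.

```sas
/* Sketch only: PC, DXDATE, LASTDATE, STATUS and GLEASON_CAT are assumed names */
data surv;
  set pc;
  years = (lastdate - dxdate) / 365.25;   /* follow-up time since diagnosis */
run;

proc lifetest data=surv method=km plots=(s);
  time years*status(0);     /* status = 0 marks a censored (still alive) man */
  strata gleason_cat;       /* one Kaplan-Meier curve per Gleason category */
run;
```

With a STRATA statement the curves for all strata are drawn on the same plot, which is what the task asks for.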
4. [OPTIONAL] In order to compare the death rates
with those of U.S. men of the same age, for each combination of calendar-year period
(1970-1974, 1975-1979, ..., 1995-1999) and 5-year age interval (55-59, 60-64, ...),
obtain
a. the number of man-years of follow-up and
b. the number of deaths.
Do so by creating, from the record for each man,
as many separate observations as the number of 5-yr x 5-yr "squares" that
the man traverses diagonally through the Lexis diagram [ use the OUTPUT statement
within the DATA step ]. Then use PROC MEANS to aggregate the MY and deaths in each
square. If you get stuck, here is some SAS code that does this, or see the algorithm
given in Breslow and Day, Volume II, page ___
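
The splitting step described above can be sketched as follows. This is an independent illustration, not the SAS code referred to above and not necessarily the Breslow and Day algorithm. The data set `pc` and the variables `id`, `dob`, `dxdate`, `lastdate` and `status` are assumed names, and the dates are assumed to be SAS date values. The age band and calendar period are carried along as running values, and advanced only when a boundary is crossed, so that rounding error cannot stall the loop.

```sas
/* Sketch only: PC and its variable names are assumptions */
data lexis;
  set pc;
  array d{3}  dob dxdate lastdate;
  array yr{3} t_birth t_entry t_exit;
  /* convert each date to a decimal calendar year */
  do i = 1 to 3;
    y0 = mdy(1, 1, year(d{i}));
    y1 = mdy(1, 1, year(d{i}) + 1);
    yr{i} = year(d{i}) + (d{i} - y0) / (y1 - y0);
  end;

  period5 = 5 * floor(t_entry / 5);               /* 1975 stands for 1975-1979 */
  age5    = 5 * floor((t_entry - t_birth) / 5);   /* 60 stands for ages 60-64 */
  t = t_entry;
  do while (t < t_exit);
    next_p = period5 + 5;              /* calendar time at which the period ends */
    next_a = t_birth + age5 + 5;       /* calendar time at which the age band ends */
    t_next = min(next_p, next_a, t_exit);
    my     = t_next - t;               /* man-years spent in this square */
    death  = (status = 1 and t_next = t_exit);   /* a death falls in the last square */
    output;                            /* one observation per square traversed */
    if t_next = next_p then period5 = period5 + 5;
    if t_next = next_a then age5    = age5 + 5;
    t = t_next;
  end;
  keep id period5 age5 my death;
run;

proc sort data=lexis;
  by period5 age5;
run;

/* aggregate man-years and deaths within each square */
proc means data=lexis noprint;
  by period5 age5;
  var my death;
  output out=squares sum=my deaths;
run;
```

A man whose date of last contact equals his date of diagnosis contributes no observations in this sketch, so a death with zero follow-up would need separate handling.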
