McGill University, Department of Epidemiology,
Biostatistics and Occupational Health
EPIB 613: Introduction to Statistical Software (Fall 2006 / Winter 2008)
Assignment 3, due April __, 2008
Working in teams of two...

1a

Reproduce (or, if you think you can, improve
on) three of the graphs shown in "Examples of graphs from Medical Journals."
These examples are in a PDF file on the main page. Use Excel for at least one of them,
and R/Stata/SAS for at least one other. Do not go to extraordinary lengths
to make them exactly like those shown; the authors, or the journals themselves,
may have used more specialized graphics software.
You may wish to annotate them
by making (and sharing with us) notes on those steps/options that were not immediately obvious and that took
you some effort to figure out. Insert all three into a single electronic document.



1b

Browse some medical and epidemiologic journals, and
some magazines and newspapers, published in the last 12 months. Identify the statistical graph you
think is the worst, and the one you think is the best. Tell us how many graphs you
looked at, and why you chose the two you did. If you find a helpful online guide
or textbook on how to make good statistical graphs, please share the reference with
us. [The bios601 site
http://www.epi.mcgill.ca/hanley/bios601/DescriptiveStatistics/ has a link to the
textbook by Cleveland and the book "R Graphics" by Paul Murrell.]
If possible, electronically paste the graphs into the same electronic file
you are using for 1a. 


2

[OPTIONAL]
The main page has a link to a lifetable workbook containing
three sheets. Note that the 'lifetable' sheet in this workbook is used to calculate
an abridged current life table based on the 1960 U.S. data. Use this sheet
as a guideline, and create a current lifetable ('complete', i.e., with 1-year
age intervals) for Canadian males, using the male population sizes, and numbers
of deaths, by age, Canada 2001. [The calculations in columns O to W of the lifetable
sheet are not relevant for this exercise.] Details on the elements and construction
of current lifetables can be found in the chapters (on the website) from the textbooks
by Bradford Hill and Selvin, and in the technical notes provided by the US National
Center for Health Statistics in connection with US Lifetable 2000. See also the FAQ
for 613 from 2005. The fact that the template is for an abridged life table, with
mostly 5-year intervals, whereas the task is to construct a full lifetable with 1-year
intervals, caused some people problems last year; they realized something was
wrong when the life expectancy values were way off!
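
The column-by-column logic of a complete (1-year) current life table can be sketched in a few lines of Python. This is only a sketch to help you check your spreadsheet, not a replacement for it; it assumes the standard conversions q_x = m_x / (1 + (1 - a_x) m_x), L_x = l_{x+1} + a_x d_x, and L = l / m for the open-ended last age group. The function name, the default 'a' values and the synthetic rates in the usage part are illustrative assumptions, not Canadian data.

```python
import math


def complete_lifetable(deaths, pop, a=None, radix=100_000):
    """Complete (single-year) current life table (illustrative sketch).

    deaths[x] and pop[x] are deaths D_x and mid-year population P_x at ages
    x = 0, 1, ..., omega; the last age is treated as open-ended (omega+).
    a[x] is the assumed average fraction of the year lived by those dying at age x.
    """
    n = len(deaths)
    if a is None:
        a = [0.1] + [0.5] * (n - 1)  # assumed: 0.1 in first year, 0.5 thereafter
    rows = []
    l = float(radix)  # l_0: size of the hypothetical cohort
    for x in range(n):
        m = deaths[x] / pop[x]  # age-specific death rate m_x
        if x < n - 1:
            q = min(m / (1 + (1 - a[x]) * m), 1.0)  # probability of dying, q_x
            d = l * q                               # deaths in the interval, d_x
            L = (l - d) + a[x] * d                  # person-years: l_{x+1} + a_x d_x
        else:
            q, d = 1.0, l                           # open interval: all survivors die
            L = l / m                               # L_omega = l_omega / m_omega
        rows.append({"age": x, "m": m, "q": q, "l": l, "d": d, "L": L})
        l -= d
    T = 0.0
    for row in reversed(rows):  # T_x = L_x + L_{x+1} + ...; e_x = T_x / l_x
        T += row["L"]
        row["T"] = T
        row["e"] = T / row["l"]
    return rows


# Illustrative synthetic input only (NOT Canadian data): Gompertz-like rates
# with an infant-mortality bump, applied to a flat population of 100,000 per age.
ages = range(101)
pop = [100_000.0] * len(ages)
rates = [0.005] + [0.0001 * math.exp(0.09 * x) for x in ages[1:]]
deaths = [p * r for p, r in zip(pop, rates)]
table = complete_lifetable(deaths, pop)
print(f"e_0 = {table[0]['e']:.1f} years")
```

A quick sanity check on your own table: the d_x column should sum to the radix, and e_0 for males should come out in a plausible range; if it does not, look first at how the interval width enters the q_x formula.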
Since this is an exercise, and not a calculation for an insurance company that wants
accuracy to 4 decimal places, don't fuss too much about what values of 'a' (the average
fraction of the interval lived by those who die in it) you use for the early years; they
don't influence the calculations THAT much. If you try different sets of values (such
as 0.1 in the first year and 0.5 thereafter), you will not find a big impact. But don't
take my word for it: the beauty of a spreadsheet is that you can quickly see the
consequences of different assumptions or 'what ifs'.
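
The same 'what if' can be run outside the spreadsheet. The sketch below computes e_0 with radix 1 under two sets of 'a' values and prints both. The function name and the synthetic rates are illustrative assumptions (NOT real data); with your own Canadian 2001 numbers the size of the difference may of course be different.

```python
import math


def life_expectancy_at_birth(deaths, pop, a):
    """e_0 from a complete (single-year) current life table with radix 1.

    a[x] is the assumed average fraction of year x lived by those dying in it;
    the last age is treated as open-ended.
    """
    l, T0 = 1.0, 0.0
    last = len(deaths) - 1
    for x in range(len(deaths)):
        m = deaths[x] / pop[x]
        if x == last:
            T0 += l / m                      # open interval: L = l / m
            break
        q = min(m / (1 + (1 - a[x]) * m), 1.0)
        d = l * q
        T0 += (l - d) + a[x] * d             # L_x = l_{x+1} + a_x d_x
        l -= d
    return T0                                # with radix 1, e_0 = T_0


# Illustrative synthetic input only (NOT real data).
ages = range(101)
pop = [100_000.0] * len(ages)
rates = [0.005] + [0.0001 * math.exp(0.09 * x) for x in ages[1:]]
deaths = [p * r for p, r in zip(pop, rates)]

n = len(deaths)
scenarios = {
    "a = 0.1 in first year, 0.5 thereafter": [0.1] + [0.5] * (n - 1),
    "a = 0.5 throughout": [0.5] * n,
}
results = {name: life_expectancy_at_birth(deaths, pop, a) for name, a in scenarios.items()}
for name, e0 in results.items():
    print(f"{name}: e_0 = {e0:.3f}")
```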
[In practice, in order not to be unduly influenced by mortality rates in a single
calendar year (e.g. one that had a very bad influenza season), current lifetables
are usually based on several years of mortality data. Otherwise, or if they are based
on a small population, the quantities derived from them will exhibit considerable
random fluctuations from year to year.]
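
One common way of basing the rates on several years is to pool the deaths and the populations over the years before dividing, so that m_x = (sum of D_x over the years) / (sum of P_x over the years). A minimal sketch, assuming one list of counts per calendar year; the function name and the numbers are hypothetical and only for illustration.

```python
def pooled_rates(deaths_by_year, pop_by_year):
    """Age-specific rates pooled over several calendar years:
    m_x = (sum over years of D_x) / (sum over years of P_x)."""
    n_ages = len(deaths_by_year[0])
    return [
        sum(year[x] for year in deaths_by_year) / sum(year[x] for year in pop_by_year)
        for x in range(n_ages)
    ]


# Hypothetical numbers for two ages over three years, for illustration only.
deaths_by_year = [[10, 5], [14, 7], [12, 6]]
pop_by_year = [[1000, 1000], [1000, 1000], [1000, 1000]]
print(pooled_rates(deaths_by_year, pop_by_year))
```

Pooling counts first, rather than averaging the annual rates, weights each year by its population size.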
Once you have completed the table, use the charting facilities in Excel to plot the
survival curve for the hypothetical (fictitious) male 'cohort' represented by the
current lifetable.
On a separate graph, use two histograms to show the distributions of the ages at
death (i) for this hypothetical male 'cohort' and (ii) for those males who died in 2001.
To make it easy to compare them, superimpose the histograms or put them 'side by
side' or 'back to back' within the same graph. Explain why the two differ in shape
and location. Calculate/derive (and include them somewhere on the spreadsheet) the
median and mean age at death in the hypothetical cohort and the corresponding statistics
for the actual deaths in 2001.
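
For the summary statistics, one approach is to treat the deaths at age x as spread evenly over the year [x, x+1): use the midpoint x + 0.5 for the mean and interpolate linearly within the year for the median. The sketch below assumes that; it works both for the d_x column of the hypothetical cohort and for the actual 2001 death counts. The function name is hypothetical, and the midpoint of an open-ended last age group is only a crude approximation.

```python
def mean_and_median_age(counts):
    """counts[x] = deaths at age x (interval [x, x+1)).

    Assumes deaths are spread evenly within each year of age: the midpoint
    x + 0.5 is used for the mean, linear interpolation for the median.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one death")
    mean = sum((x + 0.5) * c for x, c in enumerate(counts)) / total
    half, cum = total / 2, 0.0
    for x, c in enumerate(counts):
        if c > 0 and cum + c >= half:
            return mean, x + (half - cum) / c  # interpolate within year x
        cum += c


# Hypothetical counts by single year of age, for illustration only.
example = [0, 10, 10, 0]
print(mean_and_median_age(example))  # (2.0, 2.0)
```

For the hypothetical cohort, the mean age at death is by definition e_0, so the e_0 in your table is a useful check on the midpoint-based mean.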



(updated March 26, 2008)
