Course 678: Analysis of Multivariable Data.  June 1999

Homework to be handed in by Wednesday June 16
*****************************************************************

Question 1
==========

Refer to the example on pages 364-5 of KKMN, showing unadjusted and adjusted
mean scores in two groups (Z=0,1) that differed on a factor X.

Carry out the same adjusted-mean calculations using the fruitfly longevity
data: first test formally that the assumption of parallelism is borne out by
the data; then follow KKMN and "adjust" the mean longevity in each group to
the mean thorax of the 50 flies in the 2 combined samples.

Verify that the numerical difference in the 2 adjusted means is the same as
that produced by the formula given in the outline for session 6, and that it
doesn't matter to what (common) thorax size one adjusts the two means. Also,
try to show this in general (i.e. in symbols) by subtracting the 2 adjusted
means given by the formula in equation 15.5 on page 363.

Compare the adjusted difference obtained above with the fitted coefficient
for TYPE in the regression model with the two terms TYPE and THORAX. In light
of your answer, interpret to your in-laws the coefficient for TYPE in this
model. (One way to set up these calculations in software is sketched after
Question 4 below.)

Question 2
==========

Refer to KKMN Ch 15, Question 6 (p 376-377).

With W on the vertical axis and X on the horizontal axis, draw the 4 parallel
lines implied by the fitted equation with 5 terms.

Calculate the mean X in each of groups A and C, and calculate the adjusted W
means, adjusted to the mean X of the two groups. Would their difference be
different if you adjusted them to the mean X of all 4 groups? Why or why not?

Verify that their difference is the same as the difference you get by
subtracting the beta_hats for the 2 corresponding Z's. Why is this?

The standard error of the difference in the adjusted means is the square root
of { square of 0.0877 + square of 0.0954 - 2 Covar[-0.135, -0.391] }, where
Covar[-0.135, -0.391] refers to the covariance of the two beta_hats in
question. The reason for the non-zero covariance term is that the two
beta_hats are derived using group D as a common (shared) reference group, and
also that they share a common adjustment factor of 0.1728, which contains
estimation error.

One can get this covariance by asking for the "Estimated Cov Matrix"
when/after fitting the model (see the footnote below if interested). Instead
of doing this, think of a more direct way ... by re-defining the 3 dummy
("indicator") variables (say B, C, and D) so that one of them directly
estimates the A-C difference of interest, and so the standard error of this
(adjusted) difference is shown directly in the "parameter estimates" output.

Question 3
==========

KKMN Ch 12, Question 27 (p 269-377). You should not have to go back to the
raw data.

Question 4 [if you have time]
==========

Recall the Lidkoping injury prevention study. Re-do the analysis using both
girls and boys (36 observations in all) in the same dataset. How come the SE
of the estimate of interest ("difference of slopes") becomes bigger, rather
than smaller, when using more data? It might help to identify the genders in
a plot of the data.

Attempt to overcome this by adding something to the model to remove the extra
noise you have introduced by merging the two genders. Make a (computer or
schematic) graph of your refined model.
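---- sketch for Question 1 (optional)

If you would like to see one way the Question 1 calculations might be set up
in software, here is a minimal sketch in Python using the statsmodels
package. The file name "fruitfly.csv" and the column names "longevity",
"thorax" and "type" (coded 0/1) are placeholders chosen for illustration, not
part of the assignment; adapt them to your own copy of the data.

    # Sketch only: assumes fruitfly.csv with columns longevity, thorax, type (0/1)
    import pandas as pd
    import statsmodels.formula.api as smf

    flies = pd.read_csv("fruitfly.csv")          # hypothetical file name

    # 1) Formal test of parallelism: does the thorax slope differ by group?
    full = smf.ols("longevity ~ thorax * C(type)", data=flies).fit()
    print(full.summary())                        # examine the thorax:C(type) term

    # 2) Parallel-lines model with the two terms TYPE and THORAX
    par = smf.ols("longevity ~ thorax + C(type)", data=flies).fit()

    # 3) Adjust each group's mean longevity to the mean thorax of all 50 flies
    xbar = flies["thorax"].mean()
    adj = par.predict(pd.DataFrame({"thorax": [xbar, xbar], "type": [0, 1]}))
    print("adjusted means:", adj.values)
    print("adjusted difference:", adj[1] - adj[0])

    # Because the lines are parallel, this difference equals the fitted
    # coefficient for TYPE, whatever common thorax value is used.
    print("coefficient for TYPE:", par.params["C(type)[T.1]"])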
---- footnote, re Question 2 above

Estimated Cov Matrix

              INTERCEPT        X       Z1       Z2       Z3
  INTERCEPT      0.0923  -0.0004  -0.0165   0.0054  -0.0102
  X             -0.0004   0.0000   0.0001  -0.0000   0.0000
  Z1            -0.0165   0.0001   0.0091   0.0023   0.0046
  Z2             0.0054  -0.0000   0.0023   0.0081   0.0029
  Z3            -0.0102   0.0000   0.0046   0.0029   0.0077

  0.0091 = var(Beta_hat for Z1) = square of its SE, i.e. of 0.0954
  0.0077 = var(Beta_hat for Z3) = square of its SE, i.e. of 0.0877
  0.0046 = covar[Beta_hat for Z1, Beta_hat for Z3]

so SE[Beta_hat for Z3 - Beta_hat for Z1]
   = sqrt[ 0.0077 + 0.0091 - 2(0.0046) ] = sqrt[0.0076] = 0.087

This formula is a general one for the SE of the difference of two correlated
estimates.
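If you want to check this arithmetic, or avoid reading the covariance off the
printed matrix by hand, here is a short sketch (again in Python with
statsmodels; the dummy-variable names Z1-Z3 and the object name "fit" are
hypothetical):

    import math

    var_z1, var_z3, cov_z1_z3 = 0.0091, 0.0077, 0.0046   # entries from the matrix above
    print(round(math.sqrt(var_z3 + var_z1 - 2 * cov_z1_z3), 3))   # 0.087, as above

    # With a fitted statsmodels result `fit` whose dummies are named Z1..Z3,
    # the same SE (and the corresponding test) can be obtained in one step:
    #   fit.cov_params()            # the full estimated covariance matrix
    #   fit.t_test("Z3 - Z1 = 0")   # estimate, SE, and test of the adjusted difference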