Course 678: Analysis of Multivariable Data.  June 1999

Homework to be handed in by Wednesday June 16
*****************************************************************

Question 1
==========

Refer to the example on pages 364-5 of KKMN, showing unadjusted and adjusted
mean scores in two groups (Z=0,1) that differed on a factor X.

Carry out the same adjusted-mean calculations using the fruitfly longevity
data: first test formally that the assumption of parallelism is borne out by
the data; then follow KKMN and "adjust" the mean longevity in each group to
the mean thorax of the 50 flies in the 2 combined samples.

Verify that the numerical difference in the 2 adjusted means is the same as
that produced by the formula given in the outline for session 6, and that it
doesn't matter to what (common) thorax size one adjusts the two means. Also,
try to show this in general (i.e. in symbols) by subtracting the 2 adjusted
means given by the formula in equation 15.5 on page 363.

Compare the adjusted difference obtained above with the fitted coefficient
for TYPE in the regression model with the two terms TYPE and THORAX. In light
of your answer, interpret to your in-laws the coefficient for TYPE in this
model. (One way to set up these calculations in software is sketched after
Question 4 below.)

Question 2
==========

Refer to KKMN Ch 15, Question 6 (p 376-377).

With W on the vertical axis and X on the horizontal axis, draw the 4 parallel
lines implied by the fitted equation with 5 terms.

Calculate the mean X in each of groups A and C, and calculate the adjusted W
means, adjusted to the mean X of the two groups. Would their difference be
different if you adjusted them to the mean X of all 4 groups? Why or why not?

Verify that their difference is the same as the difference you get by
subtracting the beta_hats for the 2 corresponding Z's. Why is this?

The standard error of the difference in the adjusted means is the square root
of { square of 0.0877 + square of 0.0954 - 2 Covar[-0.135, -0.391] }, where
Covar[-0.135, -0.391] refers to the covariance of the two beta_hats in
question. The reason for the non-zero covariance term is that the two
beta_hats are derived using group D as a common (shared) reference group, and
also that they share a common adjustment factor of 0.1728, which contains
estimation error.

One can get this covariance by asking for the "Estimated Cov Matrix"
when/after fitting the model (see the footnote below if interested). Instead
of doing this, think of a more direct way ... by re-defining the 3 dummy
("indicator") variables (say B, C, and D) so that one of them directly
estimates the A-C difference of interest, and so the standard error of this
(adjusted) difference is shown directly in the "parameter estimates" output.

Question 3
==========

KKMN Ch 12, Question 27 (p 269-377). You should not have to go back to the
raw data.

Question 4 [if you have time]
==========

Recall the Lidkoping injury prevention study. Re-do the analysis using both
girls and boys (36 observations in all) in the same dataset. How come the SE
of the estimate of interest ("difference of slopes") becomes bigger, rather
than smaller, when using more data? It might help to identify the genders in
a plot of the data.

Attempt to overcome this by adding something to the model to remove the extra
noise you have introduced by merging the two genders. Make a (computer or
schematic) graph of your refined model.
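---- sketch for Question 1 (optional)

If you would like to see one way the Question 1 calculations might be set up
in software, here is a minimal sketch in Python using the statsmodels
package. The file name "fruitfly.csv" and the column names "longevity",
"thorax" and "type" (coded 0/1) are placeholders chosen for illustration, not
part of the assignment; adapt them to your own copy of the data.

    # Sketch only: assumes fruitfly.csv with columns longevity, thorax, type (0/1)
    import pandas as pd
    import statsmodels.formula.api as smf

    flies = pd.read_csv("fruitfly.csv")          # hypothetical file name

    # 1) Formal test of parallelism: does the thorax slope differ by group?
    full = smf.ols("longevity ~ thorax * C(type)", data=flies).fit()
    print(full.summary())                        # examine the thorax:C(type) term

    # 2) Parallel-lines model with the two terms TYPE and THORAX
    par = smf.ols("longevity ~ thorax + C(type)", data=flies).fit()

    # 3) Adjust each group's mean longevity to the mean thorax of all 50 flies
    xbar = flies["thorax"].mean()
    adj = par.predict(pd.DataFrame({"thorax": [xbar, xbar], "type": [0, 1]}))
    print("adjusted means:", adj.values)
    print("adjusted difference:", adj[1] - adj[0])

    # Because the lines are parallel, this difference equals the fitted
    # coefficient for TYPE, whatever common thorax value is used.
    print("coefficient for TYPE:", par.params["C(type)[T.1]"])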
---- footnote, re Question 2 above

Estimated Cov Matrix

              INTERCEPT        X       Z1       Z2       Z3
  INTERCEPT      0.0923  -0.0004  -0.0165   0.0054  -0.0102
  X             -0.0004   0.0000   0.0001  -0.0000   0.0000
  Z1            -0.0165   0.0001   0.0091   0.0023   0.0046
  Z2             0.0054  -0.0000   0.0023   0.0081   0.0029
  Z3            -0.0102   0.0000   0.0046   0.0029   0.0077

  0.0091 = var(Beta_hat for Z1) = square of its SE, i.e. of 0.0954
  0.0077 = var(Beta_hat for Z3) = square of its SE, i.e. of 0.0877
  0.0046 = covar[Beta_hat for Z1, Beta_hat for Z3]

so SE[Beta_hat for Z3 - Beta_hat for Z1]
   = sqrt[ 0.0077 + 0.0091 - 2(0.0046) ] = sqrt[0.0076] = 0.087

This formula is a general one for the SE of the difference of two correlated
estimates.
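If you want to check this arithmetic, or avoid reading the covariance off the
printed matrix by hand, here is a short sketch (again in Python with
statsmodels; the dummy-variable names Z1-Z3 and the object name "fit" are
hypothetical):

    import math

    var_z1, var_z3, cov_z1_z3 = 0.0091, 0.0077, 0.0046   # entries from the matrix above
    print(round(math.sqrt(var_z3 + var_z1 - 2 * cov_z1_z3), 3))   # 0.087, as above

    # With a fitted statsmodels result `fit` whose dummies are named Z1..Z3,
    # the same SE (and the corresponding test) can be obtained in one step:
    #   fit.cov_params()            # the full estimated covariance matrix
    #   fit.t_test("Z3 - Z1 = 0")   # estimate, SE, and test of the adjusted difference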