Course 678: Analysis of Multivariable Data. June 1999

Homework to be handed in by Friday June 4

1. Use software (or brainware) to do KKM Ch 6, Questions 2 and 3, page 101.

2. (From Neter et al, page 88) For each of the following questions, explain whether a confidence interval for a mean response or a prediction interval for a new observation is appropriate.
   a. What will be the humidity level in this greenhouse tomorrow when we set the temperature level at 31°C?
   b. How much do families whose disposable income is $23,500 spend, on average, for meals away from home?
   c. How many kilowatt-hours of electricity will be consumed next month by commercial and industrial users in Montreal, given that the index of business activity for the area remains at its present level?

3. Blood Alcohol and Eye Movements (see under datasets on web page). Questions are at the end of the documentation file.

4. Analysis of Rates of Fatal Crashes on rural interstate highways in New Mexico in the 5 years 1982-1986 (55 mph limit) and in 1987 (65 mph limit). Data from JAMA by Gallaher et al. Oct. 27, 1989;262:2243-2245.

DATA:

                 ------------- 55 mph ------------- || 65 mph
   YEAR          1982   1983   1984   1985   1986   ||  1987
   Rate*          2.8    2.0    2.1    1.7    1.9   ||   2.9

   *Rate of fatal crashes per 100 million vehicle miles; rate named "R_ALL" below.

SUMMARIES:

                 IF65MPH = 0   IF65MPH = 1
   N OF CASES          5             1
   MEAN              2.100         2.900
   VARIANCE          0.175         0.000

The aim is to compare 1987 with the most relevant period; the average of 1982-1986 is probably too high a benchmark, since rates seem to have been falling over that time. One should also remove the systematic variation over the 5 years which, in the variance used in the t-test or 1-way ANOVA, appears as "unexplained noise". In other words, the idea is to make the comparison both FAIRER and SHARPER.
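For contrast, here is a minimal Python sketch of the unadjusted comparison the paragraph above argues against: the 1987 rate set against the plain 1982-1986 mean, with the downward trend left in the "noise". With groups of 5 and 1, the pooled two-sample t-test (equivalently the 1-way ANOVA) reduces to the single-new-observation form used below, because the lone 1987 value contributes no variance and no degrees of freedom. Variable names are illustrative only.

```python
import math
import statistics

# Rates of fatal crashes per 100 million vehicle miles, from the DATA table
rates_55 = [2.8, 2.0, 2.1, 1.7, 1.9]   # 1982-1986 (55 mph)
rate_1987 = 2.9                        # 1987 (65 mph)

n = len(rates_55)
mean_55 = statistics.mean(rates_55)      # matches MEAN 2.100 in SUMMARIES
var_55 = statistics.variance(rates_55)   # sample variance (n - 1 divisor), matches 0.175
s = math.sqrt(var_55)

# Unadjusted comparison: treat 1987 as one new value from the 1982-1986
# distribution, ignoring the trend. SE of (new value - sample mean):
se_new = s * math.sqrt(1 + 1 / n)
t_naive = (rate_1987 - mean_55) / se_new   # refer to t with n - 1 = 4 df

print(f"mean = {mean_55:.3f}, variance = {var_55:.3f}")
print(f"naive t = {t_naive:.2f} on {n - 1} df")
```

The naive t of about 1.75 on 4 df is below the 97.5th percentile of t with 4 df (2.776), so this comparison is not significant at the 5% level: the benchmark of 2.1 is too high (less fair), and the trend inflates s (less sharp).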
The authors fitted a regression line to the 5 years, calculated the "expected" value for 1987 and the expected range of variation around this fitted mean, and determined where, relative to this predicted range of possible values, the observed 1987 value lies.

Output from SYSTAT:

   DEP VAR: R_ALL   N: 5   MULTIPLE R: 0.794   MULTIPLE R2: 0.630   Root MSE: 0.294

   VARIABLE      COEFF.   STD ERROR       T   P(2 TAIL)
   CONSTANT     418.740     184.345   2.272       0.108
   YEAR          -0.210       0.093  -2.260       0.109

   SOURCE       SUM-OF-SQUARES   DF   MEAN-SQUARE   F-RATIO       P
   REGRESSION            0.441    1         0.441     5.108   0.109
   RESIDUAL              0.259    3         0.086

Your task: use the above output and formula 5.15, p 59 (or, if you prefer, your own software @) to fill in the blanks (using * for multiplication):

   "Fitted" rate for 1987 = ____ + ____ * ____ = 1.47 (slightly different from the authors' value because of rounding)

   "Range of variation (95%) of a new value about the projected value" of 1.47:

      1.47 +/- t[___ df, .95] * ____ * sqrt[ 1 + 1/___ + [1987 - 1984]^2 / Sum{[year - 1984]^2} ] = 0.11 to 2.83

The observed point of 2.9 is just outside the 95% range of random variation about the mean predicted for 1987. In fact, using the SD of 0.426 (obtained by multiplying the Root MSE of 0.294 by the radical, 1.45), the 2.9 is t = (2.9 - 1.47)/0.426 = 3.36 SDs above expected; since the estimated SD is based on only 3 df, this deviate lies between the 97.5th and the 99th percentile. It is not clear whether the p-value in the article is 1- or 2-sided, or indeed whether the authors calculated it in the same way as here.

@ For example, in SAS INSIGHT, you can type in the 6 data points. Create a third column consisting of five 1's and one 0, and designate it as a "weight" variable (using the popup menu you get by clicking on the left cell above the column header). This means that the 1987 data point gets no weight in (is excluded from) the fitting. After you fit the line, use the prediction curves to see where the 1987 data point falls relative to the predictions.
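The same calculation can be checked in software. Below is a minimal Python sketch, assuming the standard simple-linear-regression formulas (we take these to agree with formula 5.15, p 59, but have not checked the textbook). It mirrors the SAS INSIGHT trick by giving 1987 a weight of 0, so only the 1982-1986 points enter the fit. It computes both the SE for the mean response and the SE for a new observation. The t quantile is a hard-coded table value (an assumption, to avoid needing a statistics library).

```python
import math

# All 6 points; weight 0 drops 1987 from the fit (the "weight" trick)
years = [1982, 1983, 1984, 1985, 1986, 1987]
rates = [2.8, 2.0, 2.1, 1.7, 1.9, 2.9]
weights = [1, 1, 1, 1, 1, 0]

# Weighted least-squares straight line (weights here are only 0 or 1)
sw = sum(weights)
xbar = sum(w * x for w, x in zip(weights, years)) / sw
ybar = sum(w * y for w, y in zip(weights, rates)) / sw
sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, years))
sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, years, rates))
slope = sxy / sxx
intercept = ybar - slope * xbar

# Residual mean square on n - 2 df; n = number of fitted points (valid for 0/1 weights)
n = int(sw)
sse = sum(w * (y - (intercept + slope * x)) ** 2
          for w, x, y in zip(weights, years, rates))
root_mse = math.sqrt(sse / (n - 2))

x0 = 1987
fit = intercept + slope * x0
T_3DF_975 = 3.1824  # 97.5th percentile of t with 3 df (two-sided 95%), from a t table

se_mean = root_mse * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)      # mean response
se_new = root_mse * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)   # new observation

pi_low = fit - T_3DF_975 * se_new
pi_high = fit + T_3DF_975 * se_new
t_obs = (rates[-1] - fit) / se_new

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, Root MSE = {root_mse:.3f}")
print(f"fitted 1987 rate = {fit:.2f}")
print(f"SE (mean response) = {se_mean:.3f}, SE (new value) = {se_new:.3f}")
print(f"95% range for a new value: {pi_low:.2f} to {pi_high:.2f}")
print(f"t for observed 2.9 = {t_obs:.2f}")
```

Working from unrounded quantities, the sketch gives a fitted 1987 rate of 1.47, an SE for a new value of about 0.426, and t of about 3.36 for the observed 2.9. The two SEs differ only by the leading "1 +" inside the square root. That extra term is the distinction question 2 asks about: the mean response uses the smaller SE, while a single new year uses the larger one.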
OPTIONAL
========

5. Difference in Bone Density over 2 centuries (Word file, and 2 figures (.gif) under Chapter 5)
   a. How do you think they calculated "The precision of measurement" of 1.2% (femoral neck) [3rd paragraph of Subjects and Methods]? (We haven't covered this concept in the course, but make your best educated guess.)
   b. Put the "slope=0.197" in Table III into plain words.
   c. "in the ancient femora, there was no significant loss of bone density premenopausally in either region" [1st half of 2nd sentence of Results]
      (i) Draw in the regression line for ancient femora, femoral neck, premenopausal.
      (ii) Is there enough information in the article to directly (i.e. without going back to the raw data points!) verify "no (statistically) significant loss"? Why/why not?
   d. "in striking contrast to modern women" refers to the slope of 0.197 vs. that of -0.658 (neck) and -0.162 vs. -0.921 (triangle). If you had the regression printouts for the ancient and modern-day groups analyzed separately, how would you use them to verify that this "striking difference" was indeed statistically significant? (Hint: SE(difference of two independent estimates) = sqrt[ square of SE of one estimate + square of SE of the other estimate ].)
   e. The answer from the test involving r=0.424 was "*p<0.005"; upon what null hypothesis is the p value calculated?
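The hint in question 5d can be turned into a small helper. This is a minimal Python sketch, not the article's method. The function names are hypothetical. The p-value uses a normal (z) approximation, which is an assumption: with small groups a t reference distribution would be more appropriate. No standard errors are supplied here, because they would have to be read from the two separate regression printouts.

```python
import math


def slope_difference_z(b1: float, se1: float, b2: float, se2: float) -> float:
    """z statistic for the difference between two independent slope estimates.

    SE(b1 - b2) = sqrt(se1**2 + se2**2); this holds only when the two slopes
    come from separate, independent samples (e.g. ancient vs modern femora).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (b1 - b2) / se_diff


def two_sided_p_normal(z: float) -> float:
    """Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For the femoral neck, one would call slope_difference_z(0.197, se_ancient, -0.658, se_modern), with se_ancient and se_modern taken from the two printouts, and pass the result to two_sided_p_normal.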