Session 4: Outline

KKMN Chapter 8 (Multiple Regression Analysis ... General)
==========================================================

Review

- If interested in independent contributions of each of several
  variables ... are there situations where one can assess them
  one at a time, i.e.

      assess a particular X while ignoring the others ...
      assess a different X while ignoring the others ... ?

  or does one ALWAYS have to assess them simultaneously?

- some diagrams may help ("confounding in pictures and numbers")

- Walk-through Chapter 8 (if popular demand!)

Multiple regression Equation
----------------------------

- geometrically ... as a "plane" if X1 and X2
- as a sequence of simple linear regressions
- (in case of 2 X's) as contour map

Models as "approximations"
--------------------------

- model misspecification as a source of "errors"
  (e.g. estimating the area of a rectangle ... cf. notes ch 8)

Models for interpolation / smoothing
------------------------------------

- "borrowing strength" (e.g.
  outcome of prostate cancer)

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

KKMN Chapter 9 (Testing Hypotheses in Multiple Regression)
==========================================================

- Idea of Larger ("Full") and Smaller ("Reduced") model

- note my use of # of terms rather than # of variables
  [different if adding higher powers of a continuous variable,
  or if a variable is categorical and represented by
  indicator terms]

- 3 Situations (Sections 9.2, 9.3, 9.4)
  [ 1st 2 are special cases of the 3rd (general) case ]

- Only one situation requiring hand calculation

- same ideas apply in logistic/Poisson regression

- concept of variable added "IN ORDER" vs "LAST"

- Neter's diagram for Extra Sums of Squares


Number of terms (not counting intercept):

          M O D E L
text   Reduced   Full     Test Statistic     df
----   -------   ------   --------------     ----------------
9.2    0         k        F(model)           k, (n-1 - k)
                          ("Overall F-test")
9.3    p         p + 1    t or F(partial)    1, (n-1 - p - 1)
9.4    p         p + k    F(partial)         k, (n-1 - p - k)


statistic    obtained from ...
---------    ---------------------------------------------
F(model)     1st line of summary Anova table ("Overall F-test")

t            t = beta_(p+1)_hat / its SE

               Diff. in Regression SS / k ( = diff. in # terms)
F(partial) =   ------------------------------------------------
               Mean Square Error ("Residual") in Larger Model

           =   square of t if only k = 1 additional term


Examples

Berkeley data ...
prediction of height at 18 HT18 (cm) from

    weight (kg) at age 2
    height (cm) at age 2
    gender (1=girl, 0=boy)

HT18 = 61.84 + 0.04 WT2 + 1.33 HT2 - 12.00 GENDER

Mean of Response   172.72     R-Square   0.75
Root MSE             4.68     Adj R-Sq   0.74

Analysis of Variance
====================

("Overall-F" test)
                ^ SINGULAR

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

Model      3          3628.12       1209.37    55.33     0.0001
         ^^^                                   ^^^^^     ^^^^^^
          test of  All 3 Betas = 0  versus  AT LEAST ONE Beta is NOT 0

Error     54          1180.25         21.86
-------------------------
C Total   57          4808.38


Type III Tests ("Partial-F" tests)
                                ^ PLURAL

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

WT2        1             0.14          0.14     0.01     0.9363
                                               ^^^^^
          test of Beta( WT2 | HT2 GENDER ) = 0

HT2        1           726.07        726.07    33.22     0.0001
                                               ^^^^^
          test of Beta( HT2 | WT2 GENDER ) = 0

GENDER     1          1952.56       1952.56    89.34     0.0001
                                               ^^^^^
          test of Beta( GENDER | WT2 HT2 ) = 0

Think of each t-test as a test of the contribution of the TERM
in question, GIVEN THAT THE OTHER TERMS ARE ALREADY INCLUDED,
i.e. TEST OF its contribution as the LAST TERM in the model


Parameter Estimates

Variable    DF   Estimate   Std Error   T Stat   Prob >|T|

INTERCEPT    1      61.84       17.47     3.54      0.0008

WT2          1       0.04        0.50     0.08      0.9363
                                        ^^^^^^      ^^^^^^
                 F =  0.01 = square of  0.08

HT2          1       1.33        0.23     5.76      0.0001
                                        ^^^^^^      ^^^^^^
                 F = 33.22 = square of  5.76

GENDER       1     -12.00        1.27    -9.45      0.0001
                                        ^^^^^^      ^^^^^^
                 F = 89.34 = square of -9.45

*******************************************************
* Each t-test is a "TERM ENTERED LAST" test           *
*                                                     *
* The order in which you enter or "click" terms into  *
* the model doesn't matter ...                        *
*                                                     *
*******************************************************


Type I Tests  (Again, "Partial-F" tests, but now, ORDER MATTERS!! )
                                                  ^^^^^^^^^^^^^

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

WT2        1           935.54        935.54    42.80     0.0001
                                               ^^^^^
          test of Beta( WT2 ) = 0

HT2        1           740.02        740.02    33.86     0.0001
                                               ^^^^^
          test of Beta( HT2 | WT2 ) = 0

GENDER     1          1952.56       1952.56    89.34     0.0001
                                               ^^^^^
          test of Beta( GENDER | WT2 HT2 ) = 0

Think of each F-test as a test of the contribution of the TERM,
GIVEN THAT THE TERMS BEFORE IT IN THE LIST ARE ALREADY INCLUDED
                     ^^^^^^^^^^^^^^^^^^^^^

*******************************************************
*                                                     *
* If one cannot remember Type I from Type III ( who   *
* should have to?) or if the software doesn't say ... *
*                                                     *
* How can one tell from a printout whether partial    *
* F-tests are "variables entered last" tests or       *
* "variables entered in THAT PARTICULAR ORDER" tests? *
*                                                     *
* ... If the Sums of Squares associated with the      *
*     individual terms add up to the "model" or       *
*                                                     *
*     "Regression" Sums of Squares, the partial-F     *
*     tests refer to variables entered in THAT ORDER  *
*                                                     *
*                                                     *
* ... Sums of Squares associated with each "VARIABLE  *
*     ADDED LAST" can add up to MORE, or to LESS,     *
*                                                     *
*     than the "Regression" Sums of Squares.          *
*                                                     *
*     can you think of when they might add up to      *
*     LESS THAN, MORE THAN, or EXACTLY the Model SS?  *
*                                                     *
*******************************************************


Multiple Partial F Test
                      ^ SINGULAR

HT18 = 11.3 - 0.60 WT9 + 1.21 HT9 - 0.07 LG9 - 0.04 ST9
            - 11.24 GENDER + 0.76 WT2 + 0.19 HT2

Summary of Fit

Mean of Response   172.72     R-Square   0.91
Root MSE             2.89     Adj R-Sq   0.90

Analysis of Variance

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

Model      7          4389.93        627.13    74.94     0.0001
Error     50           418.44          8.37
C Total   57          4808.38

Type I Tests

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

WT9        1           179.21        179.21    21.41     0.0001
HT9        1          2304.17       2304.17   275.33     0.0001
LG9        1            20.62         20.62     2.46     0.1228
ST9        1           248.66        248.66    29.71     0.0001
GENDER     1          1554.48       1554.48   185.75     0.0001
WT2        1            74.34         74.34     8.88     0.0044
HT2        1             8.44          8.44     1.01     0.3200
         ---          -------
           2            82.78

Partial F-test of additional contribution of age 2 data,
once one already has the age 9 data:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

                                          F(2,50) table
                                        0.95   0.99   0.999
      82.78 / 2     41.39               ----   ----   -----
F  =  ---------  =  -----  =  4.94      3.18   5.06   7.96
        8.37         8.37

Type I Tests

Source    DF   Sum of Squares   Mean Square   F Stat   Prob > F

WT2        1           935.54        935.54   111.79     0.0001
HT2        1           740.02        740.02    88.43     0.0001
GENDER     1          1952.56       1952.56   233.31     0.0001
WT9        1             0.08          0.08     0.01     0.9238
HT9        1           749.24        749.24    89.53     0.0001
LG9        1             1.50          1.50     0.18     0.6736
ST9        1            10.99         10.99     1.31     0.2572
         ---           ------
           4           761.81

Partial F-test of additional contribution of age 9 data,
once one already has the age 2 data:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

                                          F(4,50) table
                                        0.95   0.99   0.999
      761.81 / 4     190.45             ----   ----   -----
F  =  ----------  =  ------  =  22.75   2.56   3.72   5.46
         8.37         8.37

Double Check (running the bigger & smaller models separately):
--------------------------------------------------------------

SS(regression)         Source   DF   Sum of Squares

with all 7 terms       Model     7          4389.93
with WT2 HT2 GENDER    Model     3          3628.12   (earlier)
                               ---          -------
Difference                       4           761.81

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

- Further Steps in computing
- Getting existing ASCII data into SAS via SAS Editor
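
As a computing cross-check on the two hand calculations in the Multiple Partial F Test example, here is a minimal Python sketch (not SAS, and not from KKMN). It takes only the Sums of Squares and Mean Square Error printed in the notes and applies F(partial) = (diff. in Regression SS / k) / MSE of the larger model. The names partial_f and levels_exceeded are hypothetical helpers introduced for illustration; the critical values are the F(2,50) and F(4,50) table entries quoted in the notes.

```python
# Sketch only: redoes the hand calculations of the two multiple partial
# F-tests from the printed Sums of Squares. Function and variable names
# are illustrative assumptions, not part of KKMN or SAS.

def partial_f(extra_ss, k, mse_full):
    """F(partial) = (diff. in Regression SS / k) / MSE of the larger model."""
    return (extra_ss / k) / mse_full

MSE_FULL = 8.37      # Mean Square Error of the 7-term model (50 error df)
SS_FULL = 4389.93    # Regression SS with all 7 terms
SS_AGE2 = 3628.12    # Regression SS with WT2 HT2 GENDER only

# Age 2 terms (WT2, HT2) entered after the age 9 terms and GENDER:
# extra SS = sum of their Type I SS at the end of the list.
f_age2 = partial_f(74.34 + 8.44, k=2, mse_full=MSE_FULL)

# Age 9 terms entered after WT2 HT2 GENDER: extra SS = difference in
# Regression SS between the larger and the smaller model.
f_age9 = partial_f(SS_FULL - SS_AGE2, k=4, mse_full=MSE_FULL)

# 0.95, 0.99 and 0.999 points copied from the F tables in the notes.
CRIT_F_2_50 = {0.95: 3.18, 0.99: 5.06, 0.999: 7.96}
CRIT_F_4_50 = {0.95: 2.56, 0.99: 3.72, 0.999: 5.46}

def levels_exceeded(f, table):
    """Return the tabulated levels whose critical value f exceeds."""
    return [level for level, crit in table.items() if f > crit]

print("age 2 given age 9:", round(f_age2, 3), levels_exceeded(f_age2, CRIT_F_2_50))
print("age 9 given age 2:", round(f_age9, 3), levels_exceeded(f_age9, CRIT_F_4_50))
```

The second F agrees with the 22.75 obtained by hand; the first agrees with the hand value of 4.94 up to rounding in the last digit. Only the age 9 block exceeds all three tabulated points; the age 2 block exceeds the 0.95 point but not the 0.99 point.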