Course 678: Analysis of Multivariable Data. June 1999

Homework to be handed in by Monday June 7

*****************************************************************

1. KKMN Ch 7, Question 7 (concerning problem 6 from ch. 5).

2. KKMN Ch 8, Question 12.

3. See the documentation on the "fruitfly data"
   (NB: it is on the web page for a different course)
   www.epi.mcgill.ca/~web2/courses/c622/datasets.html

   Download the dataset, say to your "a" drive.

   Paste the following SAS program into the Program Editor

      DATA sasuser.fruitfly;
        INFILE 'a:\fruitfly.dat' ;
        INPUT ID PARTNERS TYPE LNGEVITY THORAX SLEEP ;
        IF PARTNERS = 1;
      RUN;

   and click on the "run" icon to produce and save a SAS file called
   "fruitfly". Note that the statement "IF PARTNERS = 1;" keeps only
   the records for flies with 1 partner.

   a. What is the difference in average longevity between the flies
      with the two types (0 and 1) of partner?

   b. Is it statistically significant (at the conventional 0.05
      significance level) by a t-test? In addition to doing the t-test
      directly, also perform the t-test by fitting a regression model
      LNGEVITY = TYPE. Note that the coefficient for TYPE is the
      difference in average longevity, and that the SE of this
      difference is the same as the SE you calculated for the t-test.

      Why do you think the results are not significant at the
      conventional significance level? Would you recommend a larger
      study? Grant money for this type of research is scarce nowadays,
      so can you instead think of a more refined analysis of these
      existing data?

   c. Are there other determinants of longevity?

   d. If so, are these determinants more or less equally distributed
      in the TYPE=0 and TYPE=1 flies, or is one group at a longevity
      advantage from the start? [one would expect them to be reasonably
      well "balanced" since this was a randomized trial and the sample
      sizes are 25 and 25]

   e. Do you think you should add any of the determinants to the
      model? Why / why not?

   f. (Even if you don't think you should add any of them) add THORAX
      anyway and see what happens!

   g. Interpret -- to your in-laws -- the coefficient for TYPE in the
      model with TYPE and THORAX. Why has it changed from what it was
      in subquestion b, and why in the particular direction it has?
      Again, you're talking to someone who has not had a regression
      course.

   h. Try to explain to them why the statistical significance of the
      coefficient for TYPE in this model changed so much relative to
      the statistical significance of the coefficient in the simple
      model in subquestion b. Try not to use technical words like
      significant or p-value.

      (Not that you have any spare time to read it now... but some of
      the answers are under the headings "making comparisons fairer"
      and "making comparison sharper" in the article "Appropriate Uses
      of Multivariate Analysis" by JH in Annual Review of Public
      Health, 1983 (on the web page for course 697...).)

=============== FOR LATER ===============================

#. See the documentation on the "new room data"
   (NB: it is on the web page for a different course)
   www.epi.mcgill.ca/~web2/courses/c622/datasets.html

   Download the dataset, say to your "a" drive.

   Download the SAS program into the Program Editor
   - change "DATA a;" to "DATA sasuser.newroom;"
   - in the INFILE statement, give the full path to the ASCII data file;
   then click on the "run" icon to produce and save a SAS file called
   "newroom".

   Answer the questions posed at the end of the story, paying
   attention to making the comparison as fair and as sharp as possible.

#. Use the link "new data on Old Faithful [M&M]" on the class web page
   to obtain the story "Is Old Faithful faithful?". Download the data
   to your computer. Say you download it to a floppy drive (a:\) and
   name the file "OFDATA.TXT". Use Notepad or a similar program to
   remove the first two lines.

   Paste the following SAS program into the SAS Program Editor and
   click on the "run" icon to produce and save a SAS file called
   "oldfaith".

      DATA sasuser.oldfaith;
        INFILE "a:\OFDATA.TXT" ;
        INPUT Date Time Duration Interval Preplay Height Predict ;
      RUN;

   Of the 344 records read from the infile, some will have missing
   values (where the log window notes invalid data, values will be set
   to missing).

   Answer the 4 questions at the end of the story.

#. No calculations required at this stage, just thinking!

   Read the story on the Brink's data (on the class web page).
   - How might you go about estimating the damages?
   - How might you try to use the data from area 1A?
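
Note on question 3 (subquestions b and f): the following is a minimal sketch, assuming the SAS file sasuser.fruitfly was created by the DATA step given in question 3, of one way the t-test and the two regression models might be run. PROC TTEST and PROC REG are standard SAS procedures, but the choice of procedures and the layout here are only an illustration, not a prescribed method.

```sas
/* Sketch only: assumes sasuser.fruitfly already exists and holds */
/* just the flies with PARTNERS = 1.                               */

/* Subquestion b: two-sample t-test of LNGEVITY by TYPE (0 vs 1) */
PROC TTEST DATA=sasuser.fruitfly;
  CLASS TYPE;
  VAR LNGEVITY;
RUN;

/* Subquestion b: the same comparison as a regression on TYPE */
PROC REG DATA=sasuser.fruitfly;
  MODEL LNGEVITY = TYPE;
RUN;
QUIT;

/* Subquestion f: add THORAX to the model */
PROC REG DATA=sasuser.fruitfly;
  MODEL LNGEVITY = TYPE THORAX;
RUN;
QUIT;
```

In the PROC TTEST output, the equal-variance (pooled) row is the one that corresponds to the regression of LNGEVITY on TYPE alone.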