EPIDEMIOLOGY & BIOSTATISTICS
Course 622 (Applications of Statistics)
REVIEW QUESTIONS

1. The data on the admission of applicants to the graduate faculties at Berkeley showed that, overall, female applicants had a lower admission rate than males. However, when the admission rates of males and females were compared faculty by faculty, no such discrimination was evident, i.e., the two sexes had comparable rates. Can both of these observations hold true, and if so, how? What do we call this phenomenon in epidemiology?

2. How much correlation (--, -, 0, +, ++) would there be between the two variables X1 and X2 in each of the following?
   a. (X1, X2) = (mean temperature in June, mean temperature in December), for each of the 90 years 1896-1985 in Montreal.
   b. (X1, X2) = (mean temperature on June 15, mean temperature on June 16), for each of the 90 years 1896-1985 in Montreal.
   c. (X1, X2) = (mean temperature on June 15 in Montreal, mean temperature on June 15 in Vancouver), for each of the 90 years 1896-1985.
   d. (X1, X2) = (blood pressure of husband, blood pressure of wife), for each of 100 couples.
   e. (X1, X2) = (blood pressure of father, blood pressure of son), for each of 100 father-son pairs.
   f. (X1, X2) = (hours of sunshine, centimetres of rain), for each of the 90 summers 1896-1985 in Montreal.

3. Assume vodka contains 40% alcohol. Write a formula for the percentage alcohol concentration of a mix of X litres of vodka and Y litres of orange juice.

4. Suppose X = height, measured in inches, and Y = weight, measured in kilograms, in 4-year-olds, and that the correlation between them is r and the regression coefficient of Y on X is b. If X were converted to centimetres and Y left unchanged, what would happen to r and b? If Y were converted to pounds and X left unchanged, what would happen to r and b? If both were converted, what would happen to r and b? Is the correlation likely to be stronger, weaker, or the same if the sample were enlarged to include the 3-7 year age range?
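The following Python sketch may help in checking answers to questions 1, 3 and 4. All numbers are made up for illustration: the admission counts are hypothetical (they are not the actual Berkeley figures), `percent_alcohol` assumes the juice contains no alcohol and that volumes simply add, and the height and weight data are simulated. It shows a Simpson's-paradox pattern (equal rates within each faculty, unequal rates when pooled) and how rescaling X or Y changes b but leaves r alone.

```python
import math
import random

# --- Question 1: Simpson's paradox (HYPOTHETICAL counts, not the Berkeley data) ---
# faculty -> {sex: (number admitted, number applied)}
admissions = {
    "Faculty A (easy to enter)": {"M": (80, 100), "F": (16, 20)},
    "Faculty B (hard to enter)": {"M": (4, 20), "F": (20, 100)},
}

def rate(admitted, applied):
    return admitted / applied

def overall_rate(sex):
    """Admission rate for one sex, pooled over all faculties."""
    admitted = sum(faculty[sex][0] for faculty in admissions.values())
    applied = sum(faculty[sex][1] for faculty in admissions.values())
    return admitted / applied

# Within each faculty the rates are equal (0.8 and 0.2), yet pooled they are
# 84/120 = 0.7 for males and 36/120 = 0.3 for females, because most women
# applied to the faculty that admits few applicants.

# --- Question 3: alcohol concentration of a vodka + orange-juice mix ---
def percent_alcohol(x_vodka, y_juice, vodka_strength=0.40):
    """Percent alcohol of x litres of vodka plus y litres of juice.
    Assumes the juice contains no alcohol and that volumes simply add."""
    return 100 * vodka_strength * x_vodka / (x_vodka + y_juice)

# --- Question 4: effect of changing units on r and b ---
def corr_and_slope(x, y):
    """Pearson correlation r and least-squares slope b of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

CM_PER_INCH = 2.54
LB_PER_KG = 2.20462  # approximate conversion factor

random.seed(1)
# Simulated (hypothetical) heights in inches and weights in kg for 4-year-olds
height_in = [random.gauss(40, 2) for _ in range(200)]
weight_kg = [0.5 * h - 4 + random.gauss(0, 1) for h in height_in]
height_cm = [h * CM_PER_INCH for h in height_in]
weight_lb = [w * LB_PER_KG for w in weight_kg]

r0, b0 = corr_and_slope(height_in, weight_kg)
r_cm, b_cm = corr_and_slope(height_cm, weight_kg)      # X in cm
r_lb, b_lb = corr_and_slope(height_in, weight_lb)      # Y in lb
r_both, b_both = corr_and_slope(height_cm, weight_lb)  # both converted

print("r:", round(r0, 4), round(r_cm, 4), round(r_lb, 4), round(r_both, 4))
print("b:", round(b0, 4), round(b_cm, 4), round(b_lb, 4), round(b_both, 4))
```

Changing the seed or the simulated relationship alters r0 and b0 but not the pattern: r is unaffected by a change of units, while b is divided by the factor applied to X and multiplied by the factor applied to Y. The sketch does not address the last part of question 4 (widening the age range).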

5. In a dental survey, some 100 Quebec adults were examined, and for each person a count was made of the teeth that were decayed, missing, or filled (this single number for each person is often referred to as the person's DMF count). What would the distribution of the 100 counts look like? What do you think the mean and standard deviation would be? What is the very largest the standard deviation could be? Would the standard deviation necessarily be larger or smaller if the sample size were 1000 instead of 100?

6. Suppose the average of some variable Y is 2 units higher in males than in females, and that it increases by 0.3 units for every 10 years of age in each of the two sexes. Write this statement in the form of a single regression equation. If the average increases by 0.25 units per 10 years for males and by 0.35 units per 10 years for females, how should the equation be modified?

7. In a study (of some 100 persons aged 45-65) to quantify the degree to which hearing loss is affected by employees' exposure to the noise from heavy machinery, the number of years of exposure to this noise and the extent of hearing loss were determined for each person. A multiple regression is planned to assess the effect and to take each person's age into account (hearing loss generally becomes worse with age, even without unusual occupational exposure). What is the correlation between age and cumulative exposure likely to be? If it is very high, what will it do to the estimate of the regression slope of loss on exposure? If it is low, what will it do? If you think it will do very little, would you bother to include age in the regression? If you had a choice of which 100 persons to select from a larger available group, would you choose on a purely random basis, or on some other basis? Why? Use this context of hearing loss, exposure, and age to give one example each of confounding, interaction, and homoscedasticity.
If females, because of their longer hair and greater tendency to wear ear-protectors, were analyzed separately from males, how would the regression coefficients for hearing loss on years of exposure compare in the two sexes?

8. Suppose a study (survey sample) finds a difference of 0.1 visits per resident between the average number of visits to doctors by a sample of residents in one community and that by a sample of residents in another; the standard errors of the two sample means are 0.3 and 0.4. What can one say about the difference in the number of visits per resident made by all residents of the two communities? In which community was the bigger sample of residents taken?

9. In a case-control study to quantify the degree to which a non-smoker's chances of developing lung cancer are affected by passive exposure to tobacco smoke, the number of years of exposure to this smoke was determined (easier said than done!) for each of 100 lung cancer cases and 100 controls (all non-smokers). Suggest some analytic strategies, depending on whether there was matching for age, sex, or both. If one calculated an odds ratio from the 2x2 table formed by cross-tabulating case-control status with high or low exposure, how would one obtain a confidence interval for this ratio? If one used exposure as a continuous variable in a logistic regression and obtained a regression coefficient of 0.01 with a standard error of 0.003, how would one report this effect (and its uncertainty) to the lay public? If one used several controls for every case, how would this affect the estimate of the effect, and the standard error of this estimate?

10. A study is going to examine the difference between the levels of some pollutant in the air of a tall building with sealed windows and one without such windows. There is a limited budget for the analysis of the air samples, but when and where the samples are taken are not constraints. What considerations would affect the timing and location of the samples?
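For questions 8 and 9, here is a minimal Python sketch of the standard calculations. It assumes the two community samples are independent, uses a normal approximation (z = 1.96) for 95% intervals, uses the Woolf (logit) method for the odds-ratio interval, and assumes exposure in the logistic model is measured in years. The 2x2 counts are hypothetical and chosen only for illustration; `odds_ratio_ci` and `or_for` are illustrative helper names, not library functions.

```python
import math

Z95 = 1.96  # standard normal quantile for a two-sided 95% interval

# --- Question 8: difference between two independent sample means ---
diff = 0.1
se_1, se_2 = 0.3, 0.4
se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)  # SEs combine in quadrature: 0.5
ci_diff = (diff - Z95 * se_diff, diff + Z95 * se_diff)  # about -0.88 to 1.08
# Since SE = SD / sqrt(n), the community with the smaller SE (0.3) had the
# larger sample, provided the two communities have similar SDs (an assumption).

# --- Question 9: 95% CI for an odds ratio from a 2x2 table (Woolf / logit method) ---
def odds_ratio_ci(a, b, c, d, z=Z95):
    """a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls (all counts > 0)."""
    log_or = math.log((a * d) / (b * c))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# HYPOTHETICAL counts for 100 cases and 100 controls, for illustration only
or_hat, or_low, or_high = odds_ratio_ci(a=40, b=60, c=25, d=75)  # OR = 2.0

# --- Question 9: logistic-regression coefficient per year of exposure ---
beta, se_beta = 0.01, 0.003  # log-odds per year (exposure assumed in years)

def or_for(years):
    """Odds ratio and 95% CI for an exposure difference of `years` years."""
    return tuple(math.exp(years * v)
                 for v in (beta, beta - Z95 * se_beta, beta + Z95 * se_beta))

or_10y = or_for(10)  # about 1.11 (95% CI 1.04 to 1.17) per 10 years

print("difference CI:", [round(v, 2) for v in ci_diff])
print("2x2 OR and CI:", round(or_hat, 2), round(or_low, 2), round(or_high, 2))
print("OR per 10 years:", [round(v, 2) for v in or_10y])
```

Expressing the coefficient per 10 years rather than per year makes it easier to communicate. Because lung cancer is rare, the odds ratio approximates the risk ratio, so one possible lay phrasing is "roughly 10% higher risk for every 10 years of exposure, plausibly anywhere from about 4% to 17%".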