
McGill University, Department of Epidemiology, Biostatistics and Occupational Health

EPIB 613: Introduction to Statistical Software (Fall 2006)

Assignment 5 (last one!), due November 20

Working in teams of two or three...


Refer to the article 'Exposure to Scientific Theories Affects Women’s Math Performance' by Ilan Dar-Nimrod and Steven J. Heine. You can find the article in the .pdf file [link]. The file contains -- courtesy of the first author -- the pre- and post-manipulation mathematics scores on which the article is based, along with some supplementary material (and analyses and notes from JH). If you have trouble extracting the data from the .pdf file, the data, along with some SAS code, can also be found in this text file [link].

The analyses done by JH at the end of this .pdf file used the data from all 4 groups. For this assignment, restrict attention to two groups i.e. the 'ND vs. S' comparison, and redo the requested analyses 'from scratch'. Note that for some portions below, rather than work with the raw math1 scores, it may be easier to work with 'centered' math1 scores (JH called the variable math1c) whose average across the combined ND and S groups is 0. He got these by first obtaining the overall math1 mean in the two groups combined, and subtracting this mean from each individual's math1 score.
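The centering step described above can be sketched as follows. This is only an illustration in Python (not one of the course packages), and the scores are made-up numbers, NOT the study data; the dictionary layout is an assumption chosen for the example.

```python
from statistics import mean

# Hypothetical pre-manipulation scores (made up for illustration, NOT the study data).
math1 = {"ND": [4.0, 6.0, 8.0], "S": [5.0, 7.0, 12.0]}

# Overall mean in the two groups combined (pool the individuals first,
# rather than averaging the two group means).
combined = math1["ND"] + math1["S"]
overall_mean = mean(combined)

# Centered scores: subtract the combined mean from each individual's score.
math1c = {group: [x - overall_mean for x in scores]
          for group, scores in math1.items()}

print(overall_mean)
print(math1c)
```

By construction, the centered scores average 0 across the combined ND and S groups, although the mean within each group is generally not 0.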

a. Check 'how well the randomization worked' by computing the mean pre-manipulation math score in each of these two groups, and the difference of the two means. On this basis, which of the two groups has a 'math advantage' even before the manipulation?

b. Compute the mean and SD of the post-manipulation math scores in each of these two groups, and the (crude) difference of the two means. By hand, compute the t-statistic (common variance version). Verify your calculation by running the t-test in your favourite statistical package {it is called TTEST in SAS and ttest (or the immediate form ttesti) in Stata}. Comment on the p-value [ or the CI for the difference in means ].
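As a check on the hand calculation in (b), the common-variance t-statistic can be sketched as below. The function name pooled_t and the two small samples are hypothetical (made-up numbers, NOT the study data); the formula is the standard pooled-variance one, t = (mean1 - mean2) / sqrt( sp^2 (1/n1 + 1/n2) ), with n1 + n2 - 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t-statistic (common variance version); returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference in means.
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df

# Hypothetical scores, made up for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]

t, df = pooled_t(group_a, group_b)
print(t, df)
```

The p-value and CI still have to come from the t distribution with df degrees of freedom, which is what TTEST or ttest will report.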

c. As suggested by some, 'level the playing field' by working with the post-minus-pre difference in math scores rather than the post-manipulation scores you used in (b), i.e. repeat step (b) but using the change scores. Comment on the p-value.

d. Even within the ND group (or within the S group), there isn't a perfect 100% correlation between the pre- and post scores. For each group, plot the post- vs. pre- scores. Obtain the (within-group) correlations of pre- and post- scores, and the (again, within-group) regression equations of the post-scores on the pre-scores (for the regressions: if in SAS, you can for example use PROC REG; if in Stata you can use 'regress' ).
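The quantities asked for in (d) all come from three sums of squares and cross-products, as this small sketch shows. The function name fit_line and the four pairs of scores are hypothetical (NOT the study data); PROC REG or regress should give the same slope and intercept for the same input.

```python
from math import sqrt
from statistics import mean

def fit_line(pre, post):
    """Return (r, slope, intercept) for the least-squares regression of post on pre."""
    xbar, ybar = mean(pre), mean(post)
    sxx = sum((x - xbar) ** 2 for x in pre)
    syy = sum((y - ybar) ** 2 for y in post)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r = sxy / sqrt(sxx * syy)
    return r, slope, intercept

# Hypothetical pre- and post-scores for one group, made up for illustration.
pre = [1.0, 2.0, 3.0, 4.0]
post = [2.0, 2.0, 4.0, 4.0]

r, slope, intercept = fit_line(pre, post)
print(r, slope, intercept)
```

Run it once per group, since the question asks for within-group correlations and regression equations.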

e. For two groups of ND subjects, based on the regression equation fitted to the scores in the ND group, how far apart would you predict their averages to be on the post-manipulation scores if on average they were (i) 1 point apart on the pre-manipulation math exam? (ii) 0.9 points apart pre- ?

Make the same type of calculation for two groups of S individuals 1 point apart pre-manipulation.

f. Given that the mean pre-scores of the S and ND groups were in fact just about 0.9 points apart, how far apart would you expect the mean post-scores to be IF the manipulation had NO effect? Do the calculation twice, the first time using an 'exchange rate' {slope} for the value of 1 extra point pre-manipulation based on what you saw in the ND group, and the second time using the slope from the S group.

g. Given the crude difference you did see in the post-scores in (b), and the advantage calculated in (f), what differences in the mean post-scores would you arrive at if you had leveled the playing field using these two different correction factors? {you can think of using the pre-scores as giving each person a different 'handicap' in the second competition -- just as if the contest between ND and S groups involved golf rather than math!}
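The arithmetic behind steps (e) to (g) is short enough to write as two one-line functions. This is a sketch only: the function names are hypothetical, and the slope, pre-score gap and crude post-score gap below are made-up values, NOT the ones you will obtain from the data.

```python
def expected_post_gap(slope, pre_gap):
    """Post-score gap predicted from a pre-score gap if the manipulation had no effect."""
    return slope * pre_gap

def adjusted_gap(crude_post_gap, slope, pre_gap):
    """Crude post-score gap minus the head start attributable to the pre-score gap."""
    return crude_post_gap - expected_post_gap(slope, pre_gap)

# Hypothetical values, chosen only to make the arithmetic easy to follow.
slope = 0.5           # 'exchange rate': post-score points per pre-score point
pre_gap = 2.0         # difference in mean pre-scores between the two groups
crude_post_gap = 3.0  # difference in mean post-scores between the two groups

print(expected_post_gap(slope, pre_gap))
print(adjusted_gap(crude_post_gap, slope, pre_gap))
```

Using the ND slope and then the S slope in the same functions gives the two 'handicapped' differences asked for in (g).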

h. Repeat step (c) but using an intermediate (common) exchange rate to obtain an adjusted post-score for each subject, i.e.,

adjusted post-score = post-score - 0.58 x (pre-score - average pre-score in combined groups)

where 0.58 is the (assumed common) regression coefficient (slope) obtained in (i) below.

{this approach uses parallel regression lines for the two groups. Next term, you will learn how to test whether this assumption of parallel lines, i.e. a common slope -- an assumption in the 'analysis of covariance' that the authors referred to 6 lines from the bottom of the second column of the article -- is justified by the data}.

Comment on the between-group difference in the means of the adjusted values, and its associated p-value.
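A minimal sketch of the adjustment in (h) follows. The slope 0.58 is the one given in the assignment; everything else (the function name adjusted_post and the four subjects' scores) is hypothetical and NOT the study data.

```python
from statistics import mean

COMMON_SLOPE = 0.58  # assumed common regression slope, as given in the assignment

def adjusted_post(pre, post, pre_mean_combined, slope=COMMON_SLOPE):
    """Adjusted post-score for each subject: post - slope * (pre - combined pre mean)."""
    return [y - slope * (x - pre_mean_combined) for x, y in zip(pre, post)]

# Hypothetical scores, made up for illustration only.
pre_nd, post_nd = [4.0, 6.0], [5.0, 7.0]
pre_s, post_s = [8.0, 10.0], [6.0, 8.0]

# Average pre-score in the combined groups.
pre_mean_combined = mean(pre_nd + pre_s)

adj_nd = adjusted_post(pre_nd, post_nd, pre_mean_combined)
adj_s = adjusted_post(pre_s, post_s, pre_mean_combined)

crude_diff = mean(post_s) - mean(post_nd)
adjusted_diff = mean(adj_s) - mean(adj_nd)
print(crude_diff, adjusted_diff)
```

In this made-up example the adjusted difference equals the crude difference minus 0.58 times the difference in mean pre-scores, which is the same 'handicap' idea as in (g). The adjusted values can then go through the same t-test as in (c).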

i. For each subject, create an indicator variable S, with S=1 if the subject is in group S and S=0 if in group ND. Then fit the following (multiple) regression equation:

(average) post-score = B0 + B1*pre-score + B2*S.

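In SAS: PROC REG; MODEL post_score = pre_score S; RUN;
In Stata: regress post_score pre_score S
(post_score and pre_score are placeholders for your own variable names; a hyphen is not allowed in a standard SAS or Stata variable name.)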

How close is the B2 coefficient for S to the difference in (adjusted) means shown in Figure 1 (Left) in the article?
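To see what the regression in (i) is doing, here is a sketch that obtains B0, B1 and B2 without a regression routine. It relies on a standard result for this particular model (one covariate plus a two-group indicator): the least-squares B1 is the pooled within-group slope, and B2 is the crude difference in mean post-scores minus B1 times the difference in mean pre-scores. The function name and the data are hypothetical, NOT the study data.

```python
from statistics import mean

def ancova_two_groups(pre0, post0, pre1, post1):
    """Least-squares fit of post = B0 + B1*pre + B2*S, with S=0 for group 0, S=1 for group 1."""
    def within(pre, post):
        # Group means and within-group sums of squares / cross-products.
        xbar, ybar = mean(pre), mean(post)
        sxx = sum((x - xbar) ** 2 for x in pre)
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(pre, post))
        return xbar, ybar, sxx, sxy

    x0, y0, sxx0, sxy0 = within(pre0, post0)
    x1, y1, sxx1, sxy1 = within(pre1, post1)
    b1 = (sxy0 + sxy1) / (sxx0 + sxx1)  # common (pooled within-group) slope
    b2 = (y1 - y0) - b1 * (x1 - x0)     # adjusted between-group difference
    b0 = y0 - b1 * x0                   # intercept of the S=0 line
    return b0, b1, b2

# Hypothetical data lying exactly on two parallel lines:
# group 0: post = 1 + pre; group 1: post = 3 + pre.
b0, b1, b2 = ancova_two_groups([1.0, 2.0, 3.0], [2.0, 3.0, 4.0],
                               [2.0, 3.0, 4.0], [5.0, 6.0, 7.0])
print(b0, b1, b2)
```

In this toy example the crude difference in mean post-scores is 3, but group 1 starts 1 point ahead on the pre-score, so the adjusted difference B2 is 2.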

j. Draw the pair of fitted parallel lines (obtained by setting S=0 and S=1 respectively in the fitted equation in (i)) in a diagram similar to that in the 'confounding: reducing it by regression' notes found at the end of the .pdf file.

Interpret the 'crude' and 'adjusted' differences in the light of this diagram, or in light of the 'anatomy of the adjustment' section of the same notes.


Put all of the program steps and relevant output and manual calculations into a single .doc (Word) file. Use a mono-spaced font such as Courier -- that way the output should align so it is more readable. Interleave DATA and PROC statements with output and conclusions, and use helpful titles over top of each output. Show relevant excerpts rather than entire listings of datafiles. Annotate liberally. Submit the Word file electronically (i.e., by email) to JH by 9 am on Monday Nov 20.

(updated Nov 8, 2006)