513-607: Principles of Inferential Statistics (Fall 2001)

McGill University, Department of Epidemiology and Biostatistics

Frequently Asked Questions (FAQs)

Question / Suggestion Response
  Where can I find the P values for a chi-square (χ²) value?
Since there are several chi-square distributions, one for each number of degrees of freedom, only certain landmarks or cutpoints are given (just like the t-tables)

In most of our applications, the relevant distribution is the chi-sq with 1 df, the very first row of Table F, page T-20 of the text

So, for example, if you had a chi-sq value of 4.21, the table would tell you that some 5% of the distribution is above (to the right of) 3.84, some 2.5% above 5.02, etc.

Since 4.21 lies between these two cutpoints, there must be approx 4% of the distribution to the right of 4.21

Of course, as I mentioned in class, the % of the chi-sq (1 df) distribution that is to the right of 4.21 is the same as the % of the Z distribution that lies outside the interval from Z = -sqrt[4.21] to Z = +sqrt[4.21], i.e. beyond |Z| = sqrt[4.21]

remember that both positive and negative Z values become positive chi-sq values, since for 1 df, chi-sq is just z-square!
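The chi-sq / Z relationship above can be checked numerically. What follows is only a minimal sketch, not part of the course materials: the function name `chi2_1df_upper_tail` is made up for illustration, and it relies on the standard identity P(|Z| > z) = erfc(z / sqrt(2)) for a standard normal Z.

```python
import math

def chi2_1df_upper_tail(x):
    """Area of the chi-square (1 df) distribution to the right of x.

    Uses the fact that, for 1 df, chi-sq is just Z squared, so
    P(chi-sq > x) = P(|Z| > sqrt(x)).
    """
    z = math.sqrt(x)
    # Two-sided standard normal tail area: P(|Z| > z) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))

# The two table landmarks quoted above, and the example value between them
for x in (3.84, 4.21, 5.02):
    print(f"chi-sq = {x}: proportion to the right = {chi2_1df_upper_tail(x):.4f}")
```

The value printed for 4.21 should fall between the tail areas for the two landmarks, in line with the "approx 4%" read off the table.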
Nov 8 I would like to know, for proportions, what to do when the Z test of significance cannot be used because the normal approximation does not apply, i.e. when np or n(1-p) is < 5. How do we then test the significance of a statistic?

E.g. chapter 8 Q 38 in M&M: the number of birth defects was 3 out of 228 children, so np = 2. In this case the normal approximation cannot be used. So how do we test the significance?
The 'guidelines' are always in relation to the EXPECTED numbers in the cells NOT the OBSERVED counts

See the last point I make in the left column of p 6 of my notes of Ch 9

Notice that I am using an EXPECTED number > 5, not the very conservative 10 that M and M use.

Also, the expected values in a 2 x 2 table are calculated as row total x col total / overall total, i.e. 19 x 228 / 642, or 228 times the overall rate of (19/642)
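The expected-count rule can be sketched in a few lines. This is an illustration only: the helper name `expected_counts` is made up, and it assumes that 19 and 228 are a row total and a column total of a 2 x 2 table with overall total 642, as quoted in the response, so that the remaining margins are 642 - 19 and 642 - 228.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts: row total x col total / overall total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add to the same overall total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Margins assumed from the figures quoted in the response (19, 228, 642)
row_totals = [19, 642 - 19]
col_totals = [228, 642 - 228]

expected = expected_counts(row_totals, col_totals)
smallest = min(min(row) for row in expected)

for row in expected:
    print([round(e, 2) for e in row])
print("smallest expected count:", round(smallest, 2))
print("all expected counts > 5:", smallest > 5)
```

The guideline is checked against the smallest EXPECTED count, not against the observed count of 3.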
Oct 23 1.

I am just looking over the articles placed on the site and trying to make sure I can derive exactly how all their results were found (and/or calculated). I am confused, however, about how their summary value (as written under the Results section in the abstract: "Mean BMD values were 0.02-0.04 lower in OC users") was calculated, i.e. which 'n' was used?


I understand how "never" and "ever" were two independent (well, assumed independent) samples and how they obtained each mean difference using the CI for the t-statistic... but if the n's are different for each characteristic (mean lumbar BMD, mean femoral neck BMD, mean trochanter BMD and mean Ward's area BMD), which n's did they use to obtain their average range of 0.02 to 0.04?


Lastly, would you like us to use the approximation of the degrees of freedom (Moore and McCabe) or would you like us to use the most conservative approach (according to what was said in class today) for the midterm?

the range of diffs is because of the 4 anatomic sites; for the exam focus on the spine, which has the largest n's.

(the methods explain why the n 's were smaller for some sites)


I think the range is from fig 1, where adjustment was for age, BMI and height

Table 2 adjusts for more factors

but in the exam I focus on table 1 (unadjusted) and just on lumbar spine


how much does it matter in this example?
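One way to see how much the choice matters is to compute both versions of the degrees of freedom. The sketch below is hedged: the function names are made up, the first formula is the usual Satterthwaite (Welch) approximation, the second is the conservative "smaller n minus 1" rule, and the SDs and n's are hypothetical placeholders, NOT values from the article.

```python
def welch_df(s1, n1, s2, n2):
    """Satterthwaite (Welch) approximation to the df for a two-sample t."""
    v1 = s1 ** 2 / n1  # squared standard error of mean 1
    v2 = s2 ** 2 / n2  # squared standard error of mean 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """Conservative choice: the smaller sample size minus 1."""
    return min(n1, n2) - 1

# Hypothetical SDs and sample sizes, for illustration only
s1, n1 = 0.12, 150
s2, n2 = 0.11, 60

print("approximate df: ", round(welch_df(s1, n1, s2, n2), 1))
print("conservative df:", conservative_df(n1, n2))
```

With n's of this size, both df are large enough that the t critical values are already close to the Z value, so the two approaches give nearly the same interval.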
Oct 2 Construction of, and semantics behind, CI

Just to check that I'm right...
Click on link
Sep 17 I would like to let you know my score of Assignment 1 (. ...) It is difficult to judge by myself if the answers are fully or partly correct.
Please wait.. I am devising a more organized way to gather these than email.. I will let everyone know when..
Show me after class the ones where you had trouble deciding..
Sep 16 I ran into some trouble with question 1.120. I haven't yet installed the SAS software on my computer, and I haven't been able to figure out how to get Excel to perform the desired function. Is Excel able to perform this function? If so, could you possibly give me a couple of tips about this?
Thanks for emailing.. I hadn't explained how to do this.
I just now put an Excel spreadsheet on the front page that does this... Hope the 64 others look here before they email me!
Sep 12 A classmate has just alerted me to the fact that "homegrown" exercises are to be submitted, along with the assigned exercises! I guess I must have misunderstood, but somehow I thought I had heard that these "homegrown" exercises were examples given for us to work on as extra practice.
Sorry I wasn't clear to everyone (> 90% read it the way I intended); I see where you might have misunderstood from the 1st page of the "homegrown" stuff, where I mention "suggested".
There I do mean suggested, but on the front page of the course web page I do say "homework due by this date" and I give the list of what is due.
The name "homegrown" was to identify the SOURCE (book or JH) of the exercise, not where it remains once done.
Sep 10 1 The first assignment for 513-607 is due tomorrow. Overall I found it pretty straightforward. However, part a) of the HG exercise on Maternal Lead Levels After Alterations to Water Supply is unclear to me. Would you consider reviewing it at the beginning of tomorrow's lecture?

2 Suggestion: Would it be possible to have the assignments available on the web site well ahead of the due date, for example one week ahead? This would be most helpful in planning the use of my time more efficiently.
1 Did you read Colton on true limits and midpoints etc.?
(maybe I should have asked about true limits, not true intervals?)

2 Indeed.. I will try to get as far ahead as I can, so you can get ahead of the schedule..

Tonight I will put on the next one (and, if I can, the one after that), but it may take until the weekend before I can get further ahead of the schedule..
Sep 10 I need to know whether, in comparing two histograms, the y-axis intervals should be similar or not.
E.g. No. of mothers in 1977 at intervals of 10, 20, 30, etc.
No. of mothers in 1980 at intervals of 25, 50, 75, etc.
not necessarily.. if back to back

important that they share the same SCALE, but the scale can be divided up differently in the two


| | | | etc
| | | | | | | etc

as long as the contrast of the 2 distrns comes through!
Sep 10 Hi Prof. Hanley
I was at the bookstore and unfortunately they have sold all copies of the required textbook. They said they will order 30 more copies. Meanwhile, I have completed the homegrown exercises. Can you get back to me and let me know what to do? Thank you for your time
The bookstore ordered 50 copies

I have left a voice mail with the bookstore to find out
when the next batch will be in

meanwhile with this email I am asking Marlene Abrams
to send a message to all of our upper year students to ask if any of them would be willing to sell (or loan) their copy from last year..

Marlene: the book in question is Moore and McCabe, Intro to Practice of Statistics, 3rd edition [the edition with the pictures of Campbell's soups on the front cover!]

for now try to photocopy the exercises from someone
my copy is at home but I will bring a photocopy
Aug 29 Do you know the date of the final exam in stats 607A?
Tuesday Dec 11, 9:00-12:00
Aug 28 I am an incoming Masters thesis student who just received your email regarding familiarity with statistical computer packages.

Although I am somewhat familiar with SAS (having used this program in stats II), I have been using Statview for all analyses performed as a research assistant. How useful could Statview be in this course?
Our dept (and most of the affiliated research units) find SAS to be very powerful for handling complex data. We have 20 years of experience with it, and with so many persons here with SAS expertise, a new person who learns SAS will not be alone. SAS is also the main package in the multiple regression course (621-data analysis I) in the winter term.

I expect Statview would be fine, but the issue is whether you want to be somewhat on your own next term -- and for your thesis!