Oct 2 2001

Correspondence between student and J Hanley

I just want to check that I'm on the right track with the real method of constructing a confidence interval. So, what you have to ask yourself after you get a calculated value from your sample is "What are the chances that the true value of the population parameter is much lower than my estimate?

You CANNOT talk about the chances of the TRUE VALUE being somewhere,
since -- in the frequentist approach -- the true value is a constant (albeit unknown).

YOU HAVE TO TALK ABOUT THE PROBABILITY ASSOCIATED WITH THE STATISTIC (x) GIVEN SOME "TRY-OUT" VALUE OF THE PARAMETER.

As I explained, in frequentist inference, we are limited to talking about the probabilistic behaviour of the STATISTIC.

So ask the question the other way round (and hypothetically):

What are the chances that my estimate is this high or higher relative to a (hypothetical, try-out) true value of the population parameter?

It seems like semantics but this wording keeps the Bayesian and frequentist approaches separate... the way you were headed, you were "crossing over".

At what population proportion p could I reasonably expect to get this sample proportion of x?"

or one more extreme...!

I could reasonably expect it if the area under the curve greater than or equal to x in the population distribution equalled 0.025.

...that is, if you are "trying out" or "entertaining" parameter values lower than the observed statistic x.

So you check: "If the true value of the p of the population was 0.1, the probability (area under the dist about the p) of getting value x would be _____." In this distribution, your value might be way off the charts, so you take another distribution, with a higher p, until you come to a distribution where your value is 2 SD's from the population proportion.

RIGHT..


But it doesn't have to be 2 SDs if you use the Binomial itself. It could be whatever
parameter value p is such that Prob(statistic >= x | this value of p)
yields an upper tail area of 0.025.
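
To make the "try-out" idea concrete, here is a minimal sketch of that tail-area calculation using the Binomial itself. The data (x = 6 successes in n = 20 trials) and the function name upper_tail are made up purely for illustration; they are not from the student's problem.

```python
from math import comb

def upper_tail(x, n, p):
    """Prob(statistic >= x | try-out value p), for a Binomial(n, p) statistic."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Hypothetical data, assumed only for this illustration.
n, x = 20, 6

# Entertain a series of try-out values BELOW the observed proportion x/n = 0.30
# and see how large the upper tail area is under each one.
for p in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"try-out p = {p:.2f}: Prob(>= {x} | p) = {upper_tail(x, n, p):.4f}")
```

With these made-up numbers the upper tail area passes through 0.025 somewhere between the try-out values 0.10 and 0.15, so the lower limit lies in that gap.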


If the distribution were close to Gaussian, then yes, 2 SD would correspond to
an upper tail area of 0.025 (approximately; it's really 1.96 SD for 0.025).
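
A quick check of the 2 SD versus 1.96 SD remark, using the standard Gaussian from Python's standard library. This is only a sketch to show where the two numbers come from.

```python
from statistics import NormalDist

z = NormalDist()  # standard Gaussian: mean 0, SD 1

# Upper tail area beyond 2 SD and beyond 1.96 SD.
tail_at_2 = 1 - z.cdf(2.0)
tail_at_196 = 1 - z.cdf(1.96)

# Number of SDs that cuts off an upper tail area of exactly 0.025.
cutoff = z.inv_cdf(1 - 0.025)

print(f"area beyond 2 SD    = {tail_at_2:.4f}")
print(f"area beyond 1.96 SD = {tail_at_196:.4f}")
print(f"SDs for area 0.025  = {cutoff:.4f}")
```

So 2 SD leaves a little less than 0.025 in the tail, which is why 2 is only a convenient rounding of 1.96.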

Now you know that your sample could have come

Yes, with "could have" defined as above in terms of a tail area of 0.025.

from this distribution, therefore, your estimate could just be 2 SD's from that population p.  

You do the same for the upper value. Then you know

never "KNOW"

that the range between the upper and lower limit (the numbers where it is plausible to have gotten that estimate of p) probably contains the true value of the population.

not exactly... Say:

'the data (x) COULD HAVE BEEN GENERATED BY any of the parameter values in this range'


(in the sense that the data point x is not "that far" from any one of these
hypothetical values of the parameter)
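
Putting both ends together, here is a rough sketch of how one might search for the two limits with the Binomial itself: the lower limit is the try-out p at which Prob(>= x | p) equals 0.025, and the upper limit is the try-out p at which Prob(<= x | p) equals 0.025. The data (x = 6 out of n = 20) and all function names are hypothetical, chosen only for illustration, and the simple bisection search is just one convenient way to find the crossing points.

```python
from math import comb

def pmf(k, n, p):
    """Prob(statistic = k | p) for a Binomial(n, p) statistic."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """Prob(statistic >= x | p)."""
    return sum(pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """Prob(statistic <= x | p)."""
    return sum(pmf(k, n, p) for k in range(0, x + 1))

def solve_increasing(f, target, tol=1e-9):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_limits(x, n, tail=0.025):
    """Range of try-out p values under which x is not 'that far' out in a tail."""
    # Lower limit: the upper tail area rises with p until it reaches `tail`.
    lower = 0.0 if x == 0 else solve_increasing(lambda p: upper_tail(x, n, p), tail)
    # Upper limit: the lower tail area falls with p, so 1 - lower_tail rises.
    upper = 1.0 if x == n else solve_increasing(lambda p: 1 - lower_tail(x, n, p), 1 - tail)
    return lower, upper

# Hypothetical data, assumed only for this illustration.
lower, upper = exact_limits(6, 20)
print(f"the data could have been generated by p from {lower:.3f} to {upper:.3f}")
```

Note that the code never attaches a probability to p. Every probability it computes is for the statistic, given a try-out value of the parameter.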


Am I right or should I come to you for some help on this issue?

A few "misspeaks", but more or less correct. Email again if it is still not clear.

You see how LEGAL we have to be in our language!


JH