I will explain this issue by copying here a message received via email
and my answer.

> I want to clarify my understanding of setting up a prior for an
> inference for a single normal mean. When we specify
> the prior, we use the notation N(mean, sd or variance
> or SE), but I'm never sure if I should be stating the standard
> deviation or the variance or the standard error.

In frequentist analyses, SDs are used for population description purposes,
i.e., to describe how one person varies from another on the measure of
interest and within the population of interest. SEs are used to estimate
how accurately a model parameter has been estimated. So, you might have SD
= 10 to describe how blood pressure varies from one person to the next in a
population, and then SE = 1, for example, once a sample has been collected,
representing how accurately we now know the mean value.

Note that the SD is a population characteristic, and stays constant
regardless of sample size, while the SE depends on the sample size. In
general, the larger the sample size the smaller the SE.
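To make this concrete, here is a small simulated sketch (the blood pressure numbers are invented for illustration): the sample SD hovers near the population SD of 10 no matter how large n gets, while the SE of the mean shrinks with n.

```python
import math
import random

random.seed(0)
for n in (25, 100, 2500):
    # Simulated blood pressures: population mean 120, population SD 10
    bp = [random.gauss(120, 10) for _ in range(n)]
    mean = sum(bp) / n
    sd = math.sqrt(sum((b - mean) ** 2 for b in bp) / (n - 1))  # stays near 10
    se = sd / math.sqrt(n)  # shrinks as n grows
    print(n, round(sd, 1), round(se, 2))
```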

When I used to teach 607, I handed out an article on this topic each year,
and if you want to read more, you can find it on page 106 (or 109 of the pdf
file) of my 607 course notes, which are here:

607 course notes

All of the above is for frequentist analyses. Bayesian analysis is quite
different, since the term SE is never used, only SD, which is used for all
variability related terms, and always represents the variability in whatever
distribution is under discussion. So, if one is discussing a distribution
representing person to person variability in a population, then the SD
represents that variability. If one is discussing a prior distribution for
a mean parameter, then the SD of that prior represents how accurately the
researcher knows that mean value, before analyzing the data. If one is
discussing the SD from the posterior distribution, then the SD represents
how accurately one knows the mean value after analyzing the data.

So, to draw an analogy between frequentist and Bayesian analysis, the SD
from the freq viewpoint is the same as the Bayesian SD when discussing
person to person variability in a population. The SE from a freq viewpoint
is similar to the Bayesian SD for the posterior distribution (but will be
numerically similar only if little prior information is used). There is of
course no freq equivalent to the Bayesian prior SD, as freqs ignore prior
information.

> Also, is this normal distribution we specify the distribution of
> where we think the mean value lies, or where we think an individual's
> value is likely to be? If it is a distribution for the mean, we would
> use the SE, but if it is a distribution for an individual's possible
> value, we would use either sd or variance, correct?

Yes, I think you have the right idea, but see above explanations, as
Bayesians would not call it the SE. If you are setting up a prior
distribution for the mean, the SD you need is the one that represents
your uncertainty about the mean value; if you are discussing person to
person variability, you would use the SD that represents population
variability.

> Also, I'm confused about generally the same issue for the posterior
> distribution... is the posterior distribution a probability
> distribution for where the mean lies, or the probability distribution
> of where an individual's value may lie... and therefore what is the
> second parameter specified in the posterior notation N(71.69, 2.68)
> for example?

Your first guess is correct, the posterior distribution is a probability
distribution for where the mean lies, after analyzing your data. So, in the
above example, if 2.68 is the posterior variance, then sqrt(2.68) = 1.64 is
the posterior SD, which represents how accurately the mean is known after
data analysis.

> My thoughts are that the prior is a distribution for the possible
> values for an individual, and therefore the second parameter should
> be the sd or variance,

No, this is not correct, as per above discussion. The prior is a
distribution representing your knowledge about the mean parameter value.

> and the posterior distribution is the probability distribution for
> the mean, and therefore the second parameter is the SE, but I would
> really like to hear your clarification on this.

Yes, that is correct, but, again, Bayesians would not call it the SE, and it
would not be the same numerical value as the freq SE unless little prior
info is used.

One last point: Bayesians never need to use SE = SD/sqrt(n), since the SD
from the posterior distribution already incorporates information from the
data, so the sample size is "built-in" already.
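For a single normal mean with known data SD, the conjugate updating formulas show both points at once: precisions (1/variance) add, and with a nearly flat prior the posterior SD reduces to the familiar SD/sqrt(n). Here is a sketch with made-up numbers (the sigma = 10, n = 100, and the priors below are assumptions for illustration only):

```python
import math

sigma = 10.0          # known person-to-person SD in the population
n, ybar = 100, 120.0  # made-up sample size and sample mean

def posterior(prior_mean, prior_sd):
    # Conjugate normal-normal update: precisions (1/variance) add
    post_prec = 1 / prior_sd**2 + n / sigma**2
    post_sd = math.sqrt(1 / post_prec)
    post_mean = (prior_mean / prior_sd**2 + n * ybar / sigma**2) / post_prec
    return post_mean, post_sd

# With a nearly flat prior, the posterior SD matches the frequentist
# SE = sigma/sqrt(n); an informative prior makes it smaller.
_, sd_flat = posterior(120.0, 1000.0)
_, sd_inform = posterior(120.0, 1.0)
print(round(sd_flat, 3), round(sigma / math.sqrt(n), 3), round(sd_inform, 3))
```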

8. How do I recode data in R? For example, I have a variable that is categorical, but I want to convert it to a dichotomous variable.

Install the car package for R, and use its recode function. For
example, suppose you have a categorical variable called dose with nine
categories that you want to dichotomize. Suppose that it is now coded
as 1, 2, 3, ..., 9, and you want to dichotomize at five, such that 1
through 5 are coded as 0, and the rest as 1. You can run these commands:

library(car)
dose.dichot <- recode(dose, "c(1,2,3,4,5)='0'; else='1'")

Your new variable dose.dichot will now be dichotomized.

I have not taken any formal surveys. Nevertheless, the disadvantages of
p-values have been discussed almost since they were proposed. There are
literally hundreds of articles in the statistical and medical literature
explaining the disadvantages of p-values, and expressing preferences for
confidence intervals and/or Bayesian methods. As discussed in class
(see page 13 of the notes from the January 6th lecture), major journals,
including Epidemiology, the leading journal in our field, virtually ban
the use of p-values, or even discussions based on "significance", within
their pages.

Check out these web pages for many references concerning p-values:

www.fharrell.com/post/pval-litany/
www.indiana.edu/~stigtsts/
warnercnr.colostate.edu/~anderson/thompson1.html
www.npwrc.usgs.gov/resource/methods/pressugg/intro.htm
www.cnr.colostate.edu/~anderson/null.html
Here is a small sampling of other articles (and check their
reference lists for many, many others):

Malakoff D. Bayes offers a 'new' way to make sense of numbers. Science.
1999 Nov 19;286(5444):1460-4. (Review from Science about the rise of
Bayesian analysis in Science.)

Dunson DB. Practical advantages of Bayesian analysis of epidemiologic data.
American Journal of Epidemiology 2001 Jun 15;153(12):1222-6.

Goodman SN. Of P-values and Bayes: A modest proposal. Epidemiology.
2001 May;12(3):295-297.

Evans J, Mills P, Dawson J. The end of the p-value? British Heart
Journal 1988;60:177-180.

Altman DG. Why we need confidence intervals. World J Surgery
2005;29(5):554-556.

Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis
testing. Br Med J. 1986 Mar 15;292(6522):746-750.

Altman DG, Gardner MJ. Calculating confidence intervals for regression and correlation.
Br Med J. 1988 Apr 30;296(6631):1238-1242.

Gardner MJ, Altman DG. Using confidence intervals. Lancet. 1987 Mar 28;1(8535):746.

Taken together, you can see that major scientific journals such as
Science, the two most respected epidemiology journals, American Journal
of Epidemiology and Epidemiology, and major medical journals such as
BMJ and Lancet have published editorials encouraging methods other than
frequentist hypothesis testing. As one last example, I have published
an article in JAMA with Dr. James Brophy, another professor in our
Department, comparing Bayesian credible intervals to frequentist
p-values for clinical decision making. It summarizes many of the
arguments made in our 621 course about why p-values are not useful for
making any important clinical decisions.

Brophy JM, Joseph L. Placing trials in context using Bayesian analysis:
GUSTO revisited by Reverend Bayes. Journal of the American Medical
Association 1995;273(11):871-875.

If you search the web, you will find many other examples, as well as
lecture notes from courses at other universities that contain similar
material.

11. I am confused about confounding. We have seen various methods to look for confounding, for example checking the basic conditions for confounding, looking at the difference in estimates in going from a univariate to a multivariate model, and using the $mle and $se output from the bic programs. How do I put all of this information together?

If you do not have access to the nicely formatted output provided by the
BIC programs we have covered in class, then you need to run models
"manually", one by one. While nothing stops you from looking at models
other than univariate or full multivariate, it gets unwieldy rather
quickly if you have more than a few variables to consider. So, one
often may just look at changes in coefficients between univariate and
full multivariate models, and make do with that. What would be missing
is extra information from the "intermediate" models, between univariate
and multivariate, that may help to decipher which variables act as
confounders for other variables. Sometimes one may want this level of
detail, other times maybe not.
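If it helps, the univariate-versus-multivariate comparison can be sketched on simulated data (all numbers and variable names below are invented). The adjusted slope is computed by residualizing both the exposure and the outcome on the confounder, which gives the same coefficient for x as the full multivariate model would:

```python
import random

random.seed(1)
n = 5000
# Invented data: c confounds the x -> y relationship (true effect of x is 1.0)
c = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * ci + random.gauss(0, 1) for ci in c]
y = [xi + 2.0 * ci + random.gauss(0, 1) for xi, ci in zip(x, c)]

def fit(u, v):
    # Least-squares slope and intercept of v regressed on u
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    b = (sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
         / sum((ui - mu) ** 2 for ui in u))
    return b, mv - b * mu

crude, _ = fit(x, y)  # univariate ("crude") estimate, distorted by c

# Adjusting for c: regress x and y on c, then the residuals on each other
# (this equals the x coefficient in the multivariate model)
bx, ax = fit(c, x)
by, ay = fit(c, y)
rx = [xi - (ax + bx * ci) for xi, ci in zip(x, c)]
ry = [yi - (ay + by * ci) for yi, ci in zip(y, c)]
adjusted, _ = fit(rx, ry)

print(round(crude, 2), round(adjusted, 2))  # a big change flags confounding
```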

Which variables to leave in a "final model" depends on the purpose of
the regression. If you are trying to create a model optimized for
making future predictions, then using the model average as output by the
BIC is optimal under certain conditions (see paper by Raftery for
details).

If the purpose is to investigate the effects of one or more main
variables while adjusting for confounding from other variables, then
conclusions are drawn from running a series of models, such as those
output by the BIC programs, or, in the absence of that, run manually one
by one. For this purpose, there is no need to discuss only a single
final model, because one can draw conclusions from the ensemble of
models run.

Having said that, sometimes for reporting purposes, you might want to
present a "final model". For example, most journals will not give you
unlimited space to publish results from a series of models just to
discuss confounding. If you are working under such restrictions, then
you can start by learning about confounding by running as many models as
you need to run. Then, after drawing conclusions, present a model that
includes all variables that are important to the "story" you want to
tell. So, here you would include all variables of primary interest, and
all variables that you have concluded are confounders of your main
variables. Also, you might include any other variables that have
important effects. All other variables can then be omitted.

12. I have an ordered categorical variable. When modeling, should I treat it as a continuous variable, or as categorical?

It is sometimes better to treat ordered categorical variables as
continuous, sometimes not.

If you insert them into a model as continuous variables, it implies that
for each unit increase, the effect on the outcome is the same. In your
example, it implies that a switch from category 1 to category 2
has the same effect as switching from category 4 to category 5.

If that seems reasonable, then you can leave them as continuous. If that
assumption is likely to be false, then you should switch to using a
categorical variable, which will then estimate an effect for each level, so
that the effects can differ depending on where they are in the order.

Two more points:

1. It may sometimes not be clear whether the assumption is likely to
hold or not. In this case, you can plot the outcomes versus each
category and see if there is a roughly linear trend. If yes, then leave
it as a continuous variable; if not, switch to categorical.

2. If you have a very large number of categories but treating as a
continuous variable is not reasonable, then you may want to "collapse"
similar categories so that the total number of categories is decreased.
Again, you can decide which categories are similar enough to collapse
together by looking at a graph as suggested in point 1 above.
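As a simulated sketch of the comparison (the five-level variable and its effects below are invented), the continuous coding forces a single slope on every unit step, while the categorical coding recovers the unequal jumps between adjacent levels:

```python
import random

random.seed(2)
# Invented five-level ordered variable whose effect jumps unevenly
true_means = {1: 0.0, 2: 0.5, 3: 1.0, 4: 3.0, 5: 6.0}
levels = [random.randint(1, 5) for _ in range(2000)]
y = [true_means[k] + random.gauss(0, 1) for k in levels]

# Treated as continuous: one slope is forced on every unit step
ml, my = sum(levels) / len(levels), sum(y) / len(y)
slope = (sum((l - ml) * (v - my) for l, v in zip(levels, y))
         / sum((l - ml) ** 2 for l in levels))

# Treated as categorical: a separate mean is estimated per level
cat = {k: [v for l, v in zip(levels, y) if l == k] for k in range(1, 6)}
means = {k: sum(vs) / len(vs) for k, vs in cat.items()}
steps = [round(means[k + 1] - means[k], 2) for k in range(1, 5)]

print(round(slope, 2), steps)  # unequal steps favour categorical coding
```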