Simple Example of LIKELIHOOD Calculation

The color distribution of "m&m's"

   "m&m's" Color Percentages

                    Brown   Yellow   Red   Green   Orange   Blue
   Plain              30      20      20     10      10      10
   Peanut             20      20      10     10      10      30
   Peanut Butter      20      20      10     20       0      30
   Almond             20      20      10     20       0      30

Say we could only observe blue vs. not blue (and couldn't see the shape),
and say we KNOW our sample came from either Plain or Peanut. What if our
'data' consisted of a sample of 5 from one of these two, and we observed
1 blue out of 5? Which 'origin' [Plain or Peanut] do the data favour? In
likelihood terms: what is the probability of observing 1/5 if the sample
is (a) from Plain, (b) from Peanut? Under which 'hypothesis' (type of
m&m, i.e. parameter value) are the data more likely?

The probability model for the data is Binomial(5, .1) vs. Binomial(5, .3).

                                  Plain          Peanut       Ratio / Difference
                                                              (Peanut : Plain)

   Lik = prob(observed data)*  .1^1 x .9^4    .3^1 x .7^4
                                 = .06561       = .07203      ratio       1.1
   log.Lik                       -2.724         -2.631        difference  0.093
   IF we had 20 times this
   much data, i.e. 20/100**      -54.48         -52.61        difference  1.867

   *  ignoring the 5C1 multiplier in both
   ** From the relationship
         log.lik = n_blue x log(p.blue) + n_not x log(1 - p.blue),
      we can see that if we have 20 times as much data, but the same
      proportion blue, then the log.lik is magnified 20 times. Mind you,
      an observed value of 20/100 is very implausible under BOTH scenarios.
   (A short computational check of this table appears at the end of these
   notes.)

The difference in log-likelihoods (sometimes with a penalty for having
larger models) is often used in comparisons of fit.

Note that your first instinct was probably to use a closeness measure such
as (O - E). Metrics based on such differences in the Y scale include Least
Squares -- minimize the sum of squared differences -- and minimum
chi-square: Berkson advocated a minimum chi-square criterion for fitting
the parameters of logistic regression before ML fitting replaced it. In
the Likelihood approach, we measure 'distance' using probability.

Note also that (unlike least squares and minimum chi-square) likelihood
methods require a full statistical MODEL: not just the systematic part of
the regression model (i.e. the B0 + B1 X1 + ...) but also the random
variation part (e.g. Gaussian, Binomial or Poisson variation about the
expected Y at each covariate pattern).

Likelihood function = Probability of the observed data, viewed as a
function of the parameter values.

jh march 09 2007
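A short computational check of the likelihood table, as a minimal Python
sketch (the helper name log_lik is just an illustrative choice):

    from math import exp, log

    def log_lik(p_blue, n_blue, n_not_blue):
        # binomial log-likelihood, ignoring the constant 5C1 (or 100C20)
        # multiplier: it is the same under both hypotheses, so it cancels
        # in the likelihood ratio and in the difference of log-likelihoods
        return n_blue * log(p_blue) + n_not_blue * log(1 - p_blue)

    for label, p_blue in [("Plain", 0.10), ("Peanut", 0.30)]:
        ll5 = log_lik(p_blue, 1, 4)      # 1 blue out of 5
        ll100 = log_lik(p_blue, 20, 80)  # 20 blue out of 100
        print(label, round(exp(ll5), 5), round(ll5, 3), round(ll100, 2))

    # Plain 0.06561 -2.724 -54.48
    # Peanut 0.07203 -2.631 -52.61

Because 20/100 is the same proportion blue as 1/5, ll100 is exactly
20 x ll5, which is the relationship noted in footnote ** above.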
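On the least-squares contrast: under a Gaussian model with constant
variance, maximizing the likelihood picks out the same B0, B1 as least
squares -- the likelihood approach just makes the random-variation
assumption explicit. A small sketch (the data and the two candidate lines
below are invented purely for illustration):

    from math import log, pi

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]

    def rss(b0, b1):
        # sum of squared differences (O - E)^2: the least-squares criterion
        return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    def gauss_log_lik(b0, b1, sigma2):
        # Gaussian log-likelihood about the line; for fixed sigma2,
        # maximizing this is the same as minimizing rss(b0, b1)
        n = len(x)
        return -0.5 * n * log(2 * pi * sigma2) - rss(b0, b1) / (2 * sigma2)

    print(round(rss(0.0, 2.0), 2), round(gauss_log_lik(0.0, 2.0, 1.0), 2))
    print(round(rss(1.0, 1.5), 2), round(gauss_log_lik(1.0, 1.5, 1.0), 2))

    # 0.11 -4.65   <- smaller RSS, larger log-likelihood
    # 3.86 -6.52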