Simple Example of LIKELIHOOD Calculation

The color distribution of "m&m's"

   "m&m's" Color Percentages

                    Brown   Yellow   Red   Green   Orange   Blue
   Plain              30      20      20     10      10      10
   Peanut             20      20      10     10      10      30
   Peanut Butter      20      20      10     20       0      30
   Almond             20      20      10     20       0      30

Say we could only observe blue vs. not blue (and couldn't see the shape),
and say we KNOW our sample came from either Plain or Peanut. What if our
'data' consisted of a sample of 5 from one of these two, and we observed
1 blue out of 5? Which 'origin' [Plain or Peanut] do the data favour? In
likelihood terms: what is the probability of observing 1/5 if the sample
is (a) from Plain, (b) from Peanut? Under which 'hypothesis' (type of
m&m, i.e. parameter value) are the data more likely?

The probability model for the data is Binomial(5, .1) vs. Binomial(5, .3).

                                  Plain          Peanut       Ratio / Difference
                                                              (Peanut : Plain)

   Lik = prob(observed data)*  .1^1 x .9^4    .3^1 x .7^4
                                 = .06561       = .07203      ratio       1.1
   log.Lik                       -2.724         -2.631        difference  0.093
   IF we had 20 times this
   much data, i.e. 20/100**      -54.48         -52.61        difference  1.867

   *  ignoring the 5C1 multiplier in both
   ** From the relationship
         log.lik = n_blue x log(p.blue) + n_not x log(1 - p.blue),
      we can see that if we have 20 times as much data, but the same
      proportion blue, then the log.lik is magnified 20 times. Mind you,
      an observed value of 20/100 is very implausible under BOTH scenarios.
   (A short computational check of this table appears at the end of these
   notes.)

The difference in log-likelihoods (sometimes with a penalty for having
larger models) is often used in comparisons of fit.

Note that your first instinct was probably to use a closeness measure such
as (O - E). Metrics based on such differences in the Y scale include Least
Squares -- minimize the sum of squared differences -- and minimum
chi-square: Berkson advocated a minimum chi-square criterion for fitting
the parameters of logistic regression before ML fitting replaced it. In
the Likelihood approach, we measure 'distance' using probability.

Note also that (unlike least squares and minimum chi-square) likelihood
methods require a full statistical MODEL: not just the systematic part of
the regression model (i.e. the B0 + B1 X1 + ...) but also the random
variation part (e.g. Gaussian, Binomial or Poisson variation about the
expected Y at each covariate pattern).

Likelihood function = Probability of the observed data, viewed as a
function of the parameter values.

jh march 09 2007
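A short computational check of the likelihood table, as a minimal Python
sketch (the helper name log_lik is just an illustrative choice):

    from math import exp, log

    def log_lik(p_blue, n_blue, n_not_blue):
        # binomial log-likelihood, ignoring the constant 5C1 (or 100C20)
        # multiplier: it is the same under both hypotheses, so it cancels
        # in the likelihood ratio and in the difference of log-likelihoods
        return n_blue * log(p_blue) + n_not_blue * log(1 - p_blue)

    for label, p_blue in [("Plain", 0.10), ("Peanut", 0.30)]:
        ll5 = log_lik(p_blue, 1, 4)      # 1 blue out of 5
        ll100 = log_lik(p_blue, 20, 80)  # 20 blue out of 100
        print(label, round(exp(ll5), 5), round(ll5, 3), round(ll100, 2))

    # Plain 0.06561 -2.724 -54.48
    # Peanut 0.07203 -2.631 -52.61

Because 20/100 is the same proportion blue as 1/5, ll100 is exactly
20 x ll5, which is the relationship noted in footnote ** above.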
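On the least-squares contrast: under a Gaussian model with constant
variance, maximizing the likelihood picks out the same B0, B1 as least
squares -- the likelihood approach just makes the random-variation
assumption explicit. A small sketch (the data and the two candidate lines
below are invented purely for illustration):

    from math import log, pi

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]

    def rss(b0, b1):
        # sum of squared differences (O - E)^2: the least-squares criterion
        return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    def gauss_log_lik(b0, b1, sigma2):
        # Gaussian log-likelihood about the line; for fixed sigma2,
        # maximizing this is the same as minimizing rss(b0, b1)
        n = len(x)
        return -0.5 * n * log(2 * pi * sigma2) - rss(b0, b1) / (2 * sigma2)

    print(round(rss(0.0, 2.0), 2), round(gauss_log_lik(0.0, 2.0, 1.0), 2))
    print(round(rss(1.0, 1.5), 2), round(gauss_log_lik(1.0, 1.5, 1.0), 2))

    # 0.11 -4.65   <- smaller RSS, larger log-likelihood
    # 3.86 -6.52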