Maximum Likelihood – Why Use Maximum Likelihood Instead of Expected Likelihood?

expected-value, mathematical-statistics, maximum-likelihood, optimization, probability

Why is it so common to obtain maximum likelihood estimates of parameters, but you virtually never hear about expected likelihood parameter estimates (i.e., based on the expected value rather than the mode of a likelihood function)? Is this primarily for historical reasons, or for more substantive technical or theoretical reasons?

Would there be significant advantages and/or disadvantages to using expected likelihood estimates rather than maximum likelihood estimates?

Are there some areas in which expected likelihood estimates are routinely used?

Best Answer

The method proposed (after normalizing the likelihood to be a density) is equivalent to estimating the parameters using a flat prior for all the parameters in the model and using the mean of the posterior distribution as your estimator. There are cases where using a flat prior can get you into trouble because you don't end up with a proper posterior distribution, and I don't know how you would rectify that situation here.
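To spell the equivalence out (a short formalization of what was just said; $L(\theta \mid x)$ denotes the likelihood of the data $x$): normalizing the likelihood to a density over $\theta$ and taking its mean gives

$$\hat{\theta}_{\mathrm{EL}} \;=\; \frac{\int \theta \, L(\theta \mid x)\, d\theta}{\int L(\theta \mid x)\, d\theta},$$

which is exactly the posterior mean under the flat prior $\pi(\theta) \propto 1$, since that prior makes the posterior proportional to the likelihood. Note the denominator must be finite for the estimate to be defined at all, which is the properness issue just mentioned.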

Staying in a frequentist context, though, the method is hard to motivate: the likelihood doesn't constitute a probability density in most contexts, and there is nothing random left, so taking an expectation isn't meaningful in the usual sense. We can still formalize it as an operation applied to the likelihood after the fact to obtain an estimate, but I'm not sure what the frequentist properties of that estimator would look like (in the cases where the estimate actually exists).
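As a concrete numerical sketch of that operation (my own illustration, not part of the original answer; the model choice and every name in the code are assumptions): for i.i.d. exponential data with rate $\lambda$, the MLE is $n/\sum x_i$, while the expected-likelihood estimate works out to $(n+1)/\sum x_i$, the mean of the $\mathrm{Gamma}(n+1, \sum x_i)$ posterior that a flat prior on $\lambda$ produces.

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=20)  # true rate lambda = 0.5
n, s = len(x), x.sum()

def loglik(lam):
    # Exponential log-likelihood: n*log(lam) - lam * sum(x)
    return n * np.log(lam) - lam * s

c = loglik(n / s)  # maximum of the log-likelihood (attained at the MLE)

def lik(lam):
    # Likelihood rescaled by its maximum so quad integrates O(1) values;
    # the rescaling constant cancels in the ratio m / Z below.
    return np.exp(loglik(lam) - c)

Z, _ = integrate.quad(lik, 1e-12, np.inf)                         # normalizer
m, _ = integrate.quad(lambda lam: lam * lik(lam), 1e-12, np.inf)  # first moment

print("MLE:                 ", n / s)
print("Expected likelihood: ", m / Z)        # should match (n + 1) / s
print("Closed form (n+1)/s: ", (n + 1) / s)  # mean of Gamma(n+1, s) posterior
```

On this model the operation just adds one pseudo-count to the shape of the posterior, which matches the flat-prior posterior-mean reading above.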

Advantages:

  • This can provide an estimate in some cases where the MLE doesn't actually exist (for instance, when the likelihood increases toward an open boundary of the parameter space, so its supremum is never attained, yet the normalized likelihood still has a finite mean).
  • If you're not stubborn, it can move you into a Bayesian setting (and that would probably be the natural way to do inference with this type of estimate). OK, so depending on your views this may not be an advantage - but it is to me.

Disadvantages:

  • This isn't guaranteed to exist either.
  • If we don't have a convex parameter space, the estimate may not be a valid value of the parameter (with an integer-valued parameter, for example, the mean of the normalized likelihood will typically fall between integers).
  • The process isn't invariant to reparameterization. Since the process is equivalent to putting a flat prior on your parameters, it makes a difference what those parameters are (are we using $\sigma$ as the parameter, or $\sigma^2$?), as the sketch below illustrates.
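To make that last point concrete, here is a small numerical sketch (my own illustration; the model and every name in the code are assumptions, not something from the original answer). For normal data with known mean, running the procedure with $\sigma$ as the parameter gives a different implied $\sigma$ than running it with $\sigma^2$ and taking the square root, whereas the MLE, $\sqrt{\sum x_i^2 / n}$, is the same either way:

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=15)  # mean known to be 0, true sigma = 2
n, ss = len(x), np.sum(x**2)

def expected_estimate(loglik, mode):
    # Mean of the likelihood after normalizing it to a density. The likelihood
    # is rescaled by its maximum (attained at `mode`) so quad integrates O(1)
    # values; the rescaling constant cancels in the ratio m / Z.
    c = loglik(mode)
    lik = lambda t: np.exp(loglik(t) - c)
    Z, _ = integrate.quad(lik, 1e-12, np.inf)
    m, _ = integrate.quad(lambda t: t * lik(t), 1e-12, np.inf)
    return m / Z

# The same likelihood written in two parameterizations (constants dropped):
ll_sigma = lambda sigma: -n * np.log(sigma) - ss / (2 * sigma**2)  # parameter sigma
ll_var = lambda v: -(n / 2) * np.log(v) - ss / (2 * v)             # parameter v = sigma^2

est_sigma = expected_estimate(ll_sigma, np.sqrt(ss / n))  # flat prior on sigma
est_via_var = np.sqrt(expected_estimate(ll_var, ss / n))  # flat prior on sigma^2
print(est_sigma, est_via_var, np.sqrt(ss / n))  # first two differ; MLE is invariant
```

The gap between the first two numbers is exactly the flat-prior dependence: a prior flat in $\sigma$ is not flat in $\sigma^2$.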