Solved – When does the maximum likelihood correspond to a reference prior

bayesian, estimation, maximum-likelihood, maximum-entropy, prior

I have been reading James V. Stone's very nice books "Bayes' Rule" and "Information Theory". I want to know which sections of the books I did not understand and thus need to re-read. The following notes, which I wrote down, seem self-contradictory:

  1. The MLE always corresponds to the uniform prior (the MAP under a uniform prior is the MLE).
  2. Sometimes a uniform prior is not possible (when the parameter space lacks an upper or lower bound).
  3. Non-Bayesian analysis, which uses the MLE instead of the MAP, essentially sidesteps or ignores the issue of modeling prior information and thus always assumes that there is none.
  4. Non-informative (also called reference) priors correspond to maximizing the expected Kullback-Leibler divergence between posterior and prior, or equivalently the mutual information between the parameter $\theta$ and the random variable $X$.
  5. Sometimes the reference prior is not uniform; it can be a Jeffreys prior instead.
  6. Bayesian inference always uses the MAP and non-Bayesian inference always uses the MLE.

Question: Which of the above is wrong?

Even if non-Bayesian analysis does not reduce to "always use the MLE", is MLE estimation always a special case of Bayesian inference?

If so, under which circumstances is it a special case (uniform or reference priors)?

Based on the answers to questions [1][2][3][4] on CrossValidated, it seems like 1. above is correct.

The consensus on a previous question I asked seems to be that non-Bayesian analysis cannot be reduced to a special case of Bayesian analysis. Therefore my guess is that 6. above is incorrect.

Best Answer

  1. Correct, as long as the support of the uniform prior contains the MLE. The reason is that the posterior and the likelihood are proportional on the support of the uniform prior, so their maximizers coincide (see the numerical sketch at the end of this answer). Even when the MAP and the MLE coincide numerically, their interpretations are completely different.
  2. False. The support of the prior is certainly location- and scale-dependent (e.g. whether the data are reported in nanometers or in parsecs), but an appropriate choice is often possible. You may need to use a huge compact set as the support, but it is still possible.
  3. It does not use prior information in the sense of a prior distribution (the two are entirely different inferential approaches), but there is always information injected by the user. The choice of the model is itself a form of prior information: ask 10 people to fit the same dataset and they will probably come up with different models.
  4. Yes. Have a look at the following references:

The formal definition of reference priors

Jeffreys Priors and Reference Priors
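
For concreteness, here is a compact statement of the definition developed in those references: the reference prior maximizes the mutual information between $\theta$ and the data $X$, which equals the expected Kullback-Leibler divergence from the prior to the posterior,

$$
\pi^{*} \;=\; \arg\max_{\pi} I(\theta; X) \;=\; \arg\max_{\pi} \mathbb{E}_{X}\!\left[\mathrm{KL}\big(\pi(\theta \mid X)\,\big\|\,\pi(\theta)\big)\right].
$$

For the normal model $\mathcal{N}(\mu, \sigma^2)$ this yields the standard results referred to in point 5 below: with $\sigma$ known, the reference prior for the location $\mu$ is uniform, $\pi(\mu) \propto 1$; with both parameters unknown, the joint Jeffreys prior is $\pi(\mu, \sigma) \propto \sigma^{-2}$ while the reference prior is $\pi(\mu, \sigma) \propto \sigma^{-1}$, so the two no longer coincide.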

  5. The reference prior and the Jeffreys prior are the same in uniparametric models (one-dimensional parameter), but this is not the case in general. Both are uniform for location parameters, but not for scale and shape parameters. They even differ for the scale parameter of the normal distribution when the mean is also unknown (see the references above).

  6. False. Strict Bayesians use the posterior distribution in order to obtain Bayes estimators. The MAP is one of them, but there are many others; see Wikipedia's article on the Bayes estimator.

Non-Bayesians do not always use the MLE. An example of this is the James-Stein estimator, which is based on a different criterion than maximizing a likelihood function.
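
To make point 1 concrete, here is a minimal numerical sketch (with made-up binomial data: 7 successes in 10 trials) showing that under a uniform prior the MAP and the MLE coincide, while an informative prior pulls the MAP away from the MLE:

```python
import numpy as np
from scipy.stats import binom, beta

# Hypothetical data: k = 7 successes in n = 10 Bernoulli trials.
k, n = 7, 10
theta = np.linspace(0.001, 0.999, 999)  # parameter grid, step 0.001

likelihood = binom.pmf(k, n, theta)

# Uniform prior on (0, 1): the posterior is proportional to the likelihood,
# so the posterior mode (MAP) and the likelihood maximizer (MLE) coincide.
uniform_posterior = likelihood * np.ones_like(theta)

mle = theta[np.argmax(likelihood)]
map_uniform = theta[np.argmax(uniform_posterior)]
print(mle, map_uniform)  # both 0.7 = k/n

# An informative Beta(2, 5) prior shifts the MAP away from the MLE: the
# posterior is Beta(2 + k, 5 + n - k) = Beta(9, 8), with mode 8/15.
informative_posterior = likelihood * beta.pdf(theta, 2, 5)
map_informative = theta[np.argmax(informative_posterior)]
print(map_informative)  # ~0.533 < 0.7: the prior pulls the estimate down
```

The uniform case is exactly point 1: the numbers agree, even though the MAP is a posterior summary and the MLE is not.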
