Bayesian – Appropriateness of MCMC Methods When Maximum a-Posteriori Estimation Is Available

bayesian, markov-chain-montecarlo, posterior

I have noticed that in many practical applications, MCMC-based methods are used to estimate a parameter even though the posterior is analytical (for example, because the priors were conjugate). To me, it makes more sense to use MAP estimators rather than MCMC-based estimators. Could anyone point out why MCMC is still an appropriate method in the presence of an analytical posterior?

Best Answer

No need to use MCMC in this case: Markov Chain Monte-Carlo (MCMC) is a method used to generate values from a distribution. It produces a Markov chain of auto-correlated values with stationary distribution equal to the target distribution. This method will still work to get you what you want, even in cases where the target distribution has an analytic form. However, there are simpler and less computationally intensive methods that work in cases like this, where you are dealing with a posterior that has a nice analytic form.
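To make the auto-correlation point concrete, here is a minimal sketch of a random-walk Metropolis sampler, written from scratch rather than taken from any library. Everything specific in it is an assumption chosen for illustration: the target is a made-up Normal(5, 0.4^2) "posterior", and the step size `step_sd`, the burn-in length, and the helper names `metropolis` and `lag1_autocorr` are hypothetical. It compares the chain against IID draws from the same distribution.

```python
import math
import random

# Hypothetical target: a Normal(MEAN, SD^2) posterior that is known analytically.
MEAN, SD = 5.0, 0.4

def log_target(theta):
    # Log density up to an additive constant (all that Metropolis needs).
    return -0.5 * ((theta - MEAN) / SD) ** 2

def metropolis(n_draws, step_sd, rng, start=0.0, burn_in=1000):
    """Random-walk Metropolis: returns n_draws values kept after burn-in."""
    theta, chain = start, []
    for i in range(n_draws + burn_in):
        proposal = theta + rng.gauss(0.0, step_sd)
        log_ratio = log_target(proposal) - log_target(theta)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = proposal          # accept; otherwise repeat the current value
        if i >= burn_in:
            chain.append(theta)
    return chain

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

rng = random.Random(1)
chain = metropolis(20000, step_sd=0.2, rng=rng)
iid = [rng.gauss(MEAN, SD) for _ in range(20000)]   # direct draws, no chain

print("MCMC: mean", sum(chain) / len(chain), "lag-1 autocorr", lag1_autocorr(chain))
print("IID:  mean", sum(iid) / len(iid), "lag-1 autocorr", lag1_autocorr(iid))
```

Both sets of draws should centre on the same value, but the chain's successive values are strongly correlated while the direct draws are not, so the chain carries less information per draw.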

In the case where the posterior distribution has an available analytic form, it is possible to obtain parameter estimates (e.g., MAP) by optimisation from that distribution using standard calculus techniques. If the target distribution is sufficiently simple you might get a closed form solution for the parameter estimator, but even if it is not, you can usually use simple iterative techniques (e.g., Newton-Raphson, gradient-descent, etc.) to find the optimising parameter estimate for any given input data. If you have an analytic form for the quantile function of the target distribution, and you need to generate values from the distribution, you can do this via inverse transform sampling, which is less computationally intensive than MCMC, and allows you to generate IID values rather than values with complex auto-correlation patterns.
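As a rough illustration of the alternatives just described, the sketch below uses a conjugate normal model: data assumed Normal(theta, sigma^2) with sigma known, and a Normal(mu0, tau0^2) prior on theta, so the posterior is again normal. The data values and the hyperparameters `sigma`, `mu0` and `tau0` are invented for the example. It computes the MAP in closed form, recovers it by Newton-Raphson on the log-posterior, and generates IID draws by inverse transform sampling, using the standard library's `statistics.NormalDist.inv_cdf` as the quantile function.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical data and hyperparameters (assumptions for illustration only).
data = [4.8, 5.6, 5.1, 4.4, 5.9, 5.3]
sigma, mu0, tau0 = 1.0, 0.0, 10.0

# Conjugate update: the posterior is Normal(post_mean, post_sd^2).
n = len(data)
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + sum(data) / sigma**2) / post_prec
post_sd = post_prec ** -0.5

# 1) Closed-form MAP: the mode of a normal posterior equals its mean.
map_closed = post_mean

# 2) The same estimate by Newton-Raphson on the log-posterior.
def grad(theta):
    # First derivative of log prior + log likelihood with respect to theta.
    return -(theta - mu0) / tau0**2 + sum(x - theta for x in data) / sigma**2

def hess(theta):
    # Second derivative; constant here because the log-posterior is quadratic.
    return -post_prec

theta = 0.0
for _ in range(50):
    step = grad(theta) / hess(theta)
    theta -= step
    if abs(step) < 1e-12:
        break
map_newton = theta

# 3) IID posterior draws by inverse transform sampling: push Uniform(0, 1)
#    values through the posterior quantile function.
posterior = NormalDist(post_mean, post_sd)
rng = random.Random(0)

def uniform_open(rng):
    # inv_cdf requires 0 < p < 1, so reject an exact 0.0.
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

draws = [posterior.inv_cdf(uniform_open(rng)) for _ in range(20000)]

print("closed-form MAP:", map_closed)
print("Newton-Raphson MAP:", map_newton)
print("mean of IID draws:", fmean(draws))
```

Because the log-posterior is exactly quadratic in this model, Newton-Raphson lands on the mode after a single step; for a less convenient posterior the same loop would simply take more iterations.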

In view of this, if you were programming from scratch, then there does not seem to be any reason you would use MCMC in the case where the target distribution has an available analytic form. The only reason you might do so is if you have a generic algorithm for MCMC already written, that can be implemented with minimal effort, and you decide that the efficiency of using the analytic form is outweighed by the effort to do the required math. In certain practical contexts you will be dealing with problems that are generally intractable, where MCMC algorithms are already set up and can be implemented with minimal effort (e.g., if you do data analysis in RStan). In these cases it may be easiest to run your existing MCMC methods rather than deriving analytic solutions to problems, though the latter can of course be used as a check on your working.