Solved – Monte Carlo maximum likelihood vs Bayesian inference

Tags: bayesian, frequentist, markov-chain-montecarlo, maximum-likelihood, monte-carlo

I recently heard about MCMLE (Monte Carlo maximum likelihood estimation) for finding
$$
\hat\theta = \underset{\theta}{\text{argmax}} \frac{\exp\left(\theta^TT(y)\right)}{c(\theta)}
$$

when the normalization constant $c(\theta)$ is too hard to compute. The main reference I see is "Constrained Monte Carlo Maximum Likelihood for Dependent Data" by Geyer and Thompson (1992). In practice I've only encountered it in the context of estimating ERGMs (exponential random graph models), but I gather it can be used more widely.

Briefly, the idea is that we can pick some $\theta_0$ and sample $y_1,\dots,y_n$ from $p(y|\theta_0)$ via MCMC (since $c(\theta_0)$ cancels out from the accept/reject ratios) and then we can use the fact that
$$
\frac 1n \sum_i e^{(\theta-\theta_0)^TT(y_i)} \to_p E_{\theta_0}(e^{(\theta-\theta_0)^TT(Y)}) = \frac{c(\theta)}{c(\theta_0)}
$$

which gives a Monte Carlo estimate of the otherwise intractable ratio $c(\theta)/c(\theta_0)$, and hence of the log-likelihood ratio $\ell(\theta) - \ell(\theta_0)$, which we can then maximize over $\theta$. Because the approximation degrades as $\theta$ moves away from $\theta_0$, this process generally requires running a full MCMC chain at every update to the current MLE solution.
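To make the procedure concrete, here is a minimal sketch under my own assumptions (not from the question): a toy exponential family $p(y\mid\theta)\propto\exp(-\theta y^2)$ with scalar sufficient statistic $T(y)=-y^2$, for which the exact MLE from a single observation is $1/(2y_{\text{obs}}^2)$, so the Monte Carlo answer can be checked. All function and variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def T(y):
    # sufficient statistic of the toy model p(y|theta) ∝ exp(-theta * y^2)
    return -y**2

def mcmc_sample(theta0, n, step=1.0, thin=10):
    """Metropolis sampler from p(y|theta0); c(theta0) cancels in the ratio."""
    y, out = 0.0, []
    for _ in range(n * thin):
        prop = y + step * rng.normal()
        if np.log(rng.uniform()) < theta0 * (T(prop) - T(y)):
            y = prop
        out.append(y)
    return np.array(out[::thin])          # thin to reduce autocorrelation

def approx_loglik_ratio(theta, theta0, y_obs, ys):
    # l(theta) - l(theta0) ≈ (theta - theta0) T(y_obs)
    #                        - log (1/n) sum_i exp((theta - theta0) T(y_i))
    d = theta - theta0
    return d * T(y_obs) - np.log(np.mean(np.exp(d * T(ys))))

y_obs = 1.0                               # single "observed" data point
theta0 = 0.4                              # reference parameter for the chain
ys = mcmc_sample(theta0, 5000)
grid = np.linspace(0.1, 2.0, 200)
theta_hat = grid[int(np.argmax([approx_loglik_ratio(t, theta0, y_obs, ys)
                                for t in grid]))]
# exact MLE for this toy model is 1/(2 * y_obs**2) = 0.5
```

The sketch optimizes over a grid for simplicity; Geyer and Thompson use proper numerical optimization, and in practice $\theta_0$ is moved toward the current estimate and a fresh chain is run whenever the importance weights become unstable.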

My question: why would we do MLE at all here? This seems like a really weird in-between where we're going to go to all the effort of running MCMC chains but we're still insisting on frequentist inference. Why not just do a fully Bayesian analysis in this case? I know MLE is convenient and all but this just feels like an odd attempt to keep using a frequentist method when it's more work than just using the existing Bayesian tools. Or are there advantages I'm not appreciating?

Best Answer

The reason for using Monte Carlo methods in the first place is that conventional methods can't be applied when dealing with intractable distributions.

If your distribution is such that you would consider MCMLE, then Bayesian estimation is not necessarily any easier.
One of the most common use cases for Monte Carlo is in Bayesian statistics, for approximating intractable posterior distributions. Here, though, the likelihood itself is only known up to $c(\theta)$, so the posterior is *doubly* intractable: the usual Metropolis–Hastings acceptance ratio for $\theta$ contains $c(\theta')/c(\theta)$, which does not cancel. Estimating the parameters in a Bayesian fashion, you may well end up running MCMC to deal with the intractable likelihood at every iteration of the posterior sampler.
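One way to see this concretely is the exchange algorithm of Murray, Ghahramani and MacKay (2006), a standard sampler for doubly intractable posteriors (my choice of illustration, not mentioned in the answer). It requires a fresh draw from $p(\cdot\mid\theta')$ at every iteration; in the toy model below, $p(y\mid\theta)\propto\exp(-\theta y^2)$, that draw is an exact Gaussian sample, but in general it is itself an inner MCMC run. The flat prior on $\theta>0$ and all names are my own assumptions; under that prior the posterior for a single observation $y_{\text{obs}}=1$ is $\text{Gamma}(3/2,\,1)$, with mean $1.5$.

```python
import numpy as np

rng = np.random.default_rng(1)

def T(y):
    return -y**2                      # p(y|theta) ∝ exp(-theta * y^2), theta > 0

def log_unnorm(y, theta):
    return theta * T(y)               # unnormalized log-density; c(theta) never needed

y_obs = 1.0
theta = 0.5                           # current state of the chain over theta
samples = []
for _ in range(20000):
    prop = theta + 0.8 * rng.normal()
    if prop > 0:                      # flat prior on theta > 0
        # auxiliary draw w ~ p(.|prop): exact here (a Gaussian),
        # but in general this step is an inner MCMC run
        w = rng.normal(scale=np.sqrt(1.0 / (2.0 * prop)))
        # exchange-algorithm acceptance ratio: the c(theta) terms cancel
        log_a = (log_unnorm(y_obs, prop) + log_unnorm(w, theta)
                 - log_unnorm(y_obs, theta) - log_unnorm(w, prop))
        if np.log(rng.uniform()) < log_a:
            theta = prop
    samples.append(theta)

post_mean = np.mean(samples[5000:])   # posterior is Gamma(3/2, 1), mean 1.5
```

Either way, frequentist or Bayesian, you end up paying for a simulation from $p(\cdot\mid\theta)$ at each step; MCMLE is not obviously the more expensive route.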
