Solved – Maximum likelihood parameters deviate from posterior distributions

bayesian inference, markov-chain-montecarlo, maximum likelihood, optimization

I have a likelihood function $\mathcal{L}(d \mid \theta)$ for the probability of my data $d$ given some model parameters $\theta \in \mathbb{R}^N$, which I would like to estimate. Assuming flat priors on the parameters, the likelihood is proportional to the posterior probability, and I use an MCMC method to sample from this posterior.

Looking at the resulting converged chain, I find that the maximum likelihood (ML) parameters are not consistent with the marginal posterior distributions. For example, the marginalized posterior distribution for one of the parameters might be $\theta_0 \sim N(\mu=0, \sigma^2=1)$, while the value of $\theta_0$ at the maximum likelihood point is $\theta_0^{ML} \approx 4$, essentially the largest value of $\theta_0$ visited by the MCMC sampler.

This is an illustrative example, not my actual result. The real distributions are far more complicated, but some of the ML parameter values lie similarly deep in the tails of their respective marginal posteriors. Note that some of my parameters are bounded (e.g. $0 \leq \theta_1 \leq 1$); within the bounds, the priors are always uniform.

My questions are:

  1. Is such a deviation a problem per se? Obviously I do not expect the ML parameters to coincide exactly with the maxima of their marginalized posterior distributions, but intuitively it feels like they should also not be found deep in the tails. Does this deviation automatically invalidate my results?

  2. Whether this is necessarily problematic or not, could it be symptomatic of specific pathologies at some stage of the data analysis? For example, is it possible to make any general statement about whether such a deviation could be induced by an improperly converged chain, an incorrect model, or excessively tight bounds on the parameters?

Best Answer

With flat priors, the posterior is identical to the likelihood up to a constant; this is simply Bayes' theorem with a constant prior $p(\theta)$:
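$$ p(\theta \mid d) = \frac{\mathcal{L}(d \mid \theta)\, p(\theta)}{p(d)} \propto \mathcal{L}(d \mid \theta). $$

It follows that: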

  1. The MLE (found with an optimizer) should be identical to the MAP (the maximum a posteriori value, i.e. the multivariate mode of the posterior, estimated with MCMC). If you don't get the same value, something is wrong with your sampler or your optimizer (see the first sketch below for this check).

  2. For complex models, it is very common that the marginal modes differ from the MAP. This happens, for example, when correlations between parameters are nonlinear. This is perfectly fine, but marginal modes should therefore not be interpreted as points of highest posterior density, nor be compared to the MLE (see the second sketch below).

  3. In your specific case, however, I suspect that the posterior runs up against a prior boundary. The posterior will then be strongly asymmetric, and it doesn't make sense to summarize it by a mean and standard deviation. There is no problem in principle with this situation, but in practice it often hints at model misspecification or poorly chosen priors (see the third sketch below).
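To make point 1 concrete, here is a minimal sketch of the suggested consistency check, assuming a toy Gaussian likelihood with a flat prior (the model and all names are illustrative, not taken from the question): the MLE is found with `scipy.optimize.minimize`, and the MAP is approximated by the highest-log-posterior sample visited by a hand-rolled Metropolis chain.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: y ~ N(theta, 1) with a flat prior on theta,
# so the log-posterior equals the log-likelihood up to a constant.
y = rng.normal(loc=1.5, scale=1.0, size=50)

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2)

# MLE via an optimizer (minimize the negative log-likelihood).
mle = minimize(lambda t: -log_post(t[0]), x0=[0.0]).x[0]

# MAP estimate from MCMC: the sample with the highest
# log-posterior visited by a simple Metropolis chain.
theta, lp = 0.0, log_post(0.0)
chain, lps = [], []
for _ in range(20_000):
    prop = theta + rng.normal(scale=0.3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept step
        theta, lp = prop, lp_prop
    chain.append(theta)
    lps.append(lp)

map_est = chain[int(np.argmax(lps))]
print(f"MLE: {mle:.3f}   MAP (best chain sample): {map_est:.3f}")
# With flat priors these should agree to within MCMC noise;
# a large gap points at the sampler or the optimizer.
```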
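For point 2, a grid-based sketch with a "banana"-shaped density (a standard normal on $x$, with $y$ centred on $x^2 - 1$; the density is made up purely for illustration) shows a marginal mode drifting away from the joint mode:

```python
import numpy as np

# "Banana" density: x ~ N(0, 1), y | x ~ N(x**2 - 1, 0.5**2).
# The nonlinear x-y coupling makes the marginal mode of y differ
# from the y-coordinate of the joint mode (the MAP).
def log_dens(x, y):
    return -0.5 * x**2 - 0.5 * ((y - (x**2 - 1)) / 0.5) ** 2

xs = np.linspace(-4, 4, 801)
ys = np.linspace(-3, 8, 1101)
X, Y = np.meshgrid(xs, ys, indexing="ij")
dens = np.exp(log_dens(X, Y))

# Joint mode (MAP) located on the grid.
i, j = np.unravel_index(np.argmax(dens), dens.shape)
print(f"joint mode (MAP):   x = {xs[i]:.2f}, y = {ys[j]:.2f}")

# Marginal of y: integrate the joint density over x, then take its mode.
marg_y = dens.sum(axis=0)
print(f"marginal mode of y: y = {ys[np.argmax(marg_y)]:.2f}")
# The marginal mode of y sits noticeably away from the MAP's
# y-coordinate even though the posterior is perfectly well behaved.
```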
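Finally, for point 3, this is what a boundary-hitting posterior can look like. Assume, purely for illustration, that the likelihood for $\theta_1$ peaks at $1.2$ with width $0.2$ while the prior confines $\theta_1$ to $[0, 1]$; under a uniform prior the posterior is then a normal truncated at the bound:

```python
from scipy.stats import truncnorm

mu, sigma = 1.2, 0.2   # illustrative likelihood peak and width
lo, hi = 0.0, 1.0      # prior bounds on theta_1
a, b = (lo - mu) / sigma, (hi - mu) / sigma  # bounds in sd units
post = truncnorm(a, b, loc=mu, scale=sigma)

print(f"posterior mean:   {post.mean():.3f}")
print(f"posterior sd:     {post.std():.3f}")
print(f"posterior median: {post.median():.3f}")
# The density rises monotonically all the way up to the bound, so
# the MAP sits exactly at theta_1 = 1; a mean/sd summary badly
# misdescribes this one-sided, boundary-piled shape.
```

A posterior of this shape is the cue to revisit the model or the prior bounds rather than to report a mean and standard deviation.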
