Solved – Is the MAP the maximum value of the posterior or its mode

bayesian, mode

According to the WP definition of MAP:

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution.

(emphasis added) which, given a posterior $f(\theta \mid x)$, can be defined as:

$$\hat{\theta}_{\mathrm{MAP}}(x)
= \underset{\theta}{\operatorname{arg\,max}} \ f(\theta \mid x)
$$

As far as I understand it, the mode of a distribution depends on how I construct its histogram (or KDE). This seems to contradict the above definition, where the MAP is the $\theta$ value found at the maximum of the sampled $f(\theta \mid x)$ and does not depend on anything else.

What am I missing?

Best Answer

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is an estimate of an unknown quantity, that equals the mode of the posterior distribution.

This is correct.

As far as I understand it, the mode of a distribution depends on how I construct its histogram (or KDE). This seems to contradict the above definition, where the MAP is the $\theta$ value found at the maximum of the sampled $f(\theta \mid x)$ and does not depend on anything else.

You are confusing theoretical quantities with random/sampling-based ones. The posterior is defined by the likelihood and prior. That is, $$f(\theta \mid x) = \frac{L(\theta)\,\pi(\theta)}{\int_{\Theta} L(\theta')\,\pi(\theta')\, d\theta'},$$ where $L(\theta)$ and $\pi(\theta)$ are the likelihood and prior, respectively. This is a uniquely defined function of $\theta$.
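For concreteness, here is a minimal sketch (my own illustration, with made-up counts, not part of the original answer) using a Beta(2, 2) prior and a binomial likelihood. By conjugacy the posterior is a Beta distribution whose mode is available in closed form, and maximizing the un-normalized posterior on a grid recovers the same value: the MAP is a deterministic quantity, with no sampling involved.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothetical data: k successes out of n trials
n, k = 20, 14

# Prior Beta(a, b); the posterior is Beta(a + k, b + n - k) by conjugacy
a, b = 2.0, 2.0
a_post, b_post = a + k, b + n - k

# Exact MAP: mode of Beta(a', b') with a', b' > 1 is (a' - 1) / (a' + b' - 2)
map_exact = (a_post - 1) / (a_post + b_post - 2)

# The same answer from maximizing the un-normalized posterior on a grid:
# L(theta) * pi(theta); no normalizing constant is needed for the arg max
theta = np.linspace(1e-6, 1 - 1e-6, 10_000)
unnormalized = binom.pmf(k, n, theta) * beta.pdf(theta, a, b)
map_grid = theta[np.argmax(unnormalized)]

print(map_exact, map_grid)  # both approximately 0.682
```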

In practice it is common to sample from this distribution, and when you do, you introduce random error. Usually sampling is done because it is the only option: the normalizing constant, say, is not available in closed form. Popular strategies obtain samples from $f(\theta \mid x)$ and only require that the user be able to evaluate the un-normalized posterior $L(\theta)\pi(\theta)$. Even though this is only an approximation, because it is based on random draws, there are results that guarantee the convergence of your estimators, so the error is tolerable as long as you run your simulations for long enough.
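To illustrate that last point, here is a small sketch (again my own, reusing the made-up posterior from above) that draws samples from the same Beta posterior and estimates its mode as the arg max of a Gaussian KDE. The estimate fluctuates with the seed and the sample size, but it moves towards the exact MAP as the number of draws grows.

```python
import numpy as np
from scipy.stats import beta, gaussian_kde

rng = np.random.default_rng(0)

# Same hypothetical posterior as above: Beta(16, 8)
a_post, b_post = 16.0, 8.0
map_exact = (a_post - 1) / (a_post + b_post - 2)  # ~0.682

for n_draws in (100, 1_000, 100_000):
    # In practice the draws would typically come from MCMC; here we can sample directly
    samples = beta.rvs(a_post, b_post, size=n_draws, random_state=rng)

    # Estimate the mode as the arg max of a KDE evaluated on a grid
    kde = gaussian_kde(samples)
    grid = np.linspace(0, 1, 2_000)
    map_estimate = grid[np.argmax(kde(grid))]

    print(n_draws, round(map_estimate, 4), "exact:", round(map_exact, 4))
```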
