Bayesian Estimation – Comprehensive Guide to Bayesian MAP Estimates

Tags: bayesian, estimation, inference, map-estimation

Suppose you have a simple linear regression problem (y = b0 + b1 x) and you decide to use Bayesian estimation to estimate the values of b0 and b1.

Using Bayesian estimation, you obtain a collection of plausible values for b0 and b1. For b1, for instance, you could make a histogram out of all these values.

But from here, how do you identify the "best" value of b1 from the histogram? Do you have to use kernel density estimation to fit a continuous function over the histogram and then take the mode of this continuous function (I am not sure how to do this)?
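Something like the following R sketch is roughly what I have in mind (beta1_samples is a placeholder for the vector of sampled b1 values), though I am not sure this is the right approach:

# smooth the sampled values with a kernel density estimate,
# then take the value where the estimated density is highest
dens = density(beta1_samples)   # beta1_samples: hypothetical vector of posterior draws for b1
b1_mode = dens$x[which.max(dens$y)]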

In the following link: https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation

The general formula for the MAP estimate of a parameter theta is: theta_MAP = argmax(likelihood * prior).

I am a bit confused. I thought the goal of Bayesian estimation was to obtain the posterior distribution of theta. So why is the posterior distribution absent from the formula for the MAP estimate?

Thanks

Best Answer

The M in MAP stands for "Maximum", so it should be no surprise that optimization is involved. To obtain estimates for the parameters, you compute

$$ \hat{\theta} = \underset{\theta \in \mathbb{R}^2}{\text{argmax}} \left\{ \text{Log Prior} + \text{Log Likelihood} \right\}$$
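To connect this with the posterior: by Bayes' rule,

$$ p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta), $$

and since the evidence $p(y)$ does not depend on $\theta$, maximizing the posterior is the same as maximizing the prior times the likelihood (or the sum of their logs). So the posterior is not really absent from the MAP formula; it is just written up to a constant.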

You don't do MAP estimation with samples from the posterior (and I'm not sure why you would need them here).

Let's take a look at an example:

Let's say my prior for my slope is a normal distribution with mean 0 and standard deviation 2

$$ \beta_1 \sim \mathcal{N}(0, 2^2) $$

and my prior for my intercept is a normal distribution with mean 1 and standard deviation 1

$$ \beta_0 \sim \mathcal{N}(1, 1^2) $$

I'm going to generate some data and write out some functions to compute log prior, log likelihood, and log posterior.

Let's start with the log prior. If the priors for the parameters are independent (meaning we don't place a prior on the joint distribution of the slope and intercept), then the log prior is the sum of the log priors for each parameter.
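In symbols, the independence means the joint prior factorizes, so the log prior is

$$ \log p(\beta_0, \beta_1) = \log p(\beta_0) + \log p(\beta_1) $$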

In R...

log_prior = function(theta){
  # theta[1] is the intercept, theta[2] is the slope
  intercept_log_prior = dnorm(theta[1], mean = 1, sd = 1, log = TRUE)
  slope_log_prior = dnorm(theta[2], mean = 0, sd = 2, log = TRUE)
  # independent priors, so the joint log prior is the sum
  intercept_log_prior + slope_log_prior
}

Here, theta is a two-element vector: the first element is the intercept, the second is the slope. Now we need the log likelihood. For linear regression, that is typically Gaussian:

$$ y \sim \mathcal{N}(\beta_0 + \beta_1 x, \sigma^2) $$
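Writing $\phi(\,\cdot \mid \mu, \sigma^2)$ for the normal density, the log likelihood of the data is the sum of the pointwise log densities, which is exactly what the code below computes:

$$ \log \text{Lik}(\beta_0, \beta_1) = \sum_{i=1}^{n} \log \phi\!\left(y_i \mid \beta_0 + \beta_1 x_i,\ \sigma^2\right) $$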

Usually, we would also have to estimate $\sigma$ by placing a prior on it, but I'm going to pretend we know it already (the code below simply uses the dnorm default, $\sigma = 1$). Here is the log likelihood in R:

log_lik = function(theta, x, y){
  # conditional mean of y is intercept + slope * x; sigma is fixed at the dnorm default of 1
  sum(dnorm(y, mean = theta[1] + theta[2]*x, log = TRUE))
}

Note how I am using x and theta to compute the conditional mean of $y$ when evaluating the log likelihood.

Now it's just a straightforward optimization problem. Let's generate some data, define our objective function (the negative log posterior), and optimize.

# data
set.seed(0)
x = rnorm(100)
y = 2*x + 1 + rnorm(100)

# Objective function: the negative log posterior.
# optim() minimizes by default, so minimizing the negative log posterior
# is the same as maximizing the log posterior.
log_posterior = function(theta, x, y){
  -log_prior(theta) - log_lik(theta, x, y)
}

# Optimize!
theta_est = optim(c(0, 0), log_posterior, x = x, y = y)$par

Here is the result from the MAP estimation procedure I've outlined

[Figure omitted: plot of the result of the MAP fit]
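As a quick sanity check (not part of the original answer), one could print the MAP estimates and compare them with the ordinary least squares fit; with 100 observations and fairly weak priors, the two should be close:

# not from the original post: compare the MAP estimates with the OLS fit
theta_est          # MAP estimates of (intercept, slope)
coef(lm(y ~ x))    # least squares estimates for comparison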
