Regression – Why is Lasso Penalty Equivalent to the Double Exponential (Laplace) Prior?

bayesian, lasso, prior, regression, regularization

I have read in a number of references that the Lasso estimate for the regression parameter vector $B$ is equivalent to the posterior mode of $B$ in which the prior distribution for each $B_i$ is a double exponential distribution (also known as Laplace distribution).

I have been trying to prove this. Can someone flesh out the details?

Best Answer

For simplicity let's just consider a single observation of a variable $Y$ such that $$Y|\mu, \sigma^2 \sim N(\mu, \sigma^2),$$

with the prior $\mu \sim \mbox{Laplace}(\lambda)$ on the mean and the improper flat prior $f(\sigma) \propto \mathbb{1}_{\sigma>0}$ on the scale.

Then the joint density of $Y, \mu, \sigma^2$ satisfies $$ f(Y, \mu, \sigma^2 | \lambda) \propto \frac{1}{\sigma}\exp \left(-\frac{(y-\mu)^2}{2\sigma^2} \right) \times \frac{\lambda}{2} e^{-\lambda \vert \mu \vert}. $$

Taking the log and collecting the terms that do not involve $\mu$ into a constant, $$ \log f(Y, \mu, \sigma^2 | \lambda) = -\frac{1}{2\sigma^2} (y-\mu)^2 -\lambda \vert \mu \vert + \text{const}. \quad (1)$$

Thus, for fixed $\sigma^2$, the value of $\mu$ that maximizes (1) is the MAP estimate. Multiplying (1) by $-\sigma^2$ shows that it is also the minimizer of $\frac{1}{2}(y-\mu)^2 + \tilde \lambda \vert \mu \vert$ with $\tilde \lambda = \lambda \sigma^2$, which is exactly the Lasso problem.
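As a quick numerical sanity check of the scalar case, here is a minimal sketch. It assumes the usual Normal log density $-(y-\mu)^2/(2\sigma^2)$ and uses the standard soft-thresholding formula for the one-dimensional Lasso solution; the function names and the numbers for `y`, `sigma2` and `lam` are made up for illustration.

```python
import math

def soft_threshold(y, t):
    """Closed-form minimizer over mu of 0.5*(y - mu)**2 + t*|mu|."""
    return math.copysign(max(abs(y) - t, 0.0), y)

def neg_log_posterior(mu, y, sigma2, lam):
    """Negative log posterior of mu for fixed sigma2, up to an additive constant."""
    return (y - mu) ** 2 / (2.0 * sigma2) + lam * abs(mu)

def map_by_grid(y, sigma2, lam, lo=-10.0, hi=10.0, steps=200001):
    """Brute-force MAP estimate: minimize the negative log posterior on a grid."""
    best_mu, best_val = lo, float("inf")
    for i in range(steps):
        mu = lo + (hi - lo) * i / (steps - 1)
        val = neg_log_posterior(mu, y, sigma2, lam)
        if val < best_val:
            best_mu, best_val = mu, val
    return best_mu

# Illustrative values only (not from the answer above).
y, sigma2, lam = 1.7, 0.5, 2.0
lam_tilde = lam * sigma2                  # reparametrized Lasso penalty

mu_lasso = soft_threshold(y, lam_tilde)   # Lasso solution in closed form
mu_map = map_by_grid(y, sigma2, lam)      # posterior mode found numerically

print(mu_lasso, mu_map)
```

The two numbers should agree up to the grid spacing. Note that when $|y| \le \tilde\lambda$ the soft-thresholded value is exactly zero, which is the sparsity the Laplace prior's kink at zero produces in the posterior mode.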

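The extension to regression is direct: replace $\mu$ with $X\beta$ in the Normal likelihood and put independent Laplace$(\lambda)$ priors on the coefficients $\beta_j$ (written $B_i$ in the question), so that the penalty term becomes $\lambda \sum_j \vert \beta_j \vert = \lambda \Vert \beta \Vert_1$.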
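The regression case can be checked the same way. The sketch below is an illustration, not part of the original answer: it solves the Lasso problem $\frac{1}{2}\Vert y - X\beta\Vert_2^2 + \tilde\lambda\Vert\beta\Vert_1$ with a plain coordinate-descent loop (a standard Lasso solver, swapped in here for convenience) and compares the objective with the negative log posterior under independent Laplace$(\lambda)$ priors for fixed $\sigma^2$. The toy data and all function names are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def residual(X, y, beta):
    """Residual vector y - X beta, with X given as a list of rows."""
    return [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]

def lasso_objective(X, y, beta, lam_tilde):
    """0.5 * ||y - X beta||^2 + lam_tilde * ||beta||_1."""
    r = residual(X, y, beta)
    return 0.5 * sum(ri * ri for ri in r) + lam_tilde * sum(abs(b) for b in beta)

def neg_log_posterior(X, y, beta, sigma2, lam):
    """Negative log posterior of beta for fixed sigma2, up to a constant."""
    r = residual(X, y, beta)
    return sum(ri * ri for ri in r) / (2.0 * sigma2) + lam * sum(abs(b) for b in beta)

def lasso_coordinate_descent(X, y, lam_tilde, sweeps=500):
    """Cyclic coordinate descent for the Lasso objective above."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            r = residual(X, y, beta)
            col = [row[j] for row in X]
            norm2 = sum(c * c for c in col)
            # Inner product of column j with the residual that excludes beta[j].
            rho = sum(c * (ri + c * beta[j]) for c, ri in zip(col, r))
            beta[j] = soft_threshold(rho, lam_tilde) / norm2
    return beta

# Toy data, made up for illustration.
X = [[1.0, 0.2], [0.5, -1.0], [-0.3, 0.8], [1.2, 0.1], [-0.7, -0.2]]
y = [1.1, 0.4, -0.2, 1.3, -0.8]
sigma2, lam = 0.5, 2.0
lam_tilde = lam * sigma2

beta_hat = lasso_coordinate_descent(X, y, lam_tilde)
print(beta_hat)
print(neg_log_posterior(X, y, beta_hat, sigma2, lam),
      lasso_objective(X, y, beta_hat, lam_tilde) / sigma2)
```

The two printed objective values coincide because the negative log posterior is the Lasso objective divided by $\sigma^2$, so both are minimized by the same $\beta$. For this toy data the second coefficient is shrunk to exactly zero.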
