Bayesian Ridge Regression – Is It Another Name for Bayesian Linear Regression?


I searched for Bayesian Ridge Regression on the Internet, but most of the results I got were about Bayesian Linear Regression. I wonder whether they are the same thing, because the formulas look quite similar.

Best Answer

Ridge regression uses $L_2$-norm regularization, while Bayesian regression is a regression model defined in probabilistic terms, with explicit priors on the parameters. The choice of priors can have a regularizing effect; for example, the MAP estimate under Laplace priors on the coefficients is equivalent to $L_1$ regularization. They are not the same thing, because ridge regression is a particular kind of regression model, whereas the Bayesian approach is a general way of defining and estimating statistical models that can be applied to many different models.

The ridge regression model is defined as

$$ \underset{\beta}{\operatorname{arg\,min}}\; \|y - X\beta\|^2_2 + \lambda \|\beta\|^2_2 $$
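To make this concrete, here is a minimal NumPy sketch of the closed-form minimizer $\hat\beta = (X^\top X + \lambda I)^{-1}X^\top y$, on hypothetical synthetic data with an arbitrary choice of $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=100)

lam = 1.0  # regularization strength lambda; a hypothetical choice
p = X.shape[1]

# Closed-form minimizer of ||y - X beta||_2^2 + lam * ||beta||_2^2:
# beta = (X'X + lam * I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # close to beta_true, shrunk toward zero
```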

In the Bayesian setting, we estimate the posterior distribution using Bayes' theorem

$$ p(\theta|X) \propto p(X|\theta)\,p(\theta) $$
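As a toy illustration of this proportionality, the sketch below evaluates an unnormalized log-posterior for a single regression slope on a grid; the data, prior, and grid are all hypothetical choices:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=1.0, size=50)  # true slope is 2.0

grid = np.linspace(0.0, 4.0, 401)  # candidate values for theta (the slope)
log_lik = np.array([norm.logpdf(y, loc=t * x, scale=1.0).sum() for t in grid])
log_prior = norm.logpdf(grid, loc=0.0, scale=10.0)  # weak Normal prior

# Unnormalized log-posterior: log-likelihood + log-prior + constant
log_post = log_lik + log_prior
print(grid[np.argmax(log_post)])  # posterior mode, close to 2.0
```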

Ridge regression corresponds to assuming a Normal likelihood and a Normal prior for the parameters. After dropping the normalizing constant, the log-density of the normal distribution is

$$\begin{align} \log p(x|\mu,\sigma) &= \log\Big[\frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\Big] \\ &= \log\Big[\frac{1}{\sigma \sqrt{2\pi} }\Big] + \log\Big[e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\Big] \\ &\propto -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \\ &\propto -\frac{1}{\sigma^2} \|x - \mu\|^2_2 \end{align}$$
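This proportionality is easy to verify numerically; the sketch below (with arbitrary $\mu$ and $\sigma$) checks that SciPy's normal log-density differs from $-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2$ only by the dropped constant:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 2.0  # hypothetical parameter values
xs = np.linspace(-5.0, 5.0, 7)

exact = norm.logpdf(xs, loc=mu, scale=sigma)
kernel = -0.5 * ((xs - mu) / sigma) ** 2

# The gap is exactly the dropped constant log(1 / (sigma * sqrt(2 pi)))
print(np.allclose(exact - kernel, -np.log(sigma * np.sqrt(2.0 * np.pi))))  # True
```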

Now you can see that maximizing the normal log-likelihood together with the normal log-prior (i.e., finding the MAP estimate) is equivalent to minimizing the squared loss with the ridge penalty

$$\begin{align} \underset{\beta}{\operatorname{arg\,max}}& \; \log\mathcal{N}(y|X\beta, \sigma) + \log\mathcal{N}(\beta|0, \tau) \\ = \underset{\beta}{\operatorname{arg\,min}}&\; -\Big\{\log\mathcal{N}(y|X\beta, \sigma) + \log\mathcal{N}(\beta|0, \tau)\Big\} \\ = \underset{\beta}{\operatorname{arg\,min}}&\; \frac{1}{\sigma^2}\|y - X\beta\|^2_2 + \frac{1}{\tau^2} \|\beta\|^2_2 \end{align}$$

Multiplying the last objective by $\sigma^2$ does not change the minimizer, so this is exactly the ridge problem with $\lambda = \sigma^2/\tau^2$.
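A quick numerical check of this equivalence (with hypothetical values for $\sigma$ and $\tau$): the ridge solution with $\lambda = \sigma^2/\tau^2$ matches the MAP estimate of the conjugate Normal-likelihood, Normal-prior model.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

sigma, tau = 0.5, 1.0    # noise sd and prior sd (hypothetical values)
lam = sigma**2 / tau**2  # the implied ridge penalty
p = X.shape[1]

# Ridge solution with lambda = sigma^2 / tau^2 ...
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# ... matches the MAP estimate of the Normal-likelihood, Normal-prior model:
# minimize (1/sigma^2)||y - X beta||^2 + (1/tau^2)||beta||^2
beta_map = np.linalg.solve(X.T @ X / sigma**2 + np.eye(p) / tau**2,
                           X.T @ y / sigma**2)
print(np.allclose(beta_ridge, beta_map))  # True
```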

For more on ridge regression and regularization, see the threads:

- Why does ridge estimate become better than OLS by adding a constant to the diagonal?
- What problem do shrinkage methods solve?
- When should I use lasso vs ridge?
- Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity?

and many others we have.
