Solved – Estimation of Bayesian Ridge Regression

bayesian, regression, ridge regression

According to the scikit-learn documentation, by using the probabilistic model:

$p(y|X,\omega,\alpha) = \mathcal{N}(y|X\omega,\alpha)$

with $\omega$ given by a spherical Gaussian:
$p(\omega|\lambda) = \mathcal{N}(\omega|0,\lambda^{-1}\mathbf{I_p})$

we obtain a Bayesian model of ridge regression. So can I say that the prediction of this model on unseen data $X^*$ is a probability distribution over $y$ with mean $\mu = X^*\omega$ and variance $\sigma^2 = \alpha$ or $\sigma^2 = \lambda^{-1}\mathbf{I}_p$? What exactly do $\alpha$ and $\lambda$ do in these equations?

Best Answer

What the description in the sklearn documentation says is that this is a regression model with an extra regularization parameter for the coefficients. The model is

$$\begin{align} y &\sim \mathcal{N}(\mu, \alpha^{-1}) \\ \mu &= X\omega \\ \omega &\sim \mathcal{N}(0, \lambda^{-1}\mathbf{I}_p) \\ \alpha &\sim \mathcal{G}(\alpha_1, \alpha_2) \\ \lambda &\sim \mathcal{G}(\lambda_1, \lambda_2) \end{align}$$
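In scikit-learn this corresponds to `BayesianRidge`, whose constructor arguments `alpha_1`, `alpha_2`, `lambda_1`, `lambda_2` are exactly these Gamma hyperparameters (the defaults are $10^{-6}$, i.e. very vague priors). A minimal sketch on synthetic data, just to make the mapping concrete:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data: y = X @ w + Gaussian noise with sd 0.5,
# so the true noise precision is alpha = 1 / 0.25 = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=200)

# alpha_1/alpha_2 and lambda_1/lambda_2 are the Gamma hyperparameters
# of the priors on alpha and lambda in the model above.
model = BayesianRidge(alpha_1=1e-6, alpha_2=1e-6,
                      lambda_1=1e-6, lambda_2=1e-6)
model.fit(X, y)

print(model.coef_)    # posterior mean of omega, should be close to w_true
print(model.alpha_)   # estimated noise precision, roughly 4 here
print(model.lambda_)  # estimated precision of the weights
```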

So in this model $y$ follows a normal distribution (the likelihood) parametrized by mean $\mu = X\omega$ and variance $\alpha^{-1}$. We choose Gamma priors for the noise precision $\alpha$ and for the regularization parameter $\lambda$; these priors have hyperparameters $\alpha_1, \alpha_2, \lambda_1, \lambda_2$. The regression parameters $\omega$ have independent Gaussian priors with mean $0$ and variance $\lambda^{-1}$, so $\lambda$ serves as a regularization parameter: it is a precision, so the larger $\lambda$, the more concentrated around zero the $\omega$ values are assumed to be a priori. This also answers the question about the prediction: the variance of $y$ is $\alpha^{-1}$, not $\alpha$, and $\lambda^{-1}\mathbf{I}_p$ is the prior covariance of the coefficients, not the variance of $y$.
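For a new input $x^*$ this yields the usual Gaussian-linear posterior predictive (a sketch of the standard result, with $m$ and $\Sigma$ denoting the posterior mean and covariance of $\omega$ given the training data):

$$p(y^* \mid x^*, X, y) = \mathcal{N}\!\left(y^* \,\middle|\, x^{*\top} m,\; \alpha^{-1} + x^{*\top} \Sigma\, x^*\right)$$

so the predictive variance is the noise variance $\alpha^{-1}$ plus a term for the remaining uncertainty in $\omega$. Continuing the sketch above, scikit-learn returns exactly this mean and standard deviation via `predict(..., return_std=True)` (the fitted $\Sigma$ is available as the `sigma_` attribute):

```python
# Continuing the sketch above (same fitted `model` and `rng`).
X_new = rng.normal(size=(5, 3))
y_mean, y_std = model.predict(X_new, return_std=True)
print(y_mean)  # predictive mean: X_new @ model.coef_ + model.intercept_
print(y_std)   # predictive sd: sqrt(1/alpha_ + x* @ sigma_ @ x* per row)
```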
