Deriving The Posterior For Bayesian Linear Regression

In Bayesian linear regression, suppose we have the likelihood function

$$p(t| X, w, \beta) = \prod_{i=1}^N \mathcal{N}(t_i| w^T \phi(x_i), \beta^{-1})$$

where $X$ collects the inputs $x_i$, $t$ is the response vector, and $\phi$ is a fixed basis function.

Define a conjugate prior distribution as

$$p(w) = \mathcal{N}(w| 0, S^{-1})$$

where $S = \alpha I$. We assume for now that $\alpha$ and $\beta$ are known.

Now the posterior can be shown to have an analytic form, $p(w|X,t,\beta) = \mathcal{N}(w|m_n, S_n^{-1})$, where

$$m_n = \beta S_n^{-1} \phi(X)^T t$$

$$S_n = \alpha I + \beta \phi(X)^T \phi(X)$$

Here $\phi(X)$ is the design matrix whose $i$th row is $\phi(x_i)$, and $S_n$ is the posterior precision, so the posterior covariance is $S_n^{-1}$.
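For concreteness, here is a minimal NumPy sketch of these two formulas. The straight-line data, the basis $\phi(x) = (1, x)$, and the values of $\alpha$ and $\beta$ are illustrative assumptions of mine, not taken from the references:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta = 2.0, 25.0  # assumed known prior/noise precisions (illustrative)
N = 30

# Synthetic data from t = 0.5*x - 0.3 plus Gaussian noise of precision beta.
x = rng.uniform(-1, 1, size=N)
t = 0.5 * x - 0.3 + rng.normal(scale=beta ** -0.5, size=N)

# Design matrix phi(X): row i is phi(x_i); here phi(x) = (1, x).
Phi = np.column_stack([np.ones(N), x])

# Posterior precision S_n and mean m_n:
#   S_n = alpha * I + beta * Phi^T Phi
#   m_n = beta * S_n^{-1} Phi^T t
S_n = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
m_n = beta * np.linalg.solve(S_n, Phi.T @ t)

print("posterior mean:", m_n)  # should be close to (-0.3, 0.5)
print("posterior covariance:\n", np.linalg.inv(S_n))
```

(Solving $S_n m_n = \beta\,\phi(X)^T t$ with `np.linalg.solve` avoids forming $S_n^{-1}$ explicitly, which is numerically preferable.)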

I believe one can derive $m_n$ and $S_n$ from the log of the likelihood times the prior, but I cannot figure out how to do this.

Thanks in advance!

References:
https://cedar.buffalo.edu/~srihari/CSE574/Chap3/3.4-BayesianRegression.pdf
http://krasserm.github.io/2019/02/23/bayesian-linear-regression/

Best Answer

The form of $p(t|X,w,\beta)$ is multivariate normal with mean vector $m_0=\phi(X)w$ and precision matrix $S_0=\beta I$. Here $\phi(X)$ is the matrix whose $i$th row is $\phi(x_i)$.

By Bayes' rule, we have $p(w|X,t,\beta)\propto p(t|X,w,\beta)p(w)$, where the $\propto$ indicates that the two expressions are equal modulo a multiplicative constant that does not depend on $w$.

Now, the right-hand side is equal to $e^{-(t-m_0)^T S_0 (t-m_0)/2}\,e^{-w^T S w/2}=e^{-\beta(t-m_0)^T(t-m_0)/2-\alpha w^T w/2}$, modulo multiplicative constants that do not depend on $w$.

At this point, all that remains is to massage the expression inside the exponential.

Expanding the quantity inside the exponential using the definition of $m_0$ (and multiplying through by $-2$) gives $$\beta (t-\phi(X)w)^T(t-\phi(X)w)+\alpha w^Tw=\beta t^Tt-\beta w^T\phi(X)^Tt-\beta t^T\phi(X)w+\beta w^T\phi(X)^T\phi(X)w+\alpha w^Tw.$$ We may drop the term $\beta t^Tt$ because it does not depend on $w$, and will thus be absorbed into the normalizing constant when it is exponentiated.
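If it helps, this expansion is easy to sanity-check numerically with random inputs (a pure identity test, no modelling assumptions); below, $-\beta w^T\phi(X)^Tt-\beta t^T\phi(X)w$ is collapsed to $-2\beta w^T\phi(X)^Tt$, since both terms are the same scalar:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 7, 3
alpha, beta = 0.7, 1.9  # arbitrary positive values

t = rng.normal(size=N)
w = rng.normal(size=M)
Phi = rng.normal(size=(N, M))  # stands in for phi(X)

# Left side: beta * ||t - Phi w||^2 + alpha * ||w||^2
lhs = beta * (t - Phi @ w) @ (t - Phi @ w) + alpha * w @ w

# Right side: the expansion with the w-independent term beta * t^T t dropped.
rhs = -2 * beta * w @ Phi.T @ t + beta * w @ Phi.T @ Phi @ w + alpha * w @ w

assert np.isclose(lhs - beta * t @ t, rhs)
```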

Setting $S_n=\alpha I+\beta \phi(X)^T\phi(X)$ as in the question, we can rewrite this as

$$-\beta w^T\phi(X)^Tt-\beta t^T\phi(X)w+w^TS_nw$$

We want to "complete the square," so we rewrite $\beta t^T\phi(X)=m_n^T S_n$, where $m_n$ is defined implicitly by this equation; since $S_n$ is symmetric, this means $m_n=\beta S_n^{-1}\phi(X)^Tt$, exactly the posterior mean from the question. Then we have

$$-w^TS_nm_n-m_n^TS_nw+w^TS_nw=(w-m_n)^TS_n(w-m_n)-m_n^TS_nm_n$$
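This completing-the-square identity can likewise be verified numerically for a random symmetric positive definite $S_n$ and arbitrary $w$, $m_n$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
A = rng.normal(size=(M, M))
S_n = A @ A.T + M * np.eye(M)  # random symmetric positive definite matrix
w = rng.normal(size=M)
m_n = rng.normal(size=M)

lhs = -w @ S_n @ m_n - m_n @ S_n @ w + w @ S_n @ w
rhs = (w - m_n) @ S_n @ (w - m_n) - m_n @ S_n @ m_n
assert np.isclose(lhs, rhs)
```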

We have shown that $$p(w|X,t,\beta)\propto e^{-(w-m_n)^TS_n(w-m_n)/2+m_n^TS_nm_n/2}$$

However, the factor $e^{m_n^TS_nm_n/2}$ does not depend on $w$, so it can be absorbed into the implicit normalizing constant. What remains is $e^{-(w-m_n)^TS_n(w-m_n)/2}$, the unnormalized density of $\mathcal{N}(w|m_n,S_n^{-1})$, which is the desired expression.
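As an end-to-end check, $\log[p(t|X,w,\beta)\,p(w)]$ should differ from the log-density of $\mathcal{N}(w|m_n, S_n^{-1})$ by the same constant for every $w$. A sketch, reusing the same illustrative data-generating assumptions as above:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
alpha, beta, N = 2.0, 25.0, 30

x = rng.uniform(-1, 1, size=N)
t = 0.5 * x - 0.3 + rng.normal(scale=beta ** -0.5, size=N)
Phi = np.column_stack([np.ones(N), x])  # basis phi(x) = (1, x)

S_n = alpha * np.eye(2) + beta * Phi.T @ Phi
m_n = beta * np.linalg.solve(S_n, Phi.T @ t)

def log_joint(w):
    # log p(t | X, w, beta) + log p(w), dropping w-independent constants
    return -0.5 * beta * np.sum((t - Phi @ w) ** 2) - 0.5 * alpha * w @ w

posterior = multivariate_normal(mean=m_n, cov=np.linalg.inv(S_n))

ws = rng.normal(size=(5, 2))  # arbitrary test points
gaps = [log_joint(w) - posterior.logpdf(w) for w in ws]
print(np.ptp(gaps))  # ~0 up to floating point: the gap is constant in w
```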
