Solved – Posterior of the linear regression model with g-Prior

bayesian, linear, posterior, prior, regression

Assume a linear model of the form $$Y=X\beta+\epsilon$$ where $\epsilon$ has a multivariate Normal distribution with mean $0_N$ and covariance matrix $\sigma^2 I_N$. I would like to perform simple Bayesian inference using a natural conjugate g-prior for $\beta$ conditional on $h=\sigma^{-2}$:
$$\beta|h \sim N(\underline{\beta},h^{-1}g(X'X)^{-1})\text{ , where } g>0$$
and
$$h \sim G(\underline{s}^{-2},\underline{v}).$$
From standard textbooks (e.g. the publicly available slides by Luc Bauwens, p. 115) it is clear that the marginal posterior distribution of $\beta$ is an $N$-variate $t$-distribution with parameters (given $T$ observations and letting $\underline{v},\underline{s}\rightarrow0$):
$$\beta|y \sim t(\overline{\beta},\overline{s^2}\overline{V},T)$$
where
$$\overline{\beta}=\frac{1}{1+g}\underline{\beta}+\frac{g}{1+g}{\beta}_\text{OLS}$$ and
$$\overline{V}=\frac{g}{1+g}(X'X)^{-1}.$$
However, I fail to find an appropriate representation of $\overline{s^2}$ when $\underline{\beta}\neq0$. The representation should be
$$\overline{s^2}=\frac{1}{T}\left( (Y-X\beta_\text{OLS})'(Y-X\beta_\text{OLS})+\frac{1}{g}(\beta_\text{OLS}-\underline{\beta})'X'X(\beta_\text{OLS}-\underline{\beta})\right).$$ Can anyone give me a more comprehensive representation of $\overline{s^2}$ that helps me understand the impact of the chosen prior mean $\underline{\beta}$ on the posterior variance? I appreciate every comment or idea, thank you =)
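A small numerical sketch (NumPy, simulated data, all names illustrative) of the two components of the $\overline{s^2}$ formula above: the OLS residual sum of squares, plus a penalty that grows with the disagreement between the prior mean $\underline{\beta}$ and $\beta_\text{OLS}$ and shrinks as $g$ (the prior spread) grows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 50, 3                           # T observations, k regressors
X = rng.normal(size=(T, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=T)

g = 10.0
beta_prior = np.zeros(k)               # prior mean (underline beta), illustrative choice

XtX = X.T @ X
beta_ols = np.linalg.solve(XtX, X.T @ y)
resid = y - X @ beta_ols
diff = beta_ols - beta_prior

# s2_bar = (residual sum of squares + prior-disagreement penalty) / T
ssr = resid @ resid
penalty = (diff @ XtX @ diff) / g      # vanishes when beta_prior = beta_ols
s2_bar = (ssr + penalty) / T
```

Setting `beta_prior = beta_ols` drives `penalty` to zero, so the posterior scale can never fall below `ssr / T`; a prior mean far from the OLS estimate inflates the posterior variance through this term.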

Best Answer

[Extract from our book Bayesian Core, Chapter 3, pp.54-56]

3.2.1 Conjugate Priors

Specifying both the conditional prior on $\beta$, $$ \beta|\sigma^2,X\sim\mathscr{N}_{k+1}(\tilde\beta,\sigma^2M^{-1})\,, $$ where $M$ is a $(k+1,k+1)$ positive definite symmetric matrix, and the marginal prior on $\sigma^2$, $$ \sigma^2|X\sim \mathscr{IG}(a,b),\qquad a,b>0, $$ we indeed have conjugate priors in that the conditional posterior distribution $\pi(\beta|\sigma^2,y,X)$ is $$ \beta|\sigma^2,y,X\sim \mathscr{N}_{k+1}\left((M+X^\text{T} X)^{-1} \{(X^\text{T} X)\hat\beta+M\tilde\beta\},\sigma^2(M+X^\text{T} X)^{-1}\right), $$ where $\hat\beta$ is the OLS, and the marginal posterior distribution $\pi(\sigma^2|y,X)$ is $$ \sigma^2|y,X\sim \mathscr{IG}\left(\frac{n}{2}+a,b+\frac{s^2}{2}+\frac{(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1}(\tilde\beta-\hat\beta)}{2}\right)\,, $$ where $s^2=(y-X\hat\beta)^\text{T} (y-X\hat\beta)$, posteriors which are of the same types as the prior distributions.
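The conjugate-posterior formulas above translate directly into a few lines of linear algebra. The sketch below (NumPy, simulated data, hyperparameters `M`, `beta_tilde`, `a`, `b` all illustrative) computes the conditional posterior mean of $\beta$ and the shape/scale of the Inverse Gamma marginal posterior of $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1 = 40, 2                          # n observations, k+1 regressors
X = rng.normal(size=(n, k1))
y = X @ np.array([0.5, -1.0]) + rng.normal(size=n)

# Hypothetical prior hyperparameters
M = np.eye(k1)                         # prior precision factor (positive definite)
beta_tilde = np.zeros(k1)              # prior mean
a, b = 2.0, 1.0                        # Inverse Gamma hyperparameters

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # OLS estimate

# Conditional posterior of beta | sigma^2: N(post_mean, sigma^2 (M + X'X)^{-1})
A = M + XtX
post_mean = np.linalg.solve(A, XtX @ beta_hat + M @ beta_tilde)

# Marginal posterior of sigma^2: IG(n/2 + a, b + s^2/2 + quad/2)
s2 = (y - X @ beta_hat) @ (y - X @ beta_hat)
d = beta_tilde - beta_hat
quad = d @ np.linalg.solve(np.linalg.inv(M) + np.linalg.inv(XtX), d)
shape = n / 2 + a
scale = b + s2 / 2 + quad / 2
```

Note that the quadratic form `quad` is nonnegative, so a prior mean that disagrees with the OLS estimate can only enlarge the Inverse Gamma scale, and hence the posterior spread of $\sigma^2$.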

Integrating the conditional posterior of $\beta$ over the marginal posterior of $\sigma^2$ [in $\sigma^2$] leads to a multivariate $t$ marginal posterior distribution on $\beta$ since \begin{align*} \pi(\beta|y,X) & \propto \left[(\beta-\{M+X^\text{T}X\}^{-1}[\{X^\text{T} X\}\hat\beta+M\tilde\beta])^\text{T} (M+X^\text{T} X)\right. \\ &\quad \times(\beta-\{M+X^\text{T} X\}^{-1}[\{X^\text{T} X\}\hat\beta+M\tilde\beta])+2b+s^2 \\ &\quad +\left.(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1} (\tilde\beta-\hat\beta)\right]^{-(n/2+k/2+a)} \end{align*} (the computation amounts to straightforward, if tedious, bookkeeping). We recall that the density of a multivariate $\mathscr{T}_p(\nu,\theta,\Sigma)$ distribution on $\mathbb{R}^p$ is $$ f(t|\nu,\theta,\Sigma)=\frac{\Gamma((\nu+p)/2)/\Gamma(\nu/2)}{\sqrt{\mbox{det}(\Sigma)\nu\pi}}\left[1+\frac{(t-\theta)^\text{T} \Sigma^{-1}(t-\theta)}{\nu}\right]^{-(\nu+p)/2}\,. $$ We thus have that, marginally and a posteriori, $$ \beta|y,X \sim \mathscr{T}_{k+1}\left(n+2a,\hat\mu,\hat\Sigma\right), $$ with \begin{align*} \hat\mu &= (M+X^\text{T} X)^{-1}((X^\text{T} X)\hat\beta+M\tilde\beta),\\ \hat\Sigma &= \frac{2b+s^2+(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1} (\tilde\beta-\hat\beta)}{n+2a}(M+X^\text{T} X)^{-1}. \end{align*} In this case, the posterior variance of $\beta$ is proportional to $\left(M+X^\text{T} X\right)^{-1}$. The correlation structure is thus completely determined by the prior and the design matrix. The scale factor comes from the Inverse Gamma part: modulo an $(n+2a)/(n+2a-4)$ term, this is the expectation of $\sigma^2$ from its marginal posterior.
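The g-prior in the question is the special case $M = X^\text{T}X/g$ of the general conjugate setup. The sketch below (NumPy, simulated data and hyperparameters illustrative) checks numerically that the general $t$ location $\hat\mu$ then collapses to the weighted average $\frac{1}{1+g}\underline{\beta}+\frac{g}{1+g}\beta_\text{OLS}$, and that $(M+X^\text{T}X)^{-1}$ collapses to $\frac{g}{1+g}(X^\text{T}X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

g = 5.0
beta_tilde = np.ones(p)                    # prior mean, illustrative

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # OLS estimate

# General conjugate formulas specialized to the g-prior: M = X'X / g
M = XtX / g
A = M + XtX                                # posterior precision factor

# Posterior t location from the general formula ...
mu_hat = np.linalg.solve(A, XtX @ beta_hat + M @ beta_tilde)
# ... and the g-prior shortcut from the question
mu_g = beta_tilde / (1 + g) + (g / (1 + g)) * beta_hat

# Quadratic form in the Inverse Gamma scale: general form vs.
# its g-prior simplification (M^{-1} + (X'X)^{-1})^{-1} = X'X / (1 + g)
d = beta_tilde - beta_hat
quad_general = d @ np.linalg.solve(np.linalg.inv(M) + np.linalg.inv(XtX), d)
quad_gprior = d @ XtX @ d / (1 + g)
```

Under this choice of $M$ the prior-disagreement term enters the posterior scale as a quadratic form in $X^\text{T}X$ damped by $1/(1+g)$, which makes explicit how the distance between $\underline{\beta}$ and $\beta_\text{OLS}$ inflates the posterior variance.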
