If you assume a uniform prior on $\beta$, then the posterior for $\beta$ is
$$ \beta|y \sim N(\hat{\beta},V_\beta), $$
with
$$\hat{\beta} = [X'\Sigma^{-1}X]^{-1}X'\Sigma^{-1}y \quad \mbox{and} \quad V_\beta =[X'\Sigma^{-1}X]^{-1}.$$
To find the predictive distribution, we need more information. If $\tilde{y} \sim N(\tilde{X}\beta,\tilde{\Sigma})$ and is conditionally independent of $y$ given $\beta$, then
$$ \tilde{y}|y \sim N(\tilde{X}\hat{\beta}, \tilde{\Sigma} + \tilde{X}V_\beta\tilde{X}').$$
But for these types of models, $y$ and $\tilde{y}$ are typically not conditionally independent; instead, we have
$$ \left( \begin{array}{c}
y \\ \tilde{y}
\end{array} \right) \sim N\left( \left[\begin{array}{c} X\beta \\ \tilde{X}\beta \end{array}\right],
\left[ \begin{array}{cc}
\Sigma & \Sigma_{12} \\
\Sigma_{21} & \tilde{\Sigma}
\end{array} \right]\right).
$$
If this is the case, then
$$
\tilde{y}|y \sim N(\tilde{X}\hat{\beta} + \Sigma_{21}\Sigma^{-1}(y-X\hat{\beta}),
\tilde{\Sigma} - \Sigma_{21}\Sigma^{-1}\Sigma_{12}).
$$
This assumes $\Sigma, \Sigma_{12},$ and $\tilde{\Sigma}$ are all known. As you point out, typically they are unknown and need to be estimated. For the common models that have this structure, e.g. time series and spatial models, there generally won't be a closed form for the predictive distribution.
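For concreteness, here is a minimal numerical sketch of the formulas above in Python. Everything in it (the simulated data, the AR(1)-style covariance, and variable names such as `Sigmat` for $\tilde\Sigma$) is an illustrative assumption, not something taken from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n observed responses, m responses to predict, k regressors.
n, m, k = 50, 5, 2
X  = rng.normal(size=(n, k))
Xt = rng.normal(size=(m, k))                    # plays the role of \tilde{X}
beta_true = np.array([1.0, -2.0])

# Known covariance blocks; an AR(1)-type structure is used purely for illustration.
idx  = np.arange(n + m)
full = 0.8 ** np.abs(idx[:, None] - idx[None, :])
Sigma,   Sigma12 = full[:n, :n], full[:n, n:]   # Cov(y), Cov(y, ytilde)
Sigma21, Sigmat  = full[n:, :n], full[n:, n:]   # Cov(ytilde, y), Cov(ytilde)

y = rng.multivariate_normal(X @ beta_true, Sigma)

# Posterior of beta under a flat prior (generalised least squares):
Si       = np.linalg.inv(Sigma)
V_beta   = np.linalg.inv(X.T @ Si @ X)
beta_hat = V_beta @ X.T @ Si @ y

# Predictive for correlated errors, plugging beta_hat into the last display:
mean_pred = Xt @ beta_hat + Sigma21 @ Si @ (y - X @ beta_hat)
cov_pred  = Sigmat - Sigma21 @ Si @ Sigma12
```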
...the joint posterior of $\mu$ and $\sigma^2$ given the data is proportional to the product
$$
(\sigma^2)^{-\frac{v_o+1}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+n_o(\mu-\mu_o)^2]}\times(\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}[(n-1)s^2+n(\overline{x}-\mu)^2]}
$$
which simplifies into
$$
\pi(\mu,\sigma^2\mid\overline{x},s)\propto(\sigma^2)^{-\frac{v_o+n+1}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2]}\,.\tag{1}
$$
As a function of $\sigma^2$ only (that is, conditional on everything else), this expression is proportional to an inverse Gamma conditional posterior density, with parameters
$$\frac{1}{2}\left(v_o+n+1,\;v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2\right).$$
Proportional means that the above expression is missing the multiplicative constant that turns it into a probability density, with integral equal to one, a constant called the normalisation term or normalising constant. The full inverse Gamma density is indeed
\begin{align}\Gamma\left(\frac{v_o+n+1}{2}\right)^{-1}&\times\left\{\frac{[v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2]}{2}\right\}^{\frac{v_o+n+1}{2}}\\
&\times(\sigma^2)^{-\frac{v_o+n+1}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2]}
\end{align}
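As a quick sketch, this conditional posterior of $\sigma^2$ given $\mu$ maps directly onto scipy's inverse-gamma parameterisation; the hyperparameter and summary-statistic values below are hypothetical, chosen only for illustration:

```python
from scipy import stats

# Hypothetical hyperparameters and summary statistics (illustration only)
v_o, s_o2, n_o, mu_o = 5.0, 1.2, 4.0, 0.0
n, xbar, s2 = 20, 0.3, 0.9
mu = 0.1                                   # value of mu we condition on

shape = (v_o + n + 1) / 2
scale = (v_o * s_o2 + n_o * (mu - mu_o) ** 2
         + (n - 1) * s2 + n * (xbar - mu) ** 2) / 2

# scipy's invgamma(a, scale=b) has density b^a x^{-a-1} e^{-b/x} / Gamma(a),
# i.e. exactly the normalised inverse Gamma density displayed above.
cond_post = stats.invgamma(a=shape, scale=scale)
print(cond_post.mean(), cond_post.interval(0.95))
```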
Integrating the rhs of (1) with respect to $\sigma^2$ thus produces the reciprocal of that missing constant,
$$\Gamma\left(\frac{v_o+n+1}{2}\right)\times\left\{\frac{[v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2]}{2}\right\}^{-\frac{v_o+n+1}{2}}\tag{2}$$
and since integrating the joint posterior on $(\mu,\sigma^2)$ in $\sigma^2$ returns the marginal posterior density of $\mu$,
$$\pi(\mu\mid\overline{x},s) = \int_0^\infty \pi(\mu,\sigma^2\mid\overline{x},s)\,\text{d}\sigma^2$$
this integral (2) is therefore the marginal posterior density of $\mu$, up to a multiplicative constant. With a wee bit of further work, (2) turns into a Student's $t$ density on $\mu$, with $v_o+n$ degrees of freedom, as also described in our book, Bayesian Essentials [pages 30-31].
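Here is a small numerical check of this claim, under the same hypothetical values as in the previous sketch; the location $\mu_n$ and scale $\tau$ below are obtained by completing the square in (2) and are my own notation, not the text's:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Hypothetical hyperparameters and summary statistics (illustration only)
v_o, s_o2, n_o, mu_o = 5.0, 1.2, 4.0, 0.0
n, xbar, s2 = 20, 0.3, 0.9

nu   = v_o + n                                       # degrees of freedom
mu_n = (n_o * mu_o + n * xbar) / (n_o + n)           # location of the t density
c    = v_o * s_o2 + (n - 1) * s2 + n * n_o * (xbar - mu_o) ** 2 / (n_o + n)
tau  = np.sqrt(c / (nu * (n_o + n)))                 # scale of the t density

grid = np.linspace(mu_n - 8 * tau, mu_n + 8 * tau, 4001)
# Expression (2) as a function of mu, up to a constant, normalised numerically
kernel = (v_o * s_o2 + n_o * (grid - mu_o) ** 2
          + (n - 1) * s2 + n * (xbar - grid) ** 2) ** (-(v_o + n + 1) / 2)
kernel /= trapezoid(kernel, grid)

# Should be numerically negligible if (2) is the Student t density claimed above
print(np.max(np.abs(kernel - stats.t.pdf(grid, df=nu, loc=mu_n, scale=tau))))
```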
Conversely, integrating $\mu$ out of (1) first produces an expression proportional to $\pi(\sigma^2\mid\overline{x},s)$: since
\begin{align}\pi(\mu,\sigma^2\mid\overline{x},s) &\propto
(\sigma^2)^{-\frac{v_o+n+1}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+n_o(\mu-\mu_o)^2+(n-1)s^2+n(\overline{x}-\mu)^2]}\\
&=(\sigma^2)^{-\frac{v_o+n}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+(n-1)s^2+n_o\mu_o^2+n\overline{x}^2]}\overbrace{\sigma^{-1}e^{-\frac{1}{2\sigma^2}[(n_o+n)\mu^2 -2\mu(n_o\mu_o+n\overline x)]}}^\text{incomplete normal density}\\
&=(\sigma^2)^{-\frac{v_o+n}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+(n-1)s^2+n_o\mu_o^2+n\overline{x}^2]}\underbrace{\sigma^{-1}e^{-\frac{n_o+n}{2\sigma^2}[\mu -(n_o\mu_o+n\overline x)/(n_o+n)]^2}}_\text{conditional posterior normal density in $\mu$}\\
&\qquad\qquad\times \underbrace{e^{\frac{1}{2\sigma^2}[(n_o\mu_o+n\overline x)^2/(n_o+n)]}}_\text{missing term in perfect square}\end{align}
integrating $\mu$ out returns
\begin{align}\pi(\sigma^2\mid\mathbf{x})
&\propto (\sigma^2)^{-\frac{v_o+n}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+(n-1)s^2+n_o\mu_o^2+n\overline{x}^2-(n_o\mu_o+n\overline x)^2/(n_o+n)]}\\
&=(\sigma^2)^{-\frac{v_o+n}{2}-1}e^{-\frac{1}{2\sigma^2}[v_os_o^2+(n-1)s^2+n_on(\overline x-\mu_o)^2/(n_o+n)]}
\end{align}
since
\begin{align}n_o\mu_o^2+n\overline{x}^2-(n_o\mu_o+n\overline x)^2/(n_o+n)
&=n_o\mu_o^2+n\overline{x}^2-\frac{n_o^2}{n_o+n}\mu_o^2-\frac{2n_on}{n_o+n}\mu_o\overline x-\frac{n^2}{n_o+n}\overline{x}^2\\
&= \frac{n_o\mu_o^2}{n_o+n}(n_o+n-n_o)-\frac{2n_on\mu_o\overline x}{n_o+n}+\frac{n\overline{x}^2}{n_o+n}(n_o+n-n)\\
&=\frac{nn_o}{n_o+n}(\mu_o-\overline x)^2
\end{align}
which means that the marginal posterior on $\sigma^2$ is an inverse Gamma with parameters
$$\frac{1}{2}(v_o+n,[v_os_o^2+(n-1)s^2+n_on(\overline x-\mu_o)^2/(n_o+n)])$$
This also follows from observing that, since
$$\overline{x}\mid\mu,\sigma\sim\mathcal{N}(\mu,n^{-1}\sigma^2)\qquad
\mu\mid\sigma\sim\mathcal{N}(\mu_o,n_o^{-1}\sigma^2)$$
then by marginalising in $\mu$
$$\overline{x}\mid\sigma\sim\mathcal{N}(\mu_o,[n^{-1}+n_o^{-1}]\sigma^2),$$
whose density contributes the factor $\exp\{-n_on(\overline{x}-\mu_o)^2/2(n_o+n)\sigma^2\}$ that appears in the scale of the inverse Gamma above.
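A corresponding numerical sketch for this marginal (again with hypothetical values): the ratio printed below should be essentially constant in $\sigma^2$ if the marginal posterior really is the inverse Gamma just stated:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Hypothetical hyperparameters and summary statistics (illustration only)
v_o, s_o2, n_o, mu_o = 5.0, 1.2, 4.0, 0.0
n, xbar, s2 = 20, 0.3, 0.9

shape = (v_o + n) / 2
scale = (v_o * s_o2 + (n - 1) * s2
         + n_o * n * (xbar - mu_o) ** 2 / (n_o + n)) / 2
marg = stats.invgamma(a=shape, scale=scale)

# Integrate mu out of the joint posterior (1) numerically for a few sigma^2 values;
# the ratio to the inverse Gamma density should not depend on sigma^2.
mu_grid = np.linspace(-10, 10, 40001)
for sig2 in (0.5, 1.0, 2.0):
    joint = (sig2 ** (-(v_o + n + 1) / 2 - 1)
             * np.exp(-(v_o * s_o2 + n_o * (mu_grid - mu_o) ** 2
                        + (n - 1) * s2 + n * (xbar - mu_grid) ** 2) / (2 * sig2)))
    print(trapezoid(joint, mu_grid) / marg.pdf(sig2))
```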
Best Answer
[Extract from our book Bayesian Core, Chapter 3, pp.54-56]
3.2.1 Conjugate Priors
Specifying both the conditional prior on $\beta$,
$$ \beta|\sigma^2,X\sim\mathscr{N}_{k+1}(\tilde\beta,\sigma^2M^{-1})\,, $$
where $M$ is a $(k+1,k+1)$ positive definite symmetric matrix, and the marginal prior on $\sigma^2$,
$$ \sigma^2|X\sim \mathscr{IG}(a,b),\qquad a,b>0, $$
we indeed have conjugate priors, in that the conditional posterior distribution $\pi(\beta|\sigma^2,y,X)$ is
$$ \beta|\sigma^2,y,X\sim \mathscr{N}_{k+1}\left((M+X^\text{T} X)^{-1} \{(X^\text{T} X)\hat\beta+M\tilde\beta\},\sigma^2(M+X^\text{T} X)^{-1}\right), $$
where $\hat\beta$ is the OLS estimate, and the marginal posterior distribution $\pi(\sigma^2|y,X)$ is
$$ \sigma^2|y,X\sim \mathscr{IG}\left(\frac{n}{2}+a,\,b+\frac{s^2}{2}+\frac{(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1}(\tilde\beta-\hat\beta)}{2}\right)\,, $$
where $s^2=(y-X\hat\beta)^\text{T} (y-X\hat\beta)$; these posteriors are of the same types as the prior distributions.
Integrating the conditional posterior of $\beta$ over the marginal posterior of $\sigma^2$ [in $\sigma^2$] leads to a multivariate $t$ marginal posterior distribution on $\beta$ since
\begin{align*} \pi(\beta|y,X) & \propto \left[(\beta-\{M+X^\text{T}X\}^{-1}[\{X^\text{T} X\}\hat\beta+M\tilde\beta])^\text{T} (M+X^\text{T} X)\right. \\ &\quad \times(\beta-\{M+X^\text{T} X\}^{-1}[\{X^\text{T} X\}\hat\beta+M\tilde\beta])+2b+s^2 \\ &\quad +\left.(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1} (\tilde\beta-\hat\beta)\right]^{-(n+k+1+2a)/2} \end{align*}
(the computation is straightforward, if tedious, bookkeeping). We recall that the density of a multivariate $\mathscr{T}_p(\nu,\theta,\Sigma)$ distribution on $\mathbb{R}^p$ is
$$ f(t|\nu,\theta,\Sigma)=\frac{\Gamma((\nu+p)/2)/\Gamma(\nu/2)}{\sqrt{\mbox{det}(\nu\pi\Sigma)}}\left[1+\frac{(t-\theta)^\text{T} \Sigma^{-1}(t-\theta)}{\nu}\right]^{-(\nu+p)/2}\,. $$
We thus have that, marginally and a posteriori,
$$ \beta|y,X \sim \mathscr{T}_{k+1}\left(n+2a,\hat\mu,\hat\Sigma\right), $$
with
\begin{align*} \hat\mu &= (M+X^\text{T} X)^{-1}((X^\text{T} X)\hat\beta+M\tilde\beta),\\ \hat\Sigma &= \frac{2b+s^2+(\tilde\beta-\hat\beta)^\text{T} \left(M^{-1}+(X^\text{T} X)^{-1}\right)^{-1} (\tilde\beta-\hat\beta)}{n+2a}(M+X^\text{T} X)^{-1}. \end{align*}
In this case, the posterior variance of $\beta$ is proportional to $\left(M+X^\text{T} X\right)^{-1}$. The correlation structure is thus completely determined by the prior and the design matrix. The scale factor comes from the inverse Gamma part: modulo an $(n+2a)/(n+2a-2)$ term, it is the expectation of $\sigma^2$ under its marginal posterior.
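Finally, a compact numerical sketch of this conjugate analysis, with simulated data and arbitrary hyperparameter values chosen only to illustrate the formulas from the extract:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data with an intercept and k regressors (values are arbitrary)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.7, size=n)

# Hyperparameters of the conjugate prior
beta_tilde = np.zeros(k + 1)                  # prior mean of beta
M = np.eye(k + 1)                             # prior precision factor
a, b = 2.0, 1.0                               # IG(a, b) prior on sigma^2

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)      # OLS estimate
s2 = float((y - X @ beta_hat) @ (y - X @ beta_hat))
d = beta_tilde - beta_hat
K = np.linalg.inv(np.linalg.inv(M) + np.linalg.inv(XtX))
A = M + XtX

# Marginal posterior of sigma^2: inverse Gamma
sigma2_post = stats.invgamma(a=n / 2 + a, scale=b + s2 / 2 + d @ K @ d / 2)

# Conditional posterior of beta given a draw of sigma^2: multivariate normal
mu_hat = np.linalg.solve(A, XtX @ beta_hat + M @ beta_tilde)
sig2 = sigma2_post.rvs(random_state=rng)
beta_draw = rng.multivariate_normal(mu_hat, sig2 * np.linalg.inv(A))

# Marginal posterior of beta: multivariate t with n + 2a degrees of freedom
nu = n + 2 * a
Sigma_hat = (2 * b + s2 + d @ K @ d) / nu * np.linalg.inv(A)
post_cov = nu / (nu - 2) * Sigma_hat          # posterior covariance of beta

print(mu_hat, np.sqrt(np.diag(post_cov)))
```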