Solved – Bayesian modeling using a multivariate normal with a covariate

Tags: bayesian, conditional-probability, gibbs, probability

Suppose you have an explanatory variable ${\bf{X}} = \left(X(s_{1}),\ldots,X(s_{n})\right)$, where each $s_{i}$ is a spatial location. You also have a response variable ${\bf{Y}} = \left(Y(s_{1}),\ldots,Y(s_{n})\right)$. At any single location $s$, we can combine the two variables as:

$${\bf{W}}({\bf{s}}) = \left( \begin{array}{ccc}X(s) \\ Y(s) \end{array} \right) \sim N(\boldsymbol{\mu}(s), T)$$

In this case, we simply choose $\boldsymbol{\mu}(s) = \left( \mu_{1} \; \; \mu_{2}\right)^{T}$, and $T$ is a $2\times 2$ covariance matrix that describes the relation between $X$ and $Y$. This only describes the values of $X$ and $Y$ at a single location $s$. Since we observe $X$ and $Y$ at $n$ locations, we can describe all of the ${\bf{W}}(s_i)$ jointly in the following way:

$$\left( \begin{array}{c} {\bf{X}} \\ {\bf{Y}} \end{array}\right) \sim N\left(\left(\begin{array}{c}\mu_{1}\boldsymbol{1}\\ \mu_{2}\boldsymbol{1}\end{array}\right), T\otimes H(\phi)\right)$$

You will notice that we rearranged the components of $\bf{X}$ and $\bf{Y}$: all the $X(s_i)$ are stacked in one column, followed by all the $Y(s_i)$. Each entry $H(\phi)_{ij}$ is given by a correlation function, $H(\phi)_{ij} = \rho(s_i, s_j)$, and $T$ is as above. The covariance takes the Kronecker form $T\otimes H(\phi)$ because we assume the covariance function is separable: $C(s, s')=\rho(s, s')\,T$.
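To make the separable structure concrete, here is a minimal numerical sketch in Python, assuming an exponential correlation $\rho(s, s') = \exp(-\phi\,\|s-s'\|)$; the locations and the values of $\mu$, $T$ and $\phi$ are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
sites = rng.uniform(0, 10, size=(n, 2))   # n spatial locations s_1, ..., s_n
mu = np.array([1.0, 2.0])                 # (mu_1, mu_2), illustrative
T = np.array([[1.0, 0.6],
              [0.6, 2.0]])                # 2x2 covariance of (X, Y) at one site
phi = 0.5                                 # decay parameter, illustrative

# H(phi)_ij = rho(s_i, s_j) = exp(-phi * ||s_i - s_j||)
dists = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
H = np.exp(-phi * dists)

# Stacking (X(s_1), ..., X(s_n), Y(s_1), ..., Y(s_n)) gives covariance T kron H(phi)
Sigma = np.kron(T, H)
mean = np.kron(mu, np.ones(n))            # (mu_1 * 1, mu_2 * 1)

W = rng.multivariate_normal(mean, Sigma)  # one draw from the joint model
X, Y = W[:n], W[n:]
```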

Question 1: When I calculate the conditional ${\bf{Y}}\mid{\bf{X}}$, what I'm actually doing is generating a set of values of $\bf{Y}$ based on $\bf{X}$, correct? I already have $\bf{Y}$, so I would be more interested in predicting at a new point $s_0$, i.e. $y(s_{0})$. In this case, I should have a matrix $H^{*}(\phi)$ defined as

$$H^{*}(\phi) = \left(\begin{array}{cc}H(\phi) & \boldsymbol{h} \\ \boldsymbol{h}^{T} & \rho(0;\phi) \end{array}\right)$$

in which $\boldsymbol{h}(\phi)$ is the vector with components $\rho(s_{0} - s_{j};\phi)$ for $j = 1,\ldots,n$, and $\rho(0;\phi) = 1$. Therefore, we can construct a vector (without rearrangement, i.e. ordered by location):

$${\bf{W}}^{*} = \left({\bf{W}}(s_{1}), \ldots, {\bf{W}}(s_{n}), {\bf{W}}(s_{0})\right)^{T} \sim N\left(\boldsymbol{1}_{n+1} \otimes \left( \begin{array}{c} \mu_{1} \\ \mu_{2} \end{array} \right), H^{*}(\phi)\otimes T\right)$$

Note that the Kronecker factors appear in the opposite order here because ${\bf{W}}^{*}$ is grouped by location rather than by variable. Now I just rearrange to get a joint distribution for $\left(\begin{array}{c} {\bf{X}} \\ x(s_0) \\{\bf{Y}} \\ y(s_0)\end{array} \right)$ and obtain the conditional $p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}})$.

Is this correct?
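To check my own understanding, here is how I would build $H^{*}(\phi)$ and ${\bf{W}}^{*}$ numerically, continuing the sketch above (same `sites`, `H`, `mu`, `T`, `phi`; the new location $s_0$ is again an illustrative value):

```python
s0 = np.array([4.0, 4.0])                               # new location s_0, illustrative
h = np.exp(-phi * np.linalg.norm(sites - s0, axis=1))   # rho(s_0 - s_j; phi)

# Bordered correlation matrix; the corner entry is rho(0; phi) = 1
H_star = np.block([[H,          h[:, None]],
                   [h[None, :], np.ones((1, 1))]])

# Location ordering (X(s_1), Y(s_1), ..., X(s_0), Y(s_0)) gives H* kron T
mean_star  = np.kron(np.ones(n + 1), mu)                # 1_{n+1} kron (mu_1, mu_2)
Sigma_star = np.kron(H_star, T)
```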

Question 2: For prediction, the paper I'm reading indicates that I must use this conditional distribution $p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}})$ together with a posterior distribution $p(\mu, T, \phi\mid x(s_0), {\bf{Y}}, {\bf{X}})$, but I'm not sure how to obtain the posterior distribution for the parameters. Maybe I could use the joint distribution of $\left(\begin{array}{c}{\bf{X}} \\ x(s_0)\\ {\bf{Y}}\end{array}\right)$, which I think is exactly the likelihood $p({\bf{X}}, x(s_0), {\bf{Y}}\mid\mu, T, \phi)$, and then simply use Bayes' theorem to obtain $p(\mu, T, \phi\mid {\bf{X}}, x(s_0), {\bf{Y}}) \propto p({\bf{X}}, x(s_0), {\bf{Y}}\mid\mu, T, \phi)\,p(\mu, T, \phi)$.

Question 3: At the end of the subchapter, the author says this:

For prediction, we do not have ${\bf{X}}(s_0)$. This does not create any new problems as it may be treated as a latent variable and incorporated into $\bf{x}'$. This only results in an additional draw within each Gibbs iteration and is a trivial addition to the computational task.

What does that paragraph mean?

By the way, this procedure can be found in this paper (page 8), but as you can see, I need a bit more detail.

Thanks!

Best Answer

Question 1: Given your joint probability model $$\left( \begin{array}{c} {\bf{X}} \\ {\bf{Y}} \end{array}\right) \sim N\left(\left(\begin{array}{c}\mu_{1}\boldsymbol{1}\\ \mu_{2}\boldsymbol{1}\end{array}\right), \begin{bmatrix} \boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12} \\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22} \end{bmatrix} \right)=N\left(\left(\begin{array}{c}\mu_{1}\boldsymbol{1}\\ \mu_{2}\boldsymbol{1}\end{array}\right), T\otimes H(\phi)\right)$$ the conditional distribution of $\bf{Y}$ given $\bf{X}$ is also Normal, with mean $$\boldsymbol\mu_2 + \boldsymbol\Sigma_{21} \boldsymbol\Sigma_{11}^{-1} \left( \mathbf{X} - \boldsymbol\mu_1\right)$$ and variance-covariance matrix $$\boldsymbol\Sigma_{22} - \boldsymbol\Sigma_{21} \boldsymbol\Sigma_{11}^{-1} \boldsymbol\Sigma_{12}.$$ (These are the standard Gaussian conditioning formulas; see, e.g., the Wikipedia page on the multivariate normal distribution.) The same applies to $p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}})$, since $(y(s_0), x(s_0), {\bf{X}}, {\bf{Y}})$ is another Normal vector.
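As a sanity check, those two formulas are only a few lines of linear algebra in code. A minimal generic sketch (the helper name and interface are made up, not from any library):

```python
import numpy as np

def conditional_mvn(mean, cov, a, b, x_b):
    """Mean and covariance of W[a] | W[b] = x_b, for W ~ N(mean, cov)."""
    S_ab = cov[np.ix_(a, b)]                  # Sigma_21 in the notation above
    S_bb_inv = np.linalg.inv(cov[np.ix_(b, b)])
    cond_mean = mean[a] + S_ab @ S_bb_inv @ (x_b - mean[b])
    cond_cov = cov[np.ix_(a, a)] - S_ab @ S_bb_inv @ S_ab.T
    return cond_mean, cond_cov

# e.g. Y | X from the joint model sketched in the question:
#   m, S = conditional_mvn(mean, Sigma, np.arange(n, 2 * n), np.arange(n), X)
```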


Question 2: The predictive $p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}})$ is defined as $$ p(y(s_0) \mid x(s_0), {\bf{X}}, {\bf{Y}})=\int p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}},\mu,T,\phi)\,p(\mu,T,\phi\mid x(s_0), {\bf{X}}, {\bf{Y}})\,\text{d}\mu\,\text{d} T\,\text{d}\phi\,, $$ i.e., by integrating out the parameters using their posterior distribution given the current data $({\bf{X}}, {\bf{Y}},x(s_0))$. So there is a little bit more to the full answer. Obviously, if you only need to simulate from the predictive, your notion of simulating first from $p(\mu, T, \phi\mid {\bf{X}}, x(s_0), {\bf{Y}})$ and then from $p(y(s_0)\mid x(s_0), {\bf{X}}, {\bf{Y}},\mu,T,\phi)$ is valid.
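In code, that two-stage simulation is composition sampling. A schematic sketch (both callables are hypothetical placeholders standing in for problem-specific samplers):

```python
def predictive_sample(n_draws, sample_posterior, sample_conditional):
    """Composition sampling from the posterior predictive of y(s_0)."""
    draws = []
    for _ in range(n_draws):
        mu, T, phi = sample_posterior()               # (mu, T, phi) ~ p(. | x(s0), X, Y)
        draws.append(sample_conditional(mu, T, phi))  # y(s0) ~ p(. | x(s0), X, Y, mu, T, phi)
    return draws
```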


Question 3: In the event that $x(s_0)$ is not observed, the pair $(x(s_0),y(s_0))$ can be predicted from another predictive $$ p(x(s_0),y(s_0)\mid {\bf{X}}, {\bf{Y}})=\int p(x(s_0),y(s_0)\mid {\bf{X}}, {\bf{Y}},\mu,T,\phi)\,p(\mu,T,\phi\mid {\bf{X}}, {\bf{Y}})\,\text{d}\mu\,\text{d} T\,\text{d}\phi\,. $$

When simulating from this predictive, because it is not available in a manageable form, a Gibbs sampler can be run (see the skeleton after the list below) that iteratively simulates:

  1. $\mu\mid {\bf{X}}, {\bf{Y}},x(s_0),y(s_0),T,\phi$
  2. $T\mid {\bf{X}}, {\bf{Y}},x(s_0),y(s_0),\mu,\phi$
  3. $\phi\mid {\bf{X}}, {\bf{Y}},x(s_0),y(s_0),T,\mu$
  4. $x(s_0)\mid {\bf{X}}, {\bf{Y}},y(s_0),\phi,T,\mu$
  5. $y(s_0)\mid {\bf{X}}, {\bf{Y}},x(s_0),\phi,T,\mu$

or else merge steps 4 and 5 into a single step

  • $x(s_0),y(s_0)\mid {\bf{X}}, {\bf{Y}},\phi,T,\mu$
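
A skeleton of that sampler, with steps 4 and 5 merged, might look as follows; every `draw_*` function is a placeholder whose exact form depends on the priors chosen for $(\mu, T, \phi)$:

```python
def gibbs(n_iter, X, Y, init, draw_mu, draw_T, draw_phi, draw_w0):
    """Gibbs sampler skeleton; each draw_* is a user-supplied full conditional."""
    mu, T, phi, x0, y0 = init
    samples = []
    for _ in range(n_iter):
        mu = draw_mu(X, Y, x0, y0, T, phi)    # step 1
        T = draw_T(X, Y, x0, y0, mu, phi)     # step 2
        phi = draw_phi(X, Y, x0, y0, mu, T)   # step 3 (often Metropolis-within-Gibbs)
        x0, y0 = draw_w0(X, Y, mu, T, phi)    # merged steps 4-5: a bivariate Gaussian draw
        samples.append((mu, T, phi, x0, y0))
    return samples
```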