Solved – Understanding Bayesian Predictive Distributions


I'm taking an Intro to Bayes course and I'm having some difficulty understanding predictive distributions. I understand why they are useful and I'm familiar with the definition, but there are some things I don't quite understand.

1) How to get the right predictive distribution for a vector of new observations

Suppose that we have built a sampling model $p(y_i | \theta)$ for the data and a prior $p(\theta)$. Assume that the observations $y_i$ are conditionally independent given $\theta$.

We have observed some data $\mathcal{D} = \{y_1, y_2, \, … \, , y_k\}$, and we update our prior $p(\theta)$ to the posterior $p(\theta | \mathcal{D})$.

If we wanted to predict a vector of new observations $\mathcal{N} = \{\tilde{y}_1, \tilde{y}_2, \, … \, , \tilde{y}_n\}$, I think we should try to get the posterior predictive using this formula
$$
p(\mathcal{N} | \mathcal{D}) = \int p(\theta | \mathcal{D}) p ( \mathcal{N} | \theta) \, \mathrm{d} \theta = \int p(\theta | \mathcal{D}) \prod_{i=1}^n p(\tilde{y}_i | \theta) \, \mathrm{d} \theta,
$$
which is not equal to
$$
\prod_{i=1}^n \int p(\theta | \mathcal{D}) p(\tilde{y}_i | \theta) \, \mathrm{d} \theta,
$$
so the predicted observations are not independent, right?

Say that $\theta \mid \mathcal{D} \sim$ Beta($a,b$) and $y_i \mid \theta \sim$ Binomial($m, \theta$) for a fixed number of trials $m$ (calling it $m$ to avoid a clash with $n$, the number of new observations). In this case, if I wanted to simulate 6 new $\tilde{y}$, then, if I understand this correctly, it would be wrong to simulate 6 independent draws from the Beta-Binomial distribution that is the posterior predictive for a single observation. Is that right? I'm not sure how to interpret the fact that the new observations are not marginally independent.
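A minimal simulation sketch of the distinction (not part of the original question): it assumes a Beta($a,b$) posterior and a Binomial($m,\theta$) sampling model, with placeholder values for $a$, $b$, $m$, and the number of replications. Under the joint posterior predictive, all 6 new observations share one draw of $\theta$, which makes them exchangeable but positively correlated; drawing a fresh $\theta$ for each observation would instead give 6 independent Beta-Binomial draws with the same marginals but zero correlation.

```python
# Sketch: joint posterior predictive vs. independent Beta-Binomial draws.
# All parameter values are placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, 5.0      # hypothetical posterior Beta parameters
m = 10               # hypothetical Binomial number of trials
n_new = 6            # number of new observations to predict
B = 100_000          # number of simulated replications

# Joint predictive: one theta per replication, shared by all 6 observations.
theta = rng.beta(a, b, size=B)
joint = rng.binomial(m, theta[:, None], size=(B, n_new))

# "Independent" scheme: a fresh theta for every single observation,
# i.e. 6 independent Beta-Binomial draws.
theta_indep = rng.beta(a, b, size=(B, n_new))
indep = rng.binomial(m, theta_indep, size=(B, n_new))

# Marginally both match the Beta-Binomial, but only the joint scheme shows the
# positive correlation between components induced by the shared theta.
print("marginal means (joint):", joint.mean(axis=0).round(2))
print("marginal means (indep):", indep.mean(axis=0).round(2))
print("corr(y1, y2), joint:", np.corrcoef(joint[:, 0], joint[:, 1])[0, 1].round(3))
print("corr(y1, y2), indep:", np.corrcoef(indep[:, 0], indep[:, 1])[0, 1].round(3))
```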

2) Simulating from the posterior predictive

Many times when we simulate data from the posterior predictive we follow this scheme:

For $b$ from 1 to $B$:

1) Sample $\theta^{(b)}$ from $p(\theta | \mathcal{D})$.

2) Then simulate new data $\mathcal{N}^{(b)}$ from $p(\mathcal{N} | \theta^{(b)})$.

The scheme looks intuitive, but I don't quite know how to prove that it works. Also, does it have a name? I tried to look up a justification under several different names, but had no luck.

Thanks!

Best Answer

Suppose that $X_1,\dots,X_n,X_{n+1}$ are conditionally independent given that $\Theta=\theta$. Then,
$$
\begin{aligned}
f_{X_{n+1}\mid X_1,\dots,X_n}(x_{n+1}\mid x_1,\dots,x_n) &= \int f_{X_{n+1},\Theta\mid X_1,\dots,X_n}(x_{n+1},\theta\mid x_1,\dots,x_n)\,d\theta \\
&= \int f_{X_{n+1}\mid\Theta,X_1,\dots,X_n}(x_{n+1}\mid\theta,x_1,\dots,x_n)\, f_{\Theta\mid X_1,\dots,X_n}(\theta\mid x_1,\dots,x_n)\,d\theta \\
&= \int f_{X_{n+1}\mid\Theta}(x_{n+1}\mid\theta)\, f_{\Theta\mid X_1,\dots,X_n}(\theta\mid x_1,\dots,x_n)\,d\theta\,,
\end{aligned}
$$
in which the first equality follows from the law of total probability, the second from the product rule, and the third from the assumed conditional independence: given the value of $\Theta$, we don't need the values of $X_1,\dots,X_n$ to determine the distribution of $X_{n+1}$.
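As a quick numeric sanity check of this identity (my addition, not part of the original answer), one can take the conjugate Beta-Binomial setting from the question, compute the predictive pmf of a single new observation by numerically integrating $\int \mathrm{Binom}(x \mid m,\theta)\,\mathrm{Beta}(\theta \mid a,b)\,d\theta$, and compare it with the closed-form Beta-Binomial pmf; the parameter values below are placeholders.

```python
# Sketch: verify that the posterior predictive pmf equals the integral over theta.
import numpy as np
from scipy import stats
from scipy.integrate import quad

a, b, m = 3.0, 5.0, 10  # placeholder posterior parameters and Binomial trials

for x in range(m + 1):
    integral, _ = quad(lambda t: stats.binom.pmf(x, m, t) * stats.beta.pdf(t, a, b), 0.0, 1.0)
    closed_form = stats.betabinom.pmf(x, m, a, b)
    assert np.isclose(integral, closed_form, atol=1e-8)
    print(f"x = {x:2d}: integral = {integral:.6f}, Beta-Binomial pmf = {closed_form:.6f}")
```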

The simulation scheme is correct (it is sometimes referred to as sampling by composition, or the method of composition): for $i=1,\dots,N$, draw $\theta^{(i)}$ from the distribution of $\Theta\mid X_1=x_1,\dots,X_n=x_n$, then draw $x_{n+1}^{(i)}$ from the distribution of $X_{n+1}\mid\Theta=\theta^{(i)}$. Each pair $(\theta^{(i)}, x_{n+1}^{(i)})$ is a draw from the joint distribution of $(\Theta, X_{n+1})$ given $X_1=x_1,\dots,X_n=x_n$, so discarding $\theta^{(i)}$ simply marginalizes over $\Theta$, exactly as the integral above does. This gives you a sample $\{x_{n+1}^{(i)}\}_{i=1}^N$ from the distribution of $X_{n+1}\mid X_1=x_1,\dots,X_n=x_n$.
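A minimal sketch of this two-step scheme for the conjugate case in the question (assumed setup: Beta($a,b$) posterior, Binomial($m,\theta$) sampling model; parameter values are placeholders). The simulated predictive draws can be checked against the known Beta-Binomial mean and variance.

```python
# Sketch: composition-style sampling from the posterior predictive.
import numpy as np

rng = np.random.default_rng(1)
a, b, m = 3.0, 5.0, 10   # placeholder posterior parameters and Binomial trials
N = 200_000              # number of predictive draws

theta = rng.beta(a, b, size=N)      # step 1: theta^(i) ~ p(theta | data)
x_new = rng.binomial(m, theta)      # step 2: x^(i) ~ p(x | theta^(i))

# Compare against the exact Beta-Binomial moments for this conjugate case.
mean_exact = m * a / (a + b)
var_exact = m * a * b * (a + b + m) / ((a + b) ** 2 * (a + b + 1))
print("simulated mean/var:", x_new.mean().round(3), x_new.var().round(3))
print("exact mean/var:    ", round(mean_exact, 3), round(var_exact, 3))
```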
