Posterior distribution (when prior is normal)

Tags: bayesian, normal-distribution, statistics

Suppose
\begin{align*}
a &\sim \mathcal{N}(m_0,v_1)\\
\theta \mid a &\sim \mathcal{N}(a,v_2)\\
x \mid \theta &\sim \mathcal{N}(\theta,v_3),
\end{align*}

where $m_0, v_1, v_2$ and $v_3$ are known. After observing $x_1,\dots,x_n$, what are the posterior means and variances of $a$ and $\theta$?

Could I mimic the procedure from the simpler setting, where we only have $x \mid \theta$ and a prior on $\theta$, i.e.,
$$
f(a \mid X) \propto f(X \mid a) \cdot f(a)
$$

And I was wondering what to do next.
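In case it helps, here is how I am simulating the model (NumPy; the hyperparameter values are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Known hyperparameters (arbitrary illustrative values).
m0, v1, v2, v3 = 1.0, 2.0, 0.5, 1.5
n = 50

# One draw from the hierarchy: a -> theta -> x_1, ..., x_n.
# Note np.random uses standard deviations, while the v's are variances.
a = rng.normal(m0, np.sqrt(v1))
theta = rng.normal(a, np.sqrt(v2))
x = rng.normal(theta, np.sqrt(v3), size=n)
```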

Best Answer

First, let's do a recap of the simpler problem. For $a \sim \mathcal{N}(m_0,v_1)$ and $x \mid a \sim \mathcal{N}(a, v_2)$, we have
$$f(a) = \frac{1}{\sqrt{2\pi v_1}}\exp\left(-\frac{(a-m_0)^2}{2v_1}\right)$$
$$f(\boldsymbol{x} \mid a) = \prod_{i=1}^n f(x_i \mid a) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi v_2}}\exp\left(-\frac{(x_i-a)^2}{2v_2}\right)$$
Then we can write
\begin{align*}
f(a \mid \boldsymbol{x}) &\propto f(a, \boldsymbol{x})\\
&= f(a) \cdot f(\boldsymbol{x} \mid a)\\
&\propto \exp\left(-\frac{(a-m_0)^2}{2v_1} - \sum_{i=1}^n \frac{(x_i-a)^2}{2v_2}\right)\\
&= \exp\left(-\frac{v_2(a-m_0)^2 + \sum_{i=1}^n v_1(x_i-a)^2}{2v_1v_2}\right)\\
&\propto \exp\left(-\frac{(v_2+nv_1)a^2 - 2(v_2m_0 + nv_1\bar{x})a}{2v_1v_2}\right)\\
&\propto \exp\left(-\frac{\left(a-\frac{v_2m_0+nv_1\bar{x}}{v_2+nv_1}\right)^2}{2\frac{v_1v_2}{v_2+nv_1}}\right).
\end{align*}
Thus, the posterior distribution of $a$ after observing $x_1,\dots,x_n$ is
$$a \mid \boldsymbol{x} \sim \mathcal{N}\left(\frac{v_2m_0+nv_1\bar{x}}{v_2+nv_1},\frac{v_1v_2}{v_2+nv_1}\right).$$
Note that the posterior mean of $a$ is a weighted average of the prior mean ($m_0$) and the sample mean ($\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$), and that the posterior precision is the sum of the prior precision and the data precision: $\frac{v_2+nv_1}{v_1v_2} = \frac{1}{v_1} + \frac{n}{v_2}$. The main technique here is to "complete the square".
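As a quick numerical sanity check on the closed form above (a sketch, not part of the derivation; variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
m0, v1, v2 = 1.0, 2.0, 0.5  # arbitrary illustrative values
n = 50
x = rng.normal(0.7, np.sqrt(v2), size=n)  # data simulated with true a = 0.7
xbar = x.mean()

# Closed-form posterior N(post_mean, post_var) derived above.
post_mean = (v2 * m0 + n * v1 * xbar) / (v2 + n * v1)
post_var = v1 * v2 / (v2 + n * v1)

# Equivalent "precisions add" form: 1/post_var = 1/v1 + n/v2.
assert np.isclose(1 / post_var, 1 / v1 + n / v2)

# The posterior mean is a convex combination of m0 and xbar,
# with weight on m0 shrinking as n grows.
w = v2 / (v2 + n * v1)
assert np.isclose(post_mean, w * m0 + (1 - w) * xbar)
```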

Now if we add an additional layer to the problem, the same procedure still applies: we "complete the quadratic form" in two variables instead of completing the square in one. Specifically, for $a \sim \mathcal{N}(m_0,v_1)$, $\theta \mid a \sim \mathcal{N}(a, v_2)$, and $x \mid \theta \sim \mathcal{N}(\theta, v_3)$, we have
$$f(a) = \frac{1}{\sqrt{2\pi v_1}}\exp\left(-\frac{(a-m_0)^2}{2v_1}\right)$$
$$f(\theta \mid a) = \frac{1}{\sqrt{2\pi v_2}}\exp\left(-\frac{(\theta-a)^2}{2v_2}\right)$$
$$f(\boldsymbol{x} \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi v_3}}\exp\left(-\frac{(x_i-\theta)^2}{2v_3}\right)$$
Then we can write
\begin{align*}
f(a, \theta \mid \boldsymbol{x}) &\propto f(a, \theta, \boldsymbol{x})\\
&= f(a) \cdot f(\theta \mid a) \cdot f(\boldsymbol{x} \mid \theta)\\
&\propto \exp\left(-\frac{(a-m_0)^2}{2v_1} - \frac{(\theta-a)^2}{2v_2} - \sum_{i=1}^n \frac{(x_i-\theta)^2}{2v_3}\right)\\
&\propto \exp\left(-\frac{v_2v_3(a^2 - 2m_0 a) + v_1v_3(a^2 - 2a\theta + \theta^2) + v_1v_2(n\theta^2 - 2n\bar{x}\theta)}{2v_1v_2v_3}\right)\\
&= \exp\left(-\frac{(v_2v_3 + v_1v_3)a^2 - 2v_2v_3m_0a + (nv_1v_2 + v_1v_3)\theta^2 - 2nv_1v_2\bar{x}\theta - 2v_1v_3 \theta a}{2v_1v_2v_3}\right),
\end{align*}
where the exponent is a quadratic form in $(a, \theta)$, indicating that the (joint) posterior distribution of $(a, \theta)$ is bivariate normal, $\mathcal{N}_2(\mu_a, \mu_\theta, \sigma_a^2, \sigma_\theta^2, \rho)$. Recall that the kernel of a bivariate normal distribution is
$$\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{a-\mu_a}{\sigma_a}\right)^2 - 2\rho\left(\frac{a-\mu_a}{\sigma_a}\right)\left(\frac{\theta-\mu_\theta}{\sigma_\theta}\right) + \left(\frac{\theta-\mu_\theta}{\sigma_\theta}\right)^2\right]\right).$$
By matching the corresponding coefficients (i.e.
"completing the quadratic form"), we have the following equation system: $$\begin{cases}\frac{1}{(1-\rho^2)\sigma_a^2} &= \frac{v_2v_3 + v_1v_3}{v_1v_2v_3}\\ \frac{1}{(1-\rho^2)\sigma_\theta^2} &= \frac{nv_1v_2 + v_1v_3}{v_1v_2v_3}\\ \frac{\rho}{(1-\rho^2)\sigma_a\sigma_\theta} &= \frac{v_1v_3}{v_1v_2v_3}\\ \frac{-2}{(1-\rho^2)\sigma_a^2}\mu_a + \frac{2\rho}{(1-\rho^2)\sigma_a\sigma_\theta}\mu_\theta &= \frac{-2v_2v_3m_0}{v_1v_2v_3}\\ \frac{-2}{(1-\rho^2)\sigma_\theta^2}\mu_\theta + \frac{2\rho}{(1-\rho^2)\sigma_a\sigma_\theta}\mu_a &= \frac{-2v_1v_2n\bar{x}}{v_1v_2v_3}\end{cases}$$ Solving these equations for $(\mu_a, \mu_\theta, \sigma_a^2, \sigma_\theta^2, \rho)$, we have $$\begin{cases}\mu_a &= \frac{(nv_2 + v_3)m_0 + nv_1\bar{x}}{nv_1 + nv_2 + v_3}\\ \mu_\theta &= \frac{v_3m_0 + (nv_1 + nv_2)\bar{x}}{nv_1 + nv_2 + v_3}\\ \sigma_a^2 &= \frac{nv_1v_2 + v_1v_3}{nv_1 + nv_2 + v_3}\\ \sigma_\theta^2 &= \frac{v_2v_3 + v_1v_3}{nv_1 + nv_2 + v_3}\\ \rho &= \frac{v_1v_3}{\sqrt{v_2v_3 + v_1v_3}\sqrt{nv_1v_2 + v_1v_3}}\end{cases}$$ Thus, the (joint) posterior distribution of $(a, \theta)$ after observing $x_1,\dots,x_n$ is $$(a, \theta) \mid \boldsymbol{x} \sim \mathcal{N}_2(\mu_a, \mu_\theta, \sigma_a^2, \sigma_\theta^2, \rho),$$ with parameters specified above.
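These closed forms can be cross-checked without redoing the algebra: $(a, \theta, \bar{x})$ are jointly normal (and $\bar{x}$ is sufficient), so the posterior also follows from the standard formula for conditioning a multivariate normal. A sketch (NumPy; the variable names and illustrative values are mine):

```python
import numpy as np

m0, v1, v2, v3 = 1.0, 2.0, 0.5, 1.5  # arbitrary illustrative values
n, xbar = 50, 0.9

d = n * v1 + n * v2 + v3  # common denominator in the closed forms

# Closed-form posterior parameters from the answer.
mu_a = ((n * v2 + v3) * m0 + n * v1 * xbar) / d
mu_t = (v3 * m0 + n * (v1 + v2) * xbar) / d
var_a = (n * v1 * v2 + v1 * v3) / d
var_t = (v2 * v3 + v1 * v3) / d
rho = v1 * v3 / np.sqrt((v2 * v3 + v1 * v3) * (n * v1 * v2 + v1 * v3))

# Cross-check: condition the joint normal (a, theta, xbar) on xbar.
cov_z = np.array([[v1, v1], [v1, v1 + v2]])  # Cov of (a, theta)
cov_zx = np.array([v1, v1 + v2])             # Cov((a, theta), xbar)
var_x = v1 + v2 + v3 / n                     # Var(xbar)

cond_mean = np.array([m0, m0]) + cov_zx * (xbar - m0) / var_x
cond_cov = cov_z - np.outer(cov_zx, cov_zx) / var_x

assert np.allclose(cond_mean, [mu_a, mu_t])
assert np.allclose(np.diag(cond_cov), [var_a, var_t])
assert np.isclose(cond_cov[0, 1] / np.sqrt(var_a * var_t), rho)
```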

Note that the posterior means of both $a$ and $\theta$ are weighted averages of the prior mean ($m_0$) and the sample mean ($\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$), with different weights, and that $\rho > 0$: $a$ and $\theta$ remain positively correlated in the posterior.
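The weights behave quite differently for large $n$: the data pin down $\theta$, but they inform $a$ only through $\theta$, so $\mu_\theta \to \bar{x}$ while $\mu_a$ converges to $\frac{v_2 m_0 + v_1\bar{x}}{v_1+v_2}$, not $\bar{x}$. A quick numerical check of this limit (not part of the original answer; values are illustrative):

```python
m0, v1, v2, v3 = 1.0, 2.0, 0.5, 1.5  # arbitrary illustrative values
xbar = 0.9

def posterior_means(n):
    """Posterior means (mu_a, mu_theta) from the closed forms above."""
    d = n * v1 + n * v2 + v3
    mu_a = ((n * v2 + v3) * m0 + n * v1 * xbar) / d
    mu_theta = (v3 * m0 + n * (v1 + v2) * xbar) / d
    return mu_a, mu_theta

# With lots of data, mu_theta -> xbar, but mu_a -> (v2*m0 + v1*xbar)/(v1+v2):
# the sample informs a only through the single latent theta, so the
# posterior of a never collapses onto the data.
mu_a_big, mu_theta_big = posterior_means(10**6)
assert abs(mu_theta_big - xbar) < 1e-5
assert abs(mu_a_big - (v2 * m0 + v1 * xbar) / (v1 + v2)) < 1e-5
```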