[Math] Normal approximations to the posterior distribution

approximation, bayesian, normal-distribution

I'm reading Bayesian Data Analysis by Gelman et al., and I'm a little confused about how the authors approximate the posterior density by a normal distribution. They first write the Taylor series expansion of the logarithm of the posterior density around its mode $\hat{\theta}$:

$$\log \,p(\theta \,|\, y) = \log \,p(\hat{\theta} \,|\, y)+
\frac12(\theta-\hat{\theta})^2\left[\frac{d^2}{d\theta^2} \log
\,p(\theta \,|\, y)\right]_{\theta=\hat{\theta}}+\cdots\;\;\;\;\;\;\;(1)$$

Then he states that

The linear term in $(1)$ is zero because the log-posterior density has zero derivative at its mode. Considering $(1)$ as a function of $\theta$, the first term is a constant, whereas the second term is proportional to the logarithm of a normal density, yielding the approximation

$$p(\theta \,|\, y) \approx N\left(\hat{\theta},
\left[I(\hat{\theta})\right]^{-1}\right),$$

where $$I(\theta)=-\frac{d^2}{d\theta^2}\,\log \,p(\theta \,|\, y)$$

is the observed information.
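To see this approximation at work numerically, here is a minimal sketch (my own hypothetical example, not from the book): a Beta(7, 4) posterior, as arises from 6 successes in 9 Bernoulli trials under a uniform prior. The mode $\hat{\theta}$ and the observed information $I(\hat{\theta})$ are computed analytically from the log-posterior, and the exact log-density is compared against the normal approximation near the mode.

```python
import math

# Hypothetical example: Beta(a, b) posterior, e.g. from 6 successes in
# 9 Bernoulli trials with a uniform Beta(1, 1) prior -> Beta(7, 4).
a, b = 7.0, 4.0

# Posterior mode: the zero of the first derivative of the log-density.
theta_hat = (a - 1) / (a + b - 2)

# Observed information: minus the second derivative of the
# log-density, evaluated at the mode.
info = (a - 1) / theta_hat**2 + (b - 1) / (1 - theta_hat)**2

def log_beta_pdf(t):
    """Exact log of the Beta(a, b) density."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)

def log_normal_approx(t):
    """Log of the approximating N(theta_hat, 1 / info) density."""
    return 0.5 * math.log(info / (2 * math.pi)) \
        - 0.5 * (t - theta_hat) ** 2 * info

for t in (0.5, 2 / 3, 0.8):
    print(t, log_beta_pdf(t), log_normal_approx(t))
```

Near the mode the two log-densities agree closely; the discrepancy grows in the tails, as expected from a second-order expansion.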

This escaped me a little, so I tried to break this down more:

$$p(\theta \,|\, y) \approx N\left(\hat{\theta},
\left[I(\hat{\theta})\right]^{-1}\right)=\frac{1}{\sqrt{2\pi/I(\hat{\theta})}}\,\exp\left( -\frac12(\theta-\hat{\theta})^2I(\hat{\theta})\right)$$

$$\log \,p(\theta\,|\,y)\approx \frac12\log\left(\frac{I(\hat{\theta})}{2\pi}\right)-\frac12(\theta-\hat{\theta})^2I(\hat{\theta})$$

$$= \frac12\left[\log\left(-\left[\frac{d^2}{d\theta^2}\,\log \,p(\theta \,|\, y)\right]_{\theta=\hat{\theta}}\right)-\log 2\pi\right]+\frac12(\theta-\hat{\theta})^2\left[\frac{d^2}{d\theta^2}\,\log \,p(\theta \,|\, y)\right]_{\theta=\hat{\theta}}\;\;\;\;\;\;(2)$$

So the result I got isn't quite the same as $(1)$? Did I make a mistake in the math, or what am I missing here? x)

Thank you for your help! Please let me know if my question is unclear.

Best Answer

I think I might have gotten it myself, correct me please if I'm wrong:

The expressions $(1)$ and $(2)$ are the same with the exception of the constant terms. In $(1)$ the constant term is:

$$C_1=\log\,p(\hat{\theta}\,|\,y),$$

since $\hat{\theta}$ and $y$ are fixed. Similarly, the constant term in $(2)$ is:

$$C_2=\frac12\left[\log\left(-\left[\frac{d^2}{d\theta^2}\,\log \,p(\theta \,|\, y)\right]_{\theta=\hat{\theta}}\right)-\log 2\pi\right]=\frac12\left[\log I(\hat{\theta})-\log 2\pi\right].$$

We can write $(1)$ as:

$$\log\,p(\theta\,|\,y) = C_1-\frac12(\theta-\hat{\theta})^2I(\hat{\theta})=C_1-\frac12\frac{(\theta-\hat{\theta})^2}{I(\hat{\theta})^{-1}},$$

$$ p(\theta\,|\,y) = \exp\left(C_1\right)\exp\left(-\frac12\frac{(\theta-\hat{\theta})^2}{I(\hat{\theta})^{-1}}\right)\approx N\left(\hat{\theta}, \left[I(\hat{\theta})\right]^{-1}\right)=\exp\left(C_2\right)\exp\left(-\frac12\frac{(\theta-\hat{\theta})^2}{I(\hat{\theta})^{-1}}\right),$$

so the posterior density is approximately proportional to a normal density:

$$p(\theta\,|\,y) \propto N\left(\hat{\theta}, \left[I(\hat{\theta})\right]^{-1}\right)=\exp\left(C_2\right)\exp\left(-\frac12\frac{(\theta-\hat{\theta})^2}{I(\hat{\theta})^{-1}}\right)$$
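Continuing the hypothetical Beta(7, 4) example from above, this resolution can be checked numerically: the truncated expansion $(1)$ and the log-normal density $(2)$ share the same quadratic term, so their difference is the constant $C_1 - C_2$ for every $\theta$, and exponentiating either one gives functions that are proportional.

```python
import math

# Hypothetical Beta(7, 4) posterior again (6 successes in 9 trials,
# uniform prior): verify that (1) and (2) differ only by C1 - C2.
a, b = 7.0, 4.0
theta_hat = (a - 1) / (a + b - 2)
info = (a - 1) / theta_hat**2 + (b - 1) / (1 - theta_hat)**2

log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

# C1: the exact log-posterior density at the mode, as in (1).
C1 = log_norm + (a - 1) * math.log(theta_hat) \
    + (b - 1) * math.log(1 - theta_hat)

# C2: the log normalizing constant of N(theta_hat, 1 / info), as in (2).
C2 = 0.5 * (math.log(info) - math.log(2 * math.pi))

def expansion1(t):
    """Truncated Taylor expansion (1) of the log-posterior."""
    return C1 - 0.5 * (t - theta_hat) ** 2 * info

def expansion2(t):
    """Log of the approximating normal density, expression (2)."""
    return C2 - 0.5 * (t - theta_hat) ** 2 * info

# The quadratic terms cancel, so the difference is constant in theta.
diffs = [expansion1(t) - expansion2(t) for t in (0.4, 0.55, 2 / 3, 0.8)]
print(diffs)
```

Since the difference is constant, $\exp$ of $(1)$ and $\exp$ of $(2)$ differ only by the multiplicative factor $e^{C_1-C_2}$, which is exactly the proportionality claimed above.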