Asymptotic log predictive density in predictive information criteria for Bayesian models

bayesian, statistical-inference, statistics

I am reading Andrew Gelman's paper Understanding predictive information criteria for Bayesian models, and a screenshot of the relevant passage is below:

[Screenshot of the paper's asymptotic argument: as $n \rightarrow \infty$ the posterior $\theta|y$ approaches $\mathrm{N}(\theta_0, V_0/n)$, and the posterior distribution of the log predictive density becomes $c(y)-\frac{1}{2}\left(k\log(2\pi)+\log|V_0/n|\right)-\frac{1}{2}\chi^2_k$, with posterior mean at a value $\frac{k}{2}$ lower.]

Sorry for the long paragraph. The things that confuse me are:

  1. Why does it seem that we know the posterior distribution $f(\theta|y)$ first, and then use it to find $\log p(y|\theta)$? Shouldn't we have the model $p(y|\theta)$ first?

  2. What does the sentence highlighted in green, "its posterior mean is at a value $\frac{k}{2}$ lower", mean? My understanding is that since there is a term $-\frac{1}{2}\chi^2_k$ in the expression and the expectation of $\chi^2_k$ is $k$, this leads to a value $\frac{k}{2}$ lower. But $\frac{k}{2}$ lower than what?

  3. How does $\log p(y|\theta)$ serve as a measure of fit? I can see that there is a mean squared error (MSE) term in the expression, but it is an MSE of the parameter $\theta$, not of the data $y$.

Thanks for any help!

Best Answer

  1. When we look at the posterior distribution, we are concerned with two contributing factors: the prior and the likelihood. Since we are working in the asymptotic limit $n \rightarrow \infty$, the influence of the prior is negligible, and we can model the limiting posterior $\theta|y$ as $\mathrm{N}(\theta_0,V_0/n)$ to read off the behavior of the posterior from the likelihood. In the excerpt this is a heuristic for measuring the fit of your model, because the log predictive density is approximately determined by the likelihood. So, to answer your question: in a sense we do start from the model $p(y|\theta)$; the normal form of the posterior is just its asymptotic consequence (the algebra is written out after this list).
  2. The author is saying that the log predictive density, whose posterior distribution is $c(y)-\frac{1}{2}\left(k\log(2\pi)+\log|V_0/n|\right)-\frac{1}{2}\chi_k^2$, attains its maximum when $\theta$ equals the maximum likelihood estimate $\hat\theta$: differentiating the log predictive density with respect to $\theta$ and setting the derivative to zero shows the maximizer is $\hat\theta$, where the quadratic ($\chi^2_k$) term vanishes and the value is $c(y)-\frac{1}{2}\left(k\log(2\pi)+\log|V_0/n|\right)$. The "$\frac{k}{2}$ lower" is relative to this maximum: since $\mathrm{E}[\chi_k^2]=k$, the gap $\left[c(y)-\frac{1}{2}\left(k\log(2\pi)+\log|V_0/n|\right)\right]-\left[c(y)-\frac{1}{2}\left(k\log(2\pi)+\log|V_0/n|\right)-\frac{1}{2}\chi_k^2\right]=\frac{1}{2}\chi_k^2$ has posterior expectation $\frac{k}{2}$. So the posterior mean of the log predictive density sits $\frac{k}{2}$ below its maximum possible value, the value at the maximum likelihood estimate (the simulation after this list checks this numerically).
  3. I hope the previous two answers make this clearer. The key point is that $\log p(y|\theta)$ is the log-likelihood, so it directly measures how probable the observed data $y$ are under the model at parameter $\theta$, which is the usual notion of fit. The MSE-like term in $\theta$ appears only because the log-likelihood has been expanded around the maximum likelihood estimate; the dependence on the data enters through $c(y)$ and the location of that maximum. Ultimately, these approximations will work well if the model is a good fit.
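
To make the chain of approximations in answers 1 and 2 explicit, here is the algebra written out (a sketch in the paper's notation, with $\hat\theta$ the maximum likelihood estimate, which coincides with the posterior center $\theta_0$ in this limit, and $c(y)$ a constant in $\theta$). Asymptotically the log-likelihood is quadratic around $\hat\theta$,

$$\log p(y|\theta) \approx c(y) - \frac{1}{2}\left(k\log(2\pi) + \log|V_0/n| + (\theta-\hat\theta)^T(V_0/n)^{-1}(\theta-\hat\theta)\right),$$

and if $\theta|y \sim \mathrm{N}(\hat\theta, V_0/n)$, the quadratic form is a sum of $k$ squared independent standard normals, so

$$(\theta-\hat\theta)^T(V_0/n)^{-1}(\theta-\hat\theta) \sim \chi^2_k \quad\Rightarrow\quad \log p(y|\theta) = c(y) - \frac{1}{2}\left(k\log(2\pi) + \log|V_0/n|\right) - \frac{1}{2}\chi^2_k.$$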
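
And here is a quick numerical check of the $\frac{k}{2}$ drop (a minimal sketch, not from the paper: it assumes a $k$-dimensional normal model $y_i \sim \mathrm{N}(\theta, I_k)$ with a flat prior, so the posterior is exactly $\mathrm{N}(\bar y, I_k/n)$ and the algebra above holds up to Monte Carlo error; the helper `log_lik` is my own naming):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 5                     # sample size and parameter dimension

# Data: y_i ~ N(theta_true, I_k). With a flat prior the posterior is
# theta | y ~ N(ybar, I_k / n), and ybar is also the MLE.
theta_true = rng.normal(size=k)
y = theta_true + rng.normal(size=(n, k))
ybar = y.mean(axis=0)

def log_lik(theta):
    """log p(y | theta) for the N(theta, I_k) model."""
    return -0.5 * n * k * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)

# Draw from the posterior and evaluate the log predictive density at each draw.
draws = ybar + rng.normal(size=(10_000, k)) / np.sqrt(n)
lp = np.array([log_lik(th) for th in draws])

print("max log p(y|theta):     ", log_lik(ybar))
print("posterior mean of above:", lp.mean())
print("drop (expect k/2 = 2.5):", log_lik(ybar) - lp.mean())
```

With $k=5$, the printed drop comes out near $2.5$, matching $\mathrm{E}\left[\chi^2_k\right]/2 = \frac{k}{2}$.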

Alright, I did my best. I hope that this helps a little bit.
