I am reading Andrew Gelman's paper Understanding predictive information criteria for Bayesian models; a screenshot of the relevant passage is below:
Sorry for the long paragraph. The things that confuse me are:
- Why does it seem that we know the posterior distribution $f(\theta|y)$ first and then use it to find $\log p(y|\theta)$? Shouldn't we have the model $p(y|\theta)$ first?
- What does the line highlighted in green, "its posterior mean is at a value $\frac{k}{2}$ lower", mean? My understanding is that, since the expression contains a $-\frac{1}{2}\chi^2_k$ term and the expectation of $\chi^2_k$ is $k$, the posterior mean ends up $\frac{k}{2}$ lower. But $\frac{k}{2}$ lower than what? (See the sketch after this list, where I tried to check this numerically.)
- How does $\log p(y|\theta)$ serve as a measure of fit? I can see that there is a mean squared error (MSE) term in the expression, but it is an MSE of the parameter $\theta$, not of the data $y$.
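For the second point, here is a minimal sketch of what I tried in order to check my reading. The setup is my own, not the paper's: a linear model with known noise standard deviation $\sigma$ and a flat prior on the $k$ coefficients, so the posterior of the coefficients is normal and the gap $\log p(y|\hat\theta) - \log p(y|\theta)$ should be distributed as $\frac{1}{2}\chi^2_k$ under the posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, sigma = 100, 3, 1.0          # sample size, number of coefficients, known noise sd
X = rng.normal(size=(n, k))        # design matrix
beta_true = rng.normal(size=k)
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Flat prior on beta => posterior is N(beta_hat, sigma^2 (X'X)^{-1})
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

def log_lik(beta):
    # log p(y | beta) for the normal linear model with known sigma
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * resid @ resid / sigma**2

# Draw from the posterior and evaluate log p(y | theta) at each draw
draws = rng.multivariate_normal(beta_hat, sigma**2 * XtX_inv, size=20000)
ll_draws = np.array([log_lik(b) for b in draws])

# Gap below the maximum log-likelihood; should behave like (1/2) * chi^2_k
gap = log_lik(beta_hat) - ll_draws
print("max log-likelihood log p(y|theta_hat):", log_lik(beta_hat))
print("posterior mean of log p(y|theta):     ", ll_draws.mean())
print("mean gap:", gap.mean(), "  vs  k/2 =", k / 2)
```

In this run the mean gap comes out close to $k/2$, which makes me suspect the paper means "lower than the maximum log-likelihood $\log p(y|\hat\theta)$", but I am not sure that is the right reading.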
Thanks for any help!
Best Answer
Alright, I did my best. I hope this helps a little.