Posterior Predictive Checks – What Are Posterior Predictive Checks and Their Usefulness?

bayesian, model-selection, posterior

I understand what the posterior predictive distribution is, and I have been reading about posterior predictive checks, although it isn't clear to me what it does yet.

  1. What exactly is the posterior predictive check?
  2. Why do some authors say that running posterior predictive checks is "using the data twice" and that they should not be abused? (Or even that they are not Bayesian?) (e.g. see this or this)
  3. What exactly is this check useful for? Can it really be used for model selection? (e.g. does it factor in both fit and model complexity?)

Best Answer

Posterior predictive checks are, in simple words, "simulating replicated data under the fitted model and then comparing these to the observed data" (Gelman and Hill, 2007, p. 158). So you use the posterior predictive distribution to "look for systematic discrepancies between real and simulated data" (Gelman et al. 2004, p. 169).
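As a minimal sketch of the idea in Python (assuming a simple normal model with a weakly informative prior; the conjugate posterior draws here stand in for whatever your sampler would produce):

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data (illustrative): n draws from some unknown process
y = rng.normal(loc=5.0, scale=2.0, size=100)
n = len(y)

# Hypothetical posterior draws for a normal model y ~ N(mu, sigma).
# In practice these come from your sampler (e.g. MCMC); here we use the
# conjugate posterior under a noninformative prior as a stand-in.
n_draws = 1000
sigma_draws = np.sqrt(np.sum((y - y.mean()) ** 2) /
                      rng.chisquare(df=n - 1, size=n_draws))
mu_draws = rng.normal(loc=y.mean(), scale=sigma_draws / np.sqrt(n))

# Posterior predictive replicates: one simulated data set per posterior draw
y_rep = rng.normal(loc=mu_draws[:, None], scale=sigma_draws[:, None],
                   size=(n_draws, n))

# Compare a simple summary between observed and replicated data
print("observed mean:", y.mean())
print("replicated means (5%, 50%, 95%):",
      np.percentile(y_rep.mean(axis=1), [5, 50, 95]))
```

If the observed summary sits comfortably inside the spread of the replicated summaries, the model reproduces that feature of the data; if it sits far in the tail, you have found a systematic discrepancy.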

The argument about "using the data twice" is that you use your data first to estimate the model and then again to check whether the model fits that same data. This is generally considered a bad idea; it would be better to validate your model on external data that was not used for estimation.

Posterior predictive checks are helpful in assessing whether your model gives you "valid" predictions about reality - do they fit the observed data or not? They are a helpful phase of model building and checking. They do not give you a definite answer on whether your model is "OK" or whether it is "better" than another model; however, they can help you check whether your model makes sense.

This is nicely described in the LaplacesDemon vignette Bayesian Inference:

Comparing the predictive distribution $y^\text{rep}$ to the observed data $y$ is generally termed a "posterior predictive check". This type of check includes the uncertainty associated with the estimated parameters of the model, unlike frequentist statistics.

Posterior predictive checks (via the predictive distribution) involve a double-use of the data, which violates the likelihood principle. However, arguments have been made in favor of posterior predictive checks, provided that usage is limited to measures of discrepancy to study model adequacy, not for model comparison and inference (Meng 1994).

Gelman recommends at the most basic level to compare $y^\text{rep}$ to $y$, looking for any systematic differences, which could indicate potential failings of the model (Gelman et al. 2004, p. 159). It is often first recommended to compare graphical plots, such as the distribution of $y$ and $y^\text{rep}$.
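Continuing the sketch above (reusing the `y` and `y_rep` defined there), a posterior predictive p-value based on a discrepancy statistic can be computed like this; the sample maximum is just an illustrative choice of statistic:

```python
import numpy as np

# Assumes y (observed data) and y_rep (n_draws x n replicates) from the
# earlier sketch. A discrepancy statistic T(.) highlights a feature the
# model should capture; the sample maximum is sensitive to tail behaviour.
T_obs = y.max()
T_rep = y_rep.max(axis=1)

# Posterior predictive p-value: the share of replicates at least as extreme
# as the observed statistic. Values near 0 or 1 flag a systematic discrepancy.
ppp = np.mean(T_rep >= T_obs)
print(f"T(y) = {T_obs:.2f}, posterior predictive p-value = {ppp:.2f}")
```

Plotting the distribution of `T_rep` with `T_obs` marked on it gives the graphical version of the same check; as the quoted passage notes, such comparisons are meant for studying model adequacy, not for model comparison.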
