Solved – Residual diagnostics in MCMC -based regression models

bayesianmarkov-chain-montecarlomixed modelrresiduals

I've recently embarked on fitting regression mixed models in the Bayesian framework, using a MCMC algorithm (function MCMCglmm in R actually).

I believe I have understood how to diagnose convergence of the estimation process (trace, geweke plot, autocorrelation, posterior distribution…).

One of the thing that strikes me in the Bayesian framework is that much effort seems to devoted to do those diagnostics, whereas very little appears to be done in terms of checking the residuals of the fitted model. For instance in MCMCglmm the residual.mcmc() function does exist but is actually not yet implemented (ie.returns: "residuals not yet implemented for MCMCglmm objects"; same story for predict.mcmc()). It seems to be lacking from other packages too, and more generally is little discussed in the literature I've found (apart from DIC which is quite heavily discussed too).

Could anyone point me to some useful references, and ideally R code I could play with or modify?

Many thanks.

Best Answer

I think the use of the term residual is not consistent with Bayesian regression. Remember, in frequentist probability models, it's the parameters which are considered fixed estimable quantities and the data generating mechanism has some random probability model associated with observed data. For Bayesians, the parameters of probability models are considered to be variable and the fixed data update our belief about what those parameters are. Therefore, if you were calculating the variance of the observed minus fitted values in a regression model, the observed component would have 0 variance whereas the fitted component would vary as a function of the posterior probability density for the model parameters. This is the opposite of what you would derive from the frequentist regression model. I think if one were interested in checking the probabilistic assumptions of their Bayesian regression model, a simple QQplot of the posterior density of parameter estimates (estimated from our MCMC sampling) versus a normal distribution would have diagnostic power analogous to analyzing residuals (or Pearson residuals for non-linear link functions).

Related Question