Here is one approach to the automation. Feedback is much appreciated. The idea is to replace the initial visual inspection with computation, followed by a subsequent visual inspection, in keeping with standard practice.
This solution combines two steps: first, calculate the burn-in to discard the portion of the chain before some convergence threshold is reached, and then use the autocorrelation matrix to calculate the thinning interval.
- calculate a vector of the maximum median Gelman-Rubin convergence diagnostic shrink factor (grsf) across all variables in the mcmc object
- find the minimum number of samples at which the grsf across all variables drops below some threshold, e.g. 1.1 in the example below, perhaps lower in practice
- subsample the chains from this point to the end of the chain
- thin the chain using the autocorrelation of the most autocorrelated variables
- visually confirm convergence with trace, autocorrelation, and density plots
The mcmc object can be downloaded here: jags.out.Rdata
# jags.out is the mcmc.object with m variables
library(coda)
load('jags.out.Rdata')
# 1. calculate max.gd.vec,
# max.gd.vec is a vector of the maximum shrink factor
max.gd.vec <- apply(gelman.plot(jags.out)$shrink[, ,'median'], 1, max)
# 2. will use window() to subsample the jags.out mcmc.object
# 3. start the window at the first iteration where max.gd.vec < 1.1 (but no earlier than 100)
window.start <- max(100, min(as.numeric(names(which(max.gd.vec < 1.1)))))
jags.out.trunc <- window(jags.out, start = window.start)
# 4. calculate thinning interval
# thin.int is the chain thin interval
# step is very slow
# 4.1 find the n most autocorrelated variables
acm <- autocorr.diag(jags.out.trunc)
n <- min(3, ncol(acm))
acm.subset <- colnames(acm)[order(colSums(acm), decreasing = TRUE)][1:n]
jags.out.subset <- jags.out.trunc[, acm.subset]
# 4.2 calculate the thinning interval
# ac.int is the lag step for the autocorrelation matrix
ac.int <- 500 # set high to reduce computation time
acm2 <- autocorr.diag(jags.out.subset, lags = ac.int * (1:20))
# thin at the first lag where autocorrelation drops below zero (at least 50)
thin.int <- max(apply(acm2 < 0, 2, function(x) match(TRUE, x)) * ac.int, 50, na.rm = TRUE)
# 4.3 thin the chain
jags.out.thin <- window(jags.out.trunc, thin = thin.int)
# 5. plots for visual diagnostics
plot(jags.out.thin)
autocorr.plot(jags.out.thin)
--update--
As implemented in R, computing the autocorrelation matrix is slower than would be desirable (>15 min in some cases); to a lesser extent, so is computing the GR shrink factor. There is a question about how to speed up step 4 on Stack Overflow here
--update part 2--
Additional answers:
It is not possible to diagnose convergence, only to diagnose a lack of convergence (Brooks, Giudici, and Philippe, 2003).
The function autorun.jags from the runjags package automates calculation of run length and convergence diagnostics. It does not start monitoring the chain until the Gelman-Rubin diagnostic is below 1.05, and it calculates the required chain length using the Raftery and Lewis diagnostic.
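For reference, a minimal sketch of an autorun.jags call (the model and data here are made up for illustration, and a JAGS installation is required):

```r
library(runjags)

# toy model: normal mean and precision (hypothetical example)
model.string <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
  mu  ~ dnorm(0, 0.001)
  tau ~ dgamma(0.01, 0.01)
}"

set.seed(10)
results <- autorun.jags(model = model.string,
                        monitor = c('mu', 'tau'),
                        data = list(y = rnorm(50, 2, 1), N = 50),
                        n.chains = 3,
                        psrf.target = 1.05)  # extend sampling until GR diagnostic < 1.05
plot(results)
```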
Gelman et al. (Gelman, 2004, Bayesian Data Analysis, p. 295; Gelman and Shirley, 2010) state that they use the conservative approach of discarding the first half of the chain. Although a relatively simple solution, in practice this is sufficient to solve the issue for my particular set of models and data.
# code for answer 3
chain.length <- summary(jags.out)$end
jags.out.trunc <- window(jags.out, start = chain.length / 2)
# thin based on autocorrelation if the first negative lag is < 50, otherwise stop
lags <- c(1, 5, 10, 15, 25)
acm <- autocorr.diag(jags.out.trunc, lags = lags)
# first lag at which each variable's autocorrelation drops below zero
first.neg.lag <- lags[apply(acm < 0, 2, function(x) match(TRUE, x))]
thin.int <- min(c(first.neg.lag, 50), na.rm = TRUE)
# if no lag qualifies, require visual inspection; check the acceptance rate
if (thin.int == 50) stop('check acceptance rate, inspect diagnostic figures')
jags.out.thin <- window(jags.out.trunc, thin = thin.int)
Check out the DHARMa package in R. It uses a simulation-based approach with quantile residuals to generate the type of residuals you may be interested in, and it works with glm.nb from MASS.
The essential idea is explained here and goes in three steps:
- Simulate plausible responses for each case. You can use the distribution of each regression coefficient (the coefficient together with its standard error) to generate several sets of coefficients, then multiply each set of coefficients by the observed predictors to obtain multiple simulated response values for each case.
- From the multiple response values for each case, generate the empirical cumulative distribution function (cdf).
- Find the value of the empirical cdf that corresponds to the observed response for each case. This is the residual, and it ranges from 0 to 1.
If there is no systematic misspecification in your model, all values of the empirical cdf are equally likely, and a histogram of these residuals should be uniform between 0 and 1.
The package contains additional checks.
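The general idea can be sketched in base R with a toy Poisson model (this uses simulate(), which conditions on the fitted coefficients rather than redrawing them, and is not DHARMa's actual implementation):

```r
set.seed(1)
# toy data: Poisson counts depending on one predictor
x <- runif(200)
y <- rpois(200, lambda = exp(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)

# 1. simulate plausible responses for each case (200 cases x 250 simulations)
sims <- simulate(fit, nsim = 250)

# 2-3. empirical cdf of each case's simulations, evaluated at the observed response
res <- sapply(seq_along(y), function(i) ecdf(unlist(sims[i, ]))(y[i]))

# under a correctly specified model these should look roughly uniform on [0, 1]
hist(res, breaks = 20)
```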
Edit:
The steps above are not exactly correct. The biggest difference between my description and what DHARMa does is that DHARMa uses the simulate() function in R, which ignores the variability in the estimated regression coefficients. The Gelman and Hill regression text recommends taking the variability of the regression coefficients into account.
A crucial step I forgot to include: once the responses have been generated, they should be placed on the response scale. For example, the predicted values from a logistic regression are logits, so they should be placed on the probability scale. The next step is to randomly generate observed scores using the predicted probabilities; continuing with the logistic regression example, one can use rbinom() to generate 0-1 outcomes given the predicted probabilities.
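For the logistic case, those two steps might look like this (the fitted logits here are made up for illustration):

```r
set.seed(2)
# hypothetical fitted logits for five cases from a logistic regression
eta <- c(-2, -0.5, 0, 0.5, 2)
p <- plogis(eta)                                # place predictions on the probability scale
y.sim <- rbinom(length(p), size = 1, prob = p)  # simulate 0-1 outcomes from those probabilities
```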
Additionally, when the responses are integers, as with binary outcomes or count data models, DHARMa adds random noise to the simulated responses prior to computing the empirical cdf. This step ensures the residuals behave as one would expect in OLS if the model were not misspecified. Without it, you could have a pile-up of residuals at 1 if your outcome is binary.
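The jittering idea can be sketched for a single binary case (the values here are made up; without the added noise, the empirical cdf evaluated at an observed 1 would always equal 1):

```r
set.seed(3)
# 250 simulated binary responses for one case, plus the observed response
sims <- rbinom(250, size = 1, prob = 0.3)
obs  <- 1

# jitter both simulations and observation before taking the empirical cdf,
# so ties among the integer values are broken at random
res <- ecdf(sims + runif(250))(obs + runif(1))
```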
The code for the simulate residuals function in DHARMa is relatively easy to follow for anyone looking to roll their own implementation.
Best Answer
I think the use of the term residual is not consistent with Bayesian regression. Remember, in frequentist probability models the parameters are considered fixed, estimable quantities, and the data-generating mechanism has some random probability model associated with the observed data. For Bayesians, the parameters of probability models are considered variable, and the fixed data update our belief about what those parameters are.

Therefore, if you were calculating the variance of the observed minus fitted values in a regression model, the observed component would have zero variance, whereas the fitted component would vary as a function of the posterior probability density of the model parameters. This is the opposite of what you would derive from the frequentist regression model.

I think if one were interested in checking the probabilistic assumptions of their Bayesian regression model, a simple Q-Q plot of the posterior density of the parameter estimates (estimated from our MCMC sampling) against a normal distribution would have diagnostic power analogous to analyzing residuals (or Pearson residuals for non-linear link functions).
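For example, with a vector of MCMC draws for one coefficient (faked here with rnorm purely for illustration), such a check could be:

```r
set.seed(4)
# stand-in for 4000 posterior draws of a regression coefficient
beta.draws <- rnorm(4000, mean = 1.2, sd = 0.3)

# Q-Q plot of the posterior draws against a normal distribution
qqnorm(beta.draws)
qqline(beta.draws)
```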