Does anyone know how to compute (or extract) leverage and Cook's distances for a mer
class object (obtained through lme4
package)?
I'd like to plot these for a residuals analysis.
Linear Mixed Effects Models – How to Extract Leverage and Cook’s Distances
Related Solutions
If you take a look at the code (simply type plot.lm
, without parentheses, or edit(plot.lm)
at the R prompt), you'll see that Cook's distances are defined at line 44, via the cooks.distance()
function. To see what it does, type stats:::cooks.distance.glm
at the R prompt. There you will see that it is defined as
(res/(1 - hat))^2 * hat/(dispersion * p)
where res
are the Pearson residuals (as returned by the influence()
function), hat
contains the leverages (the diagonal of the hat matrix), p
is the number of parameters in the model, and dispersion
is the dispersion used for the current model (fixed at one for logistic and Poisson regression; see help(glm)
). In sum, Cook's distance is computed as a function of each observation's leverage and standardized residual.
(Compare with stats:::cooks.distance.lm
.)
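This formula can be checked directly against R's own computation. A minimal sketch with a built-in dataset (the logistic model here is purely illustrative):

```r
## Reproduce stats:::cooks.distance.glm by hand for a logistic model
fit <- glm(am ~ wt, family = binomial, data = mtcars)

res  <- residuals(fit, type = "pearson")  # Pearson residuals
hat  <- hatvalues(fit)                    # leverages (diagonal of the hat matrix)
p    <- fit$rank                          # number of estimated parameters
disp <- 1                                 # dispersion is fixed at 1 for binomial models

d <- (res / (1 - hat))^2 * hat / (disp * p)

# Compare with R's built-in computation; should agree
all.equal(unname(d), unname(cooks.distance(fit)))
```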
For a more formal treatment you can follow the references given in the plot.lm()
documentation, namely
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Moreover, regarding the additional information displayed in the graphics, looking further into the code we can see that R uses
plot(xx, rsp, ... # line 230
panel(xx, rsp, ...) # line 233
cl.h <- sqrt(crit * p * (1 - hh)/hh) # line 243
lines(hh, cl.h, lty = 2, col = 2) #
lines(hh, -cl.h, lty = 2, col = 2) #
where rsp
is labeled "Std. Pearson resid." in the case of a GLM and "Std. residuals" otherwise (line 172); in both cases, however, the formula used by R is (lines 175 and 178)
residuals(x, "pearson") / (s * sqrt(1 - hii))
where hii
are the hat values (the diagonal of the hat matrix) returned by the generic function lm.influence()
. This is the usual formula for standardized residuals:
$$rs_j=\frac{r_j}{s\sqrt{1-\hat h_j}}$$
where $j$ here indexes the observations, and $s=1$ for binomial and Poisson models. See e.g., Agresti, Categorical Data Analysis, §4.5.5.
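To make this concrete, the standardized Pearson residuals that R plots can be recomputed by hand (a sketch; the model is again illustrative):

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)

s   <- 1                       # dispersion, fixed at 1 for a binomial GLM
hii <- lm.influence(fit)$hat   # hat values (leverages)

rs <- residuals(fit, type = "pearson") / (s * sqrt(1 - hii))

# rstandard(..., type = "pearson") applies the same formula; should agree
all.equal(unname(rs), unname(rstandard(fit, type = "pearson")))
```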
The next lines of R code draw a smoother for Cook's distance (add.smooth=TRUE
in plot.lm()
by default, see getOption("add.smooth")
) and contour lines (not visible in your plot) for critical standardized residuals (see the cook.levels=
option).
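These diagnostic panels can be reproduced directly from plot.lm(); a quick sketch with a toy linear model:

```r
fit <- lm(dist ~ speed, data = cars)

# Panels 4-6: Cook's distance, residuals vs leverage (with Cook's distance
# contour lines at the cook.levels= values), and Cook's distance vs leverage
par(mfrow = c(1, 3))
plot(fit, which = 4:6, cook.levels = c(0.5, 1), add.smooth = TRUE)
```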
Package car
has quite a lot of useful functions for diagnostic plots of linear and generalized linear models. Compared to vanilla R plots, they are often enhanced with additional information. I recommend you try example("<function>")
on the following functions to see what the plots look like. All plots are described in detail in chapter 6 of Fox & Weisberg. 2011. An R Companion to Applied Regression. 2nd ed.
- residualPlots() plots Pearson residuals against each predictor (scatterplots for numeric variables including a Lowess fit, boxplots for factors)
- marginalModelPlots() displays scatterplots of the response variable against each numeric predictor, including a Lowess fit
- avPlots() displays partial-regression plots: for each predictor, this is a scatterplot of (a) the residuals from the regression of the response variable on all other predictors against (b) the residuals from the regression of that predictor on all other predictors
- qqPlot() gives a quantile-quantile plot that includes a confidence envelope
- influenceIndexPlot() displays each observation's Cook's distance, hat-value, p-value for the outlier test, and studentized residual in a spike plot against the observation index
- influencePlot() gives a bubble plot of studentized residuals against hat-values, with the size of each bubble proportional to Cook's distance; also see dfbetaPlots() and leveragePlots()
- boxCox() displays a profile of the log-likelihood for the transformation parameter $\lambda$ in a Box-Cox power transform
- crPlots() is for component + residual plots; a variant of these are CERES plots (Combining conditional Expectations and RESiduals), provided by ceresPlots()
- spreadLevelPlot() is for assessing non-constant error variance and displays absolute studentized residuals against fitted values
- scatterplot() provides much-enhanced scatterplots including boxplots along the axes, confidence ellipses for the bivariate distribution, and prediction lines with confidence bands
- scatter3d() is based on the rgl package and displays interactive 3D scatterplots including wire-mesh confidence ellipsoids and prediction planes; make sure to run example("scatter3d")
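A few of these in action (a sketch; it assumes the car package is installed and uses its bundled Prestige data):

```r
library(car)  # also loads carData, which provides the Prestige data

fit <- lm(prestige ~ income + education, data = Prestige)

influenceIndexPlot(fit)  # Cook's distance, hat-values, outlier-test p-values, studentized residuals
influencePlot(fit)       # bubble plot: studentized residuals vs hat-values, sized by Cook's distance
avPlots(fit)             # partial-regression (added-variable) plots
```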
In addition, have a look at bplot()
from package rms
for another approach to illustrating the joint distribution of three variables.
Best Answer
You should have a look at the R package
influence.ME
. It allows you to compute measures of influential data for mixed-effects models fitted with lme4
. Starting from an example model, the influence()
function is the basis for all further steps: from its output you can calculate Cook's distances and plot them.
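The original answer's code blocks did not survive here; the following is a sketch of the influence.ME workflow, assuming the lme4 and influence.ME packages are installed and using lme4's built-in sleepstudy data (the model and grouping variable are illustrative):

```r
library(lme4)
library(influence.ME)

# An example model
model <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# influence() refits the model, excluding one level of the grouping
# factor at a time; it is the basis for all further steps
estex <- influence(model, group = "Subject")

# Calculate Cook's distance (one value per Subject)
cd <- cooks.distance(estex)

# Plot Cook's distance as a spike plot
plot(estex, which = "cook", sort = TRUE,
     xlab = "Cook's distance", ylab = "Subject")
```

Note that influence() refits the model once per group level, so it can take a while for models with many groups.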