Does anyone know how to compute (or extract) leverage and Cook's distances for a mer
class object (obtained through lme4
package)?
I'd like to plot these for a residuals analysis.
Linear Mixed Effects Models – How to Extract Leverage and Cook’s Distances
Related Solutions
If you take a look at the code (simply type plot.lm
, without parentheses, or edit(plot.lm)
at the R prompt), you'll see that Cook's distances are defined at line 44, via the cooks.distance()
function. To see what it does, type stats:::cooks.distance.glm
at the R prompt. There you will see that it is defined as
(res/(1 - hat))^2 * hat/(dispersion * p)
where res
are the Pearson residuals (as returned by the influence()
function), hat
contains the leverages (the diagonal of the hat matrix), p
is the number of parameters in the model, and dispersion
is the dispersion used for the current model (fixed at one for logistic and Poisson regression; see help(glm)
). In sum, Cook's distance is computed as a function of each observation's leverage and standardized residual.
(Compare with stats:::cooks.distance.lm
.)
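This formula can be checked directly against R's own computation. A minimal sketch with a built-in dataset (the logistic model here is purely illustrative):

```r
## Reproduce stats:::cooks.distance.glm by hand for a logistic model
fit <- glm(am ~ wt, family = binomial, data = mtcars)

res  <- residuals(fit, type = "pearson")  # Pearson residuals
hat  <- hatvalues(fit)                    # leverages (diagonal of the hat matrix)
p    <- fit$rank                          # number of estimated parameters
disp <- 1                                 # dispersion is fixed at 1 for binomial models

d <- (res / (1 - hat))^2 * hat / (disp * p)

# Compare with R's built-in computation; should agree
all.equal(unname(d), unname(cooks.distance(fit)))
```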
For a more formal treatment you can follow the references given in the plot.lm()
documentation, namely
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Moreover, regarding the additional information displayed in the graphics, looking further into the code we can see that R uses
plot(xx, rsp, ... # line 230
panel(xx, rsp, ...) # line 233
cl.h <- sqrt(crit * p * (1 - hh)/hh) # line 243
lines(hh, cl.h, lty = 2, col = 2) #
lines(hh, -cl.h, lty = 2, col = 2) #
where rsp
is labeled "Std. Pearson resid." in the case of a GLM and "Std. residuals" otherwise (line 172); in both cases, however, the formula used by R is (lines 175 and 178)
residuals(x, "pearson") / (s * sqrt(1 - hii))
where hii
are the hat values (the diagonal of the hat matrix) returned by the generic function lm.influence()
. This is the usual formula for standardized residuals:
$$rs_j=\frac{r_j}{s\sqrt{1-\hat h_j}}$$
where $j$ here indexes the observations, and $s=1$ for binomial and Poisson models. See e.g., Agresti, Categorical Data Analysis, §4.5.5.
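To make this concrete, the standardized Pearson residuals that R plots can be recomputed by hand (a sketch; the model is again illustrative):

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)

s   <- 1                       # dispersion, fixed at 1 for a binomial GLM
hii <- lm.influence(fit)$hat   # hat values (leverages)

rs <- residuals(fit, type = "pearson") / (s * sqrt(1 - hii))

# rstandard(..., type = "pearson") applies the same formula; should agree
all.equal(unname(rs), unname(rstandard(fit, type = "pearson")))
```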
The next lines of R code draw a smoother for Cook's distance (add.smooth=TRUE
in plot.lm()
by default, see getOption("add.smooth")
) and contour lines (not visible in your plot) for critical standardized residuals (see the cook.levels=
option).
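These diagnostic panels can be reproduced directly from plot.lm(); a quick sketch with a toy linear model:

```r
fit <- lm(dist ~ speed, data = cars)

# Panels 4-6: Cook's distance, residuals vs leverage (with Cook's distance
# contour lines at the cook.levels= values), and Cook's distance vs leverage
par(mfrow = c(1, 3))
plot(fit, which = 4:6, cook.levels = c(0.5, 1), add.smooth = TRUE)
```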
Package car
has quite a lot of useful functions for diagnostic plots of linear and generalized linear models. Compared to vanilla R plots, they are often enhanced with additional information. I recommend you try example("<function>")
on the following functions to see what the plots look like. All plots are described in detail in chapter 6 of Fox & Weisberg. 2011. An R Companion to Applied Regression. 2nd ed.
- residualPlots() plots Pearson residuals against each predictor (scatterplots for numeric variables including a Lowess fit, boxplots for factors)
- marginalModelPlots() displays scatterplots of the response variable against each numeric predictor, including a Lowess fit
- avPlots() displays partial-regression plots: for each predictor, this is a scatterplot of (a) the residuals from the regression of the response variable on all other predictors against (b) the residuals from the regression of that predictor on all other predictors
- qqPlot() gives a quantile-quantile plot that includes a confidence envelope
- influenceIndexPlot() displays each observation's Cook's distance, hat-value, p-value for the outlier test, and studentized residual in a spike plot against the observation index
- influencePlot() gives a bubble plot of studentized residuals against hat-values, with the size of each bubble proportional to Cook's distance; also see dfbetaPlots() and leveragePlots()
- boxCox() displays a profile of the log-likelihood for the transformation parameter $\lambda$ in a Box-Cox power transform
- crPlots() is for component + residual plots; a variant of these are CERES plots (Combining conditional Expectations and RESiduals), provided by ceresPlots()
- spreadLevelPlot() is for assessing non-constant error variance and displays absolute studentized residuals against fitted values
- scatterplot() provides much-enhanced scatterplots including boxplots along the axes, confidence ellipses for the bivariate distribution, and prediction lines with confidence bands
- scatter3d() is based on the rgl package and displays interactive 3D scatterplots including wire-mesh confidence ellipsoids and prediction planes; make sure to run example("scatter3d")
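A few of these in action (a sketch; it assumes the car package is installed and uses its bundled Prestige data):

```r
library(car)  # also loads carData, which provides the Prestige data

fit <- lm(prestige ~ income + education, data = Prestige)

influenceIndexPlot(fit)  # Cook's distance, hat-values, outlier-test p-values, studentized residuals
influencePlot(fit)       # bubble plot: studentized residuals vs hat-values, sized by Cook's distance
avPlots(fit)             # partial-regression (added-variable) plots
```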
In addition, have a look at bplot()
from package rms
for another approach to illustrating the joint distribution of three variables.
Best Answer
You should have a look at the R package
influence.ME
. It allows you to compute measures of influential data for mixed-effects models fitted with lme4
. Starting from an example model, the influence()
function is the basis for all further steps: from its output you can calculate Cook's distances and plot them.
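The original answer's code blocks did not survive here; the following is a sketch of the influence.ME workflow, assuming the lme4 and influence.ME packages are installed and using lme4's built-in sleepstudy data (the model and grouping variable are illustrative):

```r
library(lme4)
library(influence.ME)

# An example model
model <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# influence() refits the model, excluding one level of the grouping
# factor at a time; it is the basis for all further steps
estex <- influence(model, group = "Subject")

# Calculate Cook's distance (one value per Subject)
cd <- cooks.distance(estex)

# Plot Cook's distance as a spike plot
plot(estex, which = "cook", sort = TRUE,
     xlab = "Cook's distance", ylab = "Subject")
```

Note that influence() refits the model once per group level, so it can take a while for models with many groups.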