Solved – Casewise diagnostics and testing assumptions for a mixed effect logistic regression in R

Tags: assumptions, cooks-distance, lme4-nlme, logistic, regression

I am modelling a binary outcome (Buried) with two predictors: Offset (a three-level factor) and Width_mm (a continuous predictor). In addition, multiple data points came from the same unit, a chamber (27 chambers in total), so I have included random intercepts that vary by chamber to account for the non-independence of data points from the same chamber.

I have run my model using the glmer function in the lme4 package in R. My model is:

    ball3 <- glmer(Buried ~ Offset + Width_mm + (1 | Chamber), family = binomial, data = ballData)

I am now checking the assumptions of the model. I can get the fitted values and the residuals for this model using the following:

    fitted(ball3)
    resid(ball3)

When I try to get other casewise statistics using the following commands:

    standardised.resid <- rstandard(ball3)
    studentised.resid <- rstudent(ball3)
    dfbeta <- dfbeta(ball3)
    dffit <- dffits(ball3)
    leverage <- hatvalues(ball3)

I get the following error message (using rstandard as an example):

    Error in UseMethod("rstandard") :
      no applicable method for 'rstandard' applied to an object of class "c('glmerMod', 'merMod')"

Likewise, when I try to get the Variance Inflation Factor (VIF) to check for multicollinearity, I get the following error (top line is the command, bottom is error):

    vif(ball3)

    Error in UseMethod("determinant") :
      no applicable method for 'determinant' applied to an object of class "c('dpoMatrix', 'dsyMatrix', 'ddenseMatrix', 'symmetricMatrix', 'dMatrix', 'denseMatrix', 'compMatrix', 'Matrix', 'xMatrix', 'mMatrix')"
    In addition: Warning message:
    In vif.default(ball3) : No intercept: vifs may not be sensible.

My questions are:

  1. Is there a way to get casewise diagnostics (such as standardised residuals, Cook's distance, etc.) for my model (a mixed-effects logistic regression fitted using glmer)?

  2. How can I calculate the vif statistic for this model?

  3. Does my model have the same assumptions of homoscedasticity and normality of the residuals as a simple linear regression? If so, how would I test this, given that a plot of residuals vs. fitted values, and a Q-Q plot of the residuals, would look a bit odd because of the binary nature of the outcome?

  4. Are there any other important assumptions that I should check for my model?

My grasp of statistics is rather basic, so any help is appreciated.

Thanks, Ben, for pointing me to the influence.ME package. I have pasted in the code I used to estimate the dfbeta values and Cook's distance because 1) it would be great if someone could check what I have done and correct me if I am wrong, and 2) it might be useful to someone else with a similar model who wants to check for influential points.

To calculate the parameter estimates without each data point, along with several other values needed to then calculate influence statistics:

    influence <- influence(ball3, obs = TRUE)

Setting obs = TRUE means that single observations, rather than groups, are deleted from the model. Then use the influence object created above to estimate Cook's distance:

    cooks <- cooks.distance(influence, sort = TRUE)

The sort=TRUE sorts the Cook's distance values, making it easy to find large values.
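To make the large values easier to spot visually, influence.ME also provides a plot method for the influence object. The 4/n cutoff below is a common rule of thumb, not something from the package documentation or the original post:

```r
# Plot Cook's distances for all observations, sorted, with a
# rule-of-thumb cutoff of 4/n marked (the cutoff is a convention,
# not a hard rule)
n <- nrow(ballData)
plot(influence, which = "cook",
     cutoff = 4 / n, sort = TRUE,
     xlab = "Cook's distance", ylab = "Observation")
```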
Then find the dfbeta values:

    dfbetas(influence, sort = TRUE, to.sort = "Width_mm", abs = FALSE)

Here I have sorted the dfbeta values on the parameter “Width_mm”. The dfbeta values for the other parameters in the model are also given, but the data frame is sorted on the “Width_mm” values.
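Analogously to Cook's distance, a common rule of thumb (again a convention, not part of the original question) flags observations whose absolute dfbetas exceed 2/sqrt(n) for any parameter. A quick way to check that here would be:

```r
# Flag observations whose dfbetas exceed the common 2/sqrt(n)
# cutoff for at least one fixed-effect parameter
db <- dfbetas(influence)               # matrix: one column per fixed effect
cutoff <- 2 / sqrt(nrow(ballData))
which(apply(abs(db), 1, max) > cutoff) # row indices of flagged observations
```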

One thing to note, however: when I ran the first line of code, I got the following warning:

    Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
      Model failed to converge with max|grad| = 0.00122035 (tol = 0.001, component 1)

However, Cook's distance and dfbeta values were returned for all observations in the data set, so I am not too sure what has happened here.
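Regarding that convergence warning: the gradient only just exceeds the tolerance, so one thing worth trying (a suggestion, not something verified on your data) is to refit with a different optimizer and compare the estimates; if they barely change, the warning is likely benign:

```r
library(lme4)

# Refit with the bobyqa optimizer and a larger iteration budget
ball3b <- update(ball3,
                 control = glmerControl(optimizer = "bobyqa",
                                        optCtrl = list(maxfun = 1e5)))

# Compare the fixed-effect estimates from the two fits
cbind(original = fixef(ball3), bobyqa = fixef(ball3b))
```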

Best Answer

  • Check out the influence.ME package. It's designed for quantifying group-level rather than observation-level influence, but it works for the latter (specify obs=TRUE in influence()); it can also be pretty slow because it re-estimates the model once per case (e.g., 15 seconds on my laptop to compute observation-level influences for a relatively small LMM fitted to the 144-row Penicillin data set).
  • For VIF (which I'm not wild about), I'm not sure: this question has been asked and so far not answered. On the other hand, a slightly more general version was asked and answered here (pointing to this Github repository).
  • In general, diagnostics on binary models are difficult: the conceptual problem here is not specific to mixed models but applies to binary GLMs in general. See this question and this question.
  • I would consider checking out this set of examples for GLMM diagnostics in R.
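For the VIF question specifically, one approach along the lines of the widely circulated vif.mer() helper is to compute VIFs from the covariance matrix of the fixed-effect estimates with the intercept dropped. Treat this as a sketch rather than an official lme4 function (the function name and the intercept-dropping convention are assumptions carried over from that helper):

```r
# Sketch: VIFs for the fixed effects of a merMod fit, computed
# from the correlation matrix of the fixed-effect estimates
# (intercept removed). Mirrors the commonly shared vif.mer()
# helper; not part of lme4 itself.
vif.mer <- function(fit) {
  v <- as.matrix(vcov(fit))            # covariance of fixed-effect estimates
  keep <- names(lme4::fixef(fit)) != "(Intercept)"
  r <- cov2cor(v[keep, keep, drop = FALSE])
  diag(solve(r))                       # VIF for each remaining fixed effect
}

vif.mer(ball3)
```

Note that for a multi-level factor such as Offset this returns one VIF per dummy variable, not a single generalized VIF for the factor as a whole.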