Solved – Residual variance for glmer

Tags: binomial distribution, lme4-nlmer, residuals, variance

I am running a glmer model and I want to determine the total variance. My data are survival outcomes coded as 0 and 1, where 1 means the individual survived and 0 means it died. The data represent offspring from a full factorial cross, so some individuals are full sibs and others are half sibs.

When I run a glmer model, there is no residual variance in the summary output. I have read that the residual variance should be (π^2)/3 for generalized linear mixed models with binomial data and a logit link function (Nakagawa, S., Schielzeth, H. 2010. Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biol. Rev. 85:935-956.).

Is this true? Or is there a different way to calculate the residual variance for glmer?

Here is my model and output:

library(lme4)
model6 <- glmer(X09.Nov ~ (1 | Dam) + (1 | Sire) + (1 | Sire:Dam),
                family = binomial, data = data)
summary(model6)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: X09.Nov ~ (1 | Dam) + (1 | Sire) + (1 | Sire:Dam)
   Data: data

    AIC      BIC   logLik deviance df.resid 
 1274.4   1295.3   -633.2   1266.4     1375 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2747  0.3366  0.3931  0.4664  1.1090 

Random effects:
Groups   Name        Variance  Std.Dev. 
Sire:Dam (Intercept) 3.853e-01 6.207e-01
Sire     (Intercept) 4.181e-02 2.045e-01
Dam      (Intercept) 6.036e-09 7.769e-05
Number of obs: 1379, groups:  Sire:Dam, 49; Sire, 7; Dam, 7
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.6456     0.1419    11.6   <2e-16 ***

Best Answer

One possible interpretation of the logistic regression model is to state that there is an underlying score $$ y^*_i = x_i'\beta + \epsilon_i, $$ with the observed variable being $$y_i = \left\{ \begin{array}{ll} 1, & y^*_i > 0 \\ 0, & y^*_i \le 0\end{array}\right.$$ This would be the way logistic regression would be introduced in social sciences, as opposed to biostatistics. In this formulation, $\epsilon$ follows a logistic distribution, which does have the variance of $\pi^2/3$. Mixed models stick an additional random effects term into the equation, and introduce the double subscripts, making it $$ y^*_{ij} = x_{ij}'\beta + u_i + \epsilon_{ij}, $$ where $u_i$ is assumed normal because the normal distribution is something that everybody understands. The variance of $u_i$ is estimated by your mixed model package (although without the standard errors; Douglas Bates has a pretty strong stand on it). So the total variance is then $\sigma^2_u + \mathbb{V}[\epsilon] = \sigma^2_u + \pi^2/3$.
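The latent-variable formulation above can be checked directly by simulation: the standard logistic error has variance $\pi^2/3 \approx 3.29$, and thresholding the latent score at zero produces the observed 0/1 response. A minimal sketch in R (the intercept value is just an illustrative number, taken from the model output above):

```r
# The logistic error in the latent-variable formulation has variance pi^2/3.
set.seed(1)
eps <- rlogis(1e6)        # standard logistic errors
var(eps)                  # close to pi^2 / 3
pi^2 / 3                  # 3.2899

# Observed response: y = 1 whenever the latent score crosses zero.
beta0 <- 1.6              # illustrative intercept, roughly as in the output above
ystar <- beta0 + eps      # latent score y* = x'beta + eps
y <- as.integer(ystar > 0)
mean(y)                   # close to plogis(beta0), the implied survival probability
```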

In your somewhat more complicated model, you just need to sum all of the variance components. It seems weird to me that the strongest effect you have is the interaction, with the main effects being smaller in magnitude; see if this makes sense in your application. Also, the Laplace approximation is at best a starting point; you need to increase the number of integration points to get accurate estimates of the variance components.
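Concretely, for the output shown in the question, the total latent-scale variance is the sum of the three printed variance components plus $\pi^2/3$. A short sketch using those printed values (with lme4 one would instead pull them programmatically via `as.data.frame(VarCorr(model6))$vcov`):

```r
# Total latent-scale variance, using the variance components printed in the
# summary above (Sire:Dam, Sire, Dam) plus the logistic residual variance.
vc <- c(`Sire:Dam` = 3.853e-01, Sire = 4.181e-02, Dam = 6.036e-09)
resid_var <- pi^2 / 3            # logistic residual variance, ~3.2899
total_var <- sum(vc) + resid_var
round(total_var, 3)              # ~3.717

# A variance component's share of the total, e.g. for Sire:
vc["Sire"] / total_var
```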
