I have a logistic regression model that I was doing using the glm function in R. A reviewer commented that we needed to assess the impact of random effects. I used glmer, from R package lme4, to assess that, adding a random effects variable. But the estimates and standard errors of the fixed effect coefficients in the results of the glmer are identical to those of the glm, and the variance of the random effects variable is reported as zero – not something very low, but actually zero.
This makes me worry that I have not set up the model correctly.
The model has a logical result variable 'Script' that is regressed against five logical diagnostic indicators, five categories of treating doctor that are represented by four logical variables, one logical 'era' indicator, an integer age variable and a logical indicator of whether the patient has a certain blood marker.
The random effects variable used in the glmer model is a doctor ID, which will be one of the 59 treating doctors in the study. That was introduced because the reviewer was concerned about particular doctor effects that would cause the estimates of coefficient and their standard errors for the doctor category indicators to be inaccurate or biased because of varying propensities of individual doctors to give the prescription that is the Script result variable.
In setting up the model I had tried to follow the approach in Example 2 that is set out on this UCLA page
(http://www.ats.ucla.edu/stat/r/dae/melogit.htm), which has a fixed effects variable 'Experience' that is a property of a doctor, and a random effects variable DID (Doctor ID).
I speculated that perhaps the effect of the random effects variable disappeared if there were too many other explanatory variables. To test this out, I ran a slimmed-down model with only the four doctor category variables in it. That gave me a meaningful random effect variance. I then added more variables in a series of steps and found sure enough that the random effect variance decreased slightly with each added explanatory variable. Then it suddenly dived with the addition of a further explanatory variable (Btype1indic) from 0.08877 to 4e-14. I added one more explanatory variable (age_int, the integer age variable) and the variance went to zero.
Is it the case that the impact of a random effects variable, and its variance, decreases with the addition of each other fixed-effects variable to the model?
Why does the impact eventually go to zero, rather than just continuing to get smaller?
Should I be concerned that the fact that it goes to zero means I have done something wrong?
I attach below the output from my diagnostic runs. It shows the glmer output summaries as I add more variables at each step. I've inserted the marker '+++++++++++++++++++++++++++++++++++++' next to the random effects variances to make them easier to spot and compare. The output includes a glm run at the end. It can be seen that the estimates and standard errors of the p-values are identical to those in the glmer run. The AIC is close but not identical, at 492.83 vs 494.8. I don't know what to make of that.
Thank you very much for any help that anybody can provide.
[1] "New mixed effects regression ####################################################################################"
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 + (1 | DrID)
Data: datatable
AIC BIC logLik deviance df.resid
635.9 665.0 -311.9 623.9 947
Scaled residuals:
Min 1Q Median 3Q Max
-0.4896 -0.3776 -0.3099 -0.2397 4.4144
Random effects:
Groups Name Variance Std.Dev.
DrID (Intercept) 0.1006 0.3172 +++++++++++++++++++++++++++++++++++++
Number of obs: 953, groups: DrID, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8762 0.3042 -9.454 < 2e-16 ***
DrIsCat1TRUE 1.0657 0.3636 2.931 0.00338 **
DrIsCat2TRUE 1.3271 0.4152 3.197 0.00139 **
DrIsCat3TRUE 1.0939 0.6335 1.727 0.08423 .
DrIsCat4TRUE 0.6739 0.3839 1.755 0.07920 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) DIC1TR DIC2TR DIC3TR
DrIsCt1TRUE -0.815
DrIsCt2TRUE -0.680 0.564
DrIsCt3TRUE -0.468 0.389 0.324
DrIsCt4TRUE -0.772 0.642 0.534 0.368
[1] "New mixed effects regression ####################################################################################"
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 + diag1 + diag2 + diag3 + diag4 + diag5 + (1 | DrID)
Data: datatable
AIC BIC logLik deviance df.resid
552.5 605.9 -265.2 530.5 942
Scaled residuals:
Min 1Q Median 3Q Max
-2.8315 -0.3275 -0.2255 -0.1439 9.9103
Random effects:
Groups Name Variance Std.Dev.
DrID (Intercept) 0.09085 0.3014 +++++++++++++++++++++++++++++++++++++
Number of obs: 953, groups: DrID, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.7230 0.4197 -8.871 < 2e-16 ***
DrIsCat1TRUE 1.4321 0.3835 3.735 0.000188 ***
DrIsCat2TRUE 1.6727 0.4476 3.737 0.000186 ***
DrIsCat3TRUE 1.5435 0.6737 2.291 0.021957 *
DrIsCat4TRUE 0.8032 0.4084 1.967 0.049232 *
diag1TRUE 1.3456 0.3149 4.273 1.93e-05 ***
diag2TRUE 0.7685 0.4147 1.853 0.063834 .
diag3TRUE -0.8395 0.3736 -2.247 0.024635 *
diag4TRUE 1.6516 0.3897 4.238 2.25e-05 ***
diag5TRUE 2.6146 0.5224 5.005 5.59e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) DIC1TR DIC2TR DIC3TR DIC4TR d1TRUE d2TRUE d3TRUE d4TRUE
DrIsCt1TRUE -0.694
DrIsCt2TRUE -0.611 0.559
DrIsCt3TRUE -0.434 0.398 0.327
DrIsCt4TRUE -0.588 0.631 0.500 0.367
diag1TRUE -0.595 0.091 0.181 0.093 -0.040
diag2TRUE -0.429 0.077 0.080 0.126 0.072 0.488
diag3TRUE -0.419 0.034 0.078 0.065 -0.002 0.531 0.390
diag4TRUE -0.219 0.117 0.052 0.065 0.095 0.111 0.056 -0.053
diag5TRUE -0.230 0.149 0.137 0.110 0.101 0.166 0.070 -0.034 -0.081
convergence code: 0
Model failed to converge with max|grad| = 0.00293761 (tol = 0.001, component 1)
[1] "New mixed effects regression ####################################################################################"
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 + diag1 + diag2 + diag3 + diag4 + diag5 + old_era + (1 | DrID)
Data: datatable
AIC BIC logLik deviance df.resid
554.1 612.4 -265.0 530.1 941
Scaled residuals:
Min 1Q Median 3Q Max
-2.9335 -0.3276 -0.2212 -0.1427 10.1394
Random effects:
Groups Name Variance Std.Dev.
DrID (Intercept) 0.08877 0.2979 +++++++++++++++++++++++++++++++++++++
Number of obs: 953, groups: DrID, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.8135 0.4459 -8.552 < 2e-16 ***
DrIsCat1TRUE 1.4295 0.3830 3.733 0.000189 ***
DrIsCat2TRUE 1.6774 0.4476 3.747 0.000179 ***
DrIsCat3TRUE 1.5380 0.6727 2.286 0.022243 *
DrIsCat4TRUE 0.8001 0.4080 1.961 0.049864 *
diag1TRUE 1.3563 0.3159 4.294 1.76e-05 ***
diag2TRUE 0.7908 0.4162 1.900 0.057388 .
diag3TRUE -0.7997 0.3790 -2.110 0.034831 *
diag4TRUE 1.6524 0.3902 4.235 2.28e-05 ***
diag5TRUE 2.6130 0.5232 4.994 5.92e-07 ***
old_eraTRUE 0.1503 0.2354 0.638 0.523278
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) DIC1TR DIC2TR DIC3TR DIC4TR d1TRUE d2TRUE d3TRUE d4TRUE d5TRUE
DrIsCt1TRUE -0.653
DrIsCt2TRUE -0.587 0.560
DrIsCt3TRUE -0.408 0.398 0.328
DrIsCt4TRUE -0.552 0.631 0.500 0.367
diag1TRUE -0.582 0.094 0.187 0.096 -0.038
diag2TRUE -0.432 0.078 0.085 0.127 0.072 0.491
diag3TRUE -0.445 0.032 0.083 0.064 -0.003 0.532 0.398
diag4TRUE -0.211 0.120 0.056 0.068 0.098 0.112 0.053 -0.051
diag5TRUE -0.220 0.151 0.139 0.111 0.103 0.167 0.071 -0.034 -0.080
old_eraTRUE -0.332 -0.005 0.024 -0.009 -0.008 0.060 0.085 0.165 0.010 0.002
[1] "New mixed effects regression ####################################################################################"
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 + diag1 + diag2 + diag3 + diag4 + diag5 + old_era + Btyp1indic + (1 | DrID)
Data: datatable
AIC BIC logLik deviance df.resid
498.0 561.1 -236.0 472.0 940
Scaled residuals:
Min 1Q Median 3Q Max
-4.7047 -0.3053 -0.1857 -0.1201 7.6007
Random effects:
Groups Name Variance Std.Dev.
DrID (Intercept) 4e-14 2e-07 +++++++++++++++++++++++++++++++++++++
Number of obs: 953, groups: DrID, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.5004 0.4487 -10.029 < 2e-16 ***
DrIsCat1TRUE 1.7184 0.3462 4.963 6.93e-07 ***
DrIsCat2TRUE 1.8158 0.4261 4.262 2.03e-05 ***
DrIsCat3TRUE 1.8923 0.6694 2.827 0.004699 **
DrIsCat4TRUE 1.0884 0.3813 2.855 0.004308 **
diag1TRUE 1.1332 0.3344 3.389 0.000702 ***
diag2TRUE 0.8158 0.4433 1.840 0.065732 .
diag3TRUE -0.6445 0.3931 -1.640 0.101083
diag4TRUE 0.9052 0.4192 2.160 0.030808 *
diag5TRUE 2.4063 0.5729 4.200 2.67e-05 ***
old_eraTRUE 0.3117 0.2493 1.250 0.211165
Btyp1indicTRUE 2.0701 0.2692 7.690 1.47e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) DIC1TR DIC2TR DIC3TR DIC4TR d1TRUE d2TRUE d3TRUE d4TRUE d5TRUE o_TRUE
DrIsCt1TRUE -0.618
DrIsCt2TRUE -0.543 0.526
DrIsCt3TRUE -0.368 0.359 0.284
DrIsCt4TRUE -0.479 0.581 0.437 0.312
diag1TRUE -0.588 0.108 0.220 0.098 -0.047
diag2TRUE -0.477 0.110 0.101 0.140 0.092 0.495
diag3TRUE -0.484 0.038 0.122 0.060 -0.022 0.550 0.408
diag4TRUE -0.141 0.080 0.042 0.036 0.093 0.113 0.087 -0.007
diag5TRUE -0.240 0.185 0.142 0.122 0.124 0.161 0.066 -0.046 -0.094
old_eraTRUE -0.380 0.003 0.013 0.012 0.001 0.053 0.090 0.159 -0.013 0.052
Btyp1ndTRUE -0.359 0.236 0.128 0.139 0.174 -0.041 0.028 0.035 -0.193 0.028 0.114
[1] "New mixed effects regression ####################################################################################"
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 + diag1 + diag2 + diag3 + diag4 + diag5 + old_era + Btyp1indic + age + (1 | DrID)
Data: datatable
AIC BIC logLik deviance df.resid
494.8 562.9 -233.4 466.8 939
Scaled residuals:
Min 1Q Median 3Q Max
-4.9550 -0.3038 -0.1876 -0.1120 8.9225
Random effects:
Groups Name Variance Std.Dev.
DrID (Intercept) 0 0 +++++++++++++++++++++++++++++++++++++
Number of obs: 953, groups: DrID, 59
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.33613 0.61104 -8.733 < 2e-16 ***
DrIsCat1TRUE 1.42426 0.36618 3.890 0.00010 ***
DrIsCat2TRUE 1.40064 0.46032 3.043 0.00234 **
DrIsCat3TRUE 1.56801 0.68662 2.284 0.02239 *
DrIsCat4TRUE 0.81990 0.39885 2.056 0.03982 *
diag1TRUE 1.10406 0.33784 3.268 0.00108 **
diag2TRUE 0.80801 0.44417 1.819 0.06889 .
diag3TRUE -0.62696 0.39646 -1.581 0.11379
diag4TRUE 0.83663 0.42577 1.965 0.04942 *
diag5TRUE 2.37526 0.57865 4.105 4.05e-05 ***
old_eraTRUE 0.36109 0.25264 1.429 0.15293
Btyp1indicTRUE 2.05930 0.27075 7.606 2.83e-14 ***
age 0.08512 0.03932 2.165 0.03040 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "New glm regression ####################################################################################"
Call:
glm(formula = Script ~ DrIsCat1 + DrIsCat2 + DrIsCat3 + DrIsCat4 +
diag1 + diag2 + diag3 + diag4 + diag5 + old_era + Btyp1indic +
age, family = binomial("logit"), data = datatable)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5459 -0.4202 -0.2630 -0.1578 2.9630
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.33613 0.61103 -8.733 < 2e-16 ***
DrIsCat1TRUE 1.42426 0.36617 3.890 0.00010 ***
DrIsCat2TRUE 1.40064 0.46031 3.043 0.00234 **
DrIsCat3TRUE 1.56801 0.68662 2.284 0.02239 *
DrIsCat4TRUE 0.81990 0.39885 2.056 0.03981 *
diag1TRUE 1.10406 0.33784 3.268 0.00108 **
diag2TRUE 0.80801 0.44416 1.819 0.06889 .
diag3TRUE -0.62696 0.39646 -1.581 0.11379
diag4TRUE 0.83663 0.42577 1.965 0.04942 *
diag5TRUE 2.37526 0.57865 4.105 4.05e-05 ***
old_eraTRUE 0.36109 0.25264 1.429 0.15293
Btyp1indicTRUE 2.05930 0.27075 7.606 2.83e-14 ***
age 0.08512 0.03932 2.165 0.03040 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 644.28 on 952 degrees of freedom
Residual deviance: 466.83 on 940 degrees of freedom
AIC: 492.83
Number of Fisher Scoring iterations: 6
Best Answer
It's hard to comment on your overall goal without data. But, the reason your random effects go down with adding more predictors is a consequence of explaining away doctor variance. For example, suppose that one of your doctors treats a large number of end-stage renal disease patients, who are very sick. Without end-stage renal disease as a fixed effect, that doctors random effect will likely be large to explain the outcomes of treating those sick patients. Whereas, if you add the fixed effect, you'll correctly adjust for these sick patients, and therefore reduce that doctors random effect.
Most likely you just don't have enough data to form more accurate predictions of your random effects.