Solved – Partitioning variance within a level in a 3-level mixed effects model

intraclass-correlation, lme4-nlme, mixed model, r, variance

I have a 3-level repeated-measures model: observations (level 1) nested within patients (level 2) nested within providers (level 3). One of my questions concerns the percentage of variance explained at level 2 versus level 3 for the dependent variable (outcome). I am estimating these with the intra-class correlation coefficient, ICC (ρ).

I am trying to fit a model that partitions the variance in my level-2 and level-3 random effects by patients' racial status (White/non-White). I could subset my data, run three separate models, and compute the ICCs individually, but I am hoping to do this in one model, which would also let me test the significance of differences between variance estimates by leaving a term out and running likelihood-ratio tests, e.g. anova(model1, model2).

Here is my baseline (random-intercept) model for calculating the ICCs on the complete sample (note: there is no cross-classification; each patient has their own provider, and no two providers see the same patient):

model1 <- lmer(outcome ~ (1|id) + (1|provider), data = dat)
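For reference, the ICCs from a baseline model like this can be pulled from VarCorr(). This is a hedged sketch on simulated data (the column names mirror the question's dat, which is not shown), and the cumulative definition of the patient-level ICC used here is one of several conventions:

```r
# Minimal simulated analogue of the question's data: patients nested in
# providers, repeated observations per patient.
library(lme4)

set.seed(1)
n_provider <- 39; n_id <- 390; n_obs <- 5
dat <- data.frame(
  provider = rep(seq_len(n_provider), each = (n_id / n_provider) * n_obs),
  id       = rep(seq_len(n_id), each = n_obs)
)
dat$outcome <- 5 +
  rnorm(n_provider, sd = 0.3)[dat$provider] +  # provider (level-3) effect
  rnorm(n_id,       sd = 0.6)[dat$id] +        # patient (level-2) effect
  rnorm(nrow(dat),  sd = 0.4)                  # residual (level-1)

model1 <- lmer(outcome ~ (1 | id) + (1 | provider), data = dat)

# Extract the variance components and form the ICCs.
vc <- as.data.frame(VarCorr(model1))
v  <- setNames(vc$vcov, vc$grp)
total <- sum(v)
icc_provider <- v[["provider"]] / total                   # level-3 share
icc_patient  <- (v[["id"]] + v[["provider"]]) / total     # cumulative level-2 share
```

Whether the patient-level ICC should include the provider variance in the numerator (as above) or not depends on which correlation you want to describe; both conventions appear in the multilevel literature.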

Rather than subset the data and run three models, I want to run one model and estimate both patient and provider variability in the outcome for White versus non-White patients. I ran this model:

model2 <- lmer(outcome ~ rem + (1|id/rem) + (1|provider/rem), data = dat)

Note: REM (racial-ethnic minority) status is coded as a factor with levels "White" and "REM". I have about 1000 White patients, 500 REM patients, and 39 providers (each provider has seen at least 5 patients).

My understanding is that specifying separate (1 | …) intercept terms like this (rather than something like (rem | id)) keeps the model from estimating covariances between the random effects, which I do not want, as then I would not be able to estimate the ICCs.

Model 2 output is as follows:

    REML criterion at convergence: 13694.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-10.2143  -0.3806   0.0751   0.4585   5.6865 

Random effects:
 Groups          Name        Variance Std.Dev.
 rem:id          (Intercept) 0.21617  0.46494 
 id              (Intercept) 0.36515  0.60428 
 rem:provider    (Intercept) 0.05808  0.24100 
 provider        (Intercept) 0.00515  0.07176 
 Residual                    0.17966  0.42386 
Number of obs: 8606, groups:  rem:id, 1504; id, 1504; rem:provider, 78; provider, 39

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)  5.9928132  0.0483799 66.1700000 123.870   <2e-16 ***
remrem      -0.0005091  0.0740942 52.5400000  -0.007    0.995    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
       (Intr)
remrem -0.615

Central Question:

  • Is this the right approach to my aim of partitioning variance across the two levels of REM status (White/REM) and the two levels of analysis (patient and provider)? In the end, I hope to calculate variance explained at all three levels (patient, provider, residual) for three samples (total, White, REM), for a total of 9 ICCs.

I struggle to interpret the Groups under Random effects, e.g. rem:id versus id. It looks like there are two intercept variances at the patient level, but I do not know whether rem:id is stratified by REM status while id pools all patients. I was expecting something like white:id and rem:id so I could estimate my ICCs.
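One way to see what those terms are: lme4 expands (1 | id/rem) into (1 | id) + (1 | id:rem), and because each patient has a single REM value, the interaction has exactly one level per patient, so it duplicates id rather than stratifying by REM status. A toy base-R illustration:

```r
# rem is constant within each patient, so id:rem has one level per patient.
id  <- c(1, 1, 2, 2, 3, 3)
rem <- c("White", "White", "REM", "REM", "White", "White")  # one value per id
n_interaction <- nlevels(interaction(id, rem, drop = TRUE))  # same as n patients
n_patients    <- length(unique(id))
# By contrast, providers see patients from both groups, so rem:provider has
# roughly two levels per provider (hence 78 groups for 39 providers).
```

This is why rem:id reports 1504 groups, identical to id, in the model 2 output.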

Secondary Model Results Questions:

  • The group counts look mostly right; however, for rem:provider the count (78) is double the number of providers (39), and rem:id (1504) equals the total number of patients, the same as id. In my sample there are about 1000 White and 500 REM patients. Is this indicative of an error in my model, or is it simply not assessing the two groups separately?
  • The residual variance in both models appears to be forced into a single estimate. Is there a way to partition this level-1 variance by group? This will be critical for calculating the ICCs.
  • Do I need to include rem as a fixed effect as well? How does including versus excluding it affect the variance estimates?
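On the residual-variance question: lme4 cannot stratify the level-1 residual, but nlme can, via a varIdent() weights structure. A hedged sketch on simulated data (names mirror the question's dat, which is not shown; the ICC-by-group calculation would then use the group-specific residual variances in the denominators):

```r
library(nlme)

# Simulated stand-in: patients nested in providers, REM constant within
# patient, and a larger true residual SD for the REM group (0.6 vs 0.3).
set.seed(2)
n_provider <- 10; n_id <- 100; n_obs <- 4
dat <- data.frame(
  provider = factor(rep(seq_len(n_provider), each = (n_id / n_provider) * n_obs)),
  id       = factor(rep(seq_len(n_id), each = n_obs))
)
dat$rem <- factor(ifelse(as.integer(dat$id) %% 3 == 0, "REM", "White"))
dat$outcome <- 5 + rnorm(n_id, sd = 0.6)[as.integer(dat$id)] +
  rnorm(nrow(dat), sd = ifelse(dat$rem == "REM", 0.6, 0.3))

m_het <- lme(outcome ~ rem,
             random  = ~ 1 | provider/id,            # provider (level 3) / patient (level 2)
             weights = varIdent(form = ~ 1 | rem),   # separate residual SD per REM level
             data    = dat)

# Residual SD of the non-reference group, as a ratio to the reference group:
ratio <- coef(m_het$modelStruct$varStruct, unconstrained = FALSE)
```

VarCorr(m_het) then gives the provider and patient variances, and the varIdent ratio converts the single reported residual SD into one per REM group.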

The best response I have found on how to specify my random effects is Ben Bolker's response here, but I am still having trouble interpreting the model results and pulling out variance estimates to calculate the ICCs.

Best Answer

I figured out a workaround, although the residual variance is not partitionable in lme4 (as far as I know). The workaround is to create separate numeric dummy variables for White and REM status: white_dummy = 1 for White patients (0 otherwise) and rem_dummy = 1 for REM patients (0 otherwise). Then I ran the following model:

model3 <- lmer(outcome ~ rem +
 (0 + white_dummy | id) +
 (0 + rem_dummy | id) +
 (0 + white_dummy | provider) +
 (0 + rem_dummy | provider),
 data = dplyr::filter(dat, rem == 1 | rem == 0))

and this creates the following random effects output:

Random effects:
 Groups        Name         Variance Std.Dev. Corr
 id            white_dummy0 0.19631  0.4431       
               white_dummy1 0.39760  0.6306   0.06
 id.1          rem_dummy0   0.18381  0.4287       
               rem_dummy1   0.39400  0.6277   0.24
 provider_1    white_dummy0 0.06192  0.2488       
               white_dummy1 0.01334  0.1155   0.11
 provider_1.1  rem_dummy0   0.02774  0.1666       
               rem_dummy1   0.03867  0.1966   0.00
Residual                    0.17943  0.4236       
Number of obs: 8606, groups:  id, 1504; provider_1, 39

However, the model failed to converge, likely due to the small number of observations at the provider-by-REM level and variance estimates near zero. I am going to try a Bayesian approach with MCMCglmm to fit this model.
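For the MCMCglmm route, the usual way to get a separate variance per REM level at every stratum, including the residual, is idh() variance structures. A hedged sketch (priors and iteration settings are illustrative only; simulated data stands in for the question's dat):

```r
library(MCMCglmm)

# Simulated stand-in: patients nested in providers, REM constant within patient.
set.seed(3)
n_provider <- 10; n_id <- 100; n_obs <- 4
dat <- data.frame(
  provider = factor(rep(seq_len(n_provider), each = (n_id / n_provider) * n_obs)),
  id       = factor(rep(seq_len(n_id), each = n_obs))
)
dat$rem <- factor(ifelse(as.integer(dat$id) %% 3 == 0, "REM", "White"))
dat$outcome <- 5 + rnorm(n_id, sd = 0.6)[as.integer(dat$id)] +
  rnorm(nrow(dat), sd = ifelse(dat$rem == "REM", 0.6, 0.3))

# Weakly informative priors: one 2x2 diagonal block per idh() term
# (illustrative settings, not a recommendation).
prior <- list(
  R = list(V = diag(2), nu = 0.002),
  G = list(G1 = list(V = diag(2), nu = 0.002),
           G2 = list(V = diag(2), nu = 0.002))
)

m_mc <- MCMCglmm(outcome ~ rem,
                 random = ~ idh(rem):id + idh(rem):provider,  # variance per REM level
                 rcov   = ~ idh(rem):units,                   # residual per REM level
                 prior  = prior, data = dat,
                 nitt = 3000, burnin = 1000, thin = 10, verbose = FALSE)

# Six variance components: 2 patient, 2 provider, 2 residual (one per REM level),
# from which the 9 ICCs (or their posterior distributions) can be formed.
n_components <- ncol(m_mc$VCV)
```

Because the variance components are posterior samples, differences between the White and REM ICCs can be assessed directly from their posterior distributions rather than by leave-one-out likelihood-ratio tests.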