Solved – Level-2 predictions with lme4/glmer model

glmmlme4-nlmemixed modelmultilevel-analysis

Let's say I've fitted a 2 level model with glmer like this:

data.model <- glmer(y ~ 1 + level1.var11 + level2.var21 + (1 | ID), family = binomial(link =  "logit"), data = dataset)

where the level-2 grouping is done by ID, level1.var11 is a level-1 predictor, and level2.var21 is a level-2 predictor.

For example, let's say that the level-2 units are schools, and the level-1 units are students in these schools. (I use the notation used by Raudenbush and Byrk in the book Hierarchical Linear Models Second edition.)
Let's say the level-1 predictor is student GPA and the level-2 predictor is SECTOR that is whether the school is public or private. The response variable is 1 if a student repeats a class and 0 if the student does not repeat the class.
The combined model in this case is:

$\eta_{ij} = \gamma_{00} + \gamma_{10}Student\_GPA_{ij} + \gamma_{01}SECTOR_{j} + u_{0j}$

I have fixed intercept, $\gamma_{00}$, and fixed slopes, $\gamma_{10}$ and $\gamma_{01}$, and random effect (the random intercept) for each school, $u_{0j}$.
$\eta_{ij}$ is the log odds for student $i$ in school $j$ to repeat a class.

Using this model, I can predict the probability, $p_{ij}$, for each student repeating a class. (I can decide to use the random effects or not. Lets say I don't want to use the random effects.)

Now I want to know the probability $p_{ij}$ that a student belonging to school $j$ will repeat the class. My idea is to predict the probabilities for each student based on the model I created and then calculate the average probability for each school.

$\overline{p}_{.j} = \frac{\sum_{i = 1}^{n_{j}}p_{ij}}{n_j}$

I am not sure if this is the right approach. Am I missing something important?

I know that I can use the method predict from the package lme4 for prediction at level-1 like this:

predict(data.model, newdata = data, REform = NA, type = "response", allow.new.levels = TRUE)

I wanna know how can I make predictions at level-2 using the model that I fitted with level-1 and level-2 predictors. Should I just average the level-1 prediction for each group or is there a better approach?

Best Answer

I'm not 100% sure I know what you mean by the levels: according to the usual way I've seen this terminology used, level 1 would be "above" level 2, meaning the level of the whole population, so I'm not sure how we can have a "level-1 predictor". Anyway, I'm not sure I need to know, since you can set the fixed effects however you like within newdata. I think the answer to your question is in the help:

ReForm: formula for random effects to condition on. If ‘NULL’, include all random effects; if ‘NA’ or ‘~0’, include no random effects

so ReForm=NA gives population-level predictions (i.e. predictions based on not knowing what ID is being predicted); since you have only a single random effect, using either ReForm=~ID or ReForm=NULL will give predictions conditional on specified IDs. (I see you have set allow.new.levels=TRUE; I'm not sure how that will work with predicting at the ID level ...)

With the development version of lme4:

d <- data.frame(f=factor(rep(LETTERS[1:20],each=30)))
library(lme4)
d$y <- simulate(~1+(1|f),family="gaussian",newdata=d,
    newparams=list(beta=0,theta=1,sigma=0.1),seed=102)[[1]]
m <- lmer(y~1+(1|f),data=d)
newdata <- data.frame(f=factor(LETTERS[1:20]))
predict(m,newdata=newdata,ReForm=NA)  ## all identical
predict(m,newdata=newdata,ReForm=NULL)  ## different by f

(I'm not sure, but the capitalization of ReForm may have changed in the development version -- be careful.)

update: OK, you want to know the average probability of a student at school $j$ repeating a class. I think your approach is reasonable (the answer should be similar to the observed value, although in general it should be a shrinkage estimator [i.e. closer to the overall average). You might also want to consider calculating the probability that an average student at school $j$ would repeat, in which case you would first average the predictors ...