Let's say I've fitted a 2 level model with glmer like this:
data.model <- glmer(y ~ 1 + level1.var11 + level2.var21 + (1 | ID), family = binomial(link = "logit"), data = dataset)
where the level-2 grouping is done by ID, level1.var11 is a level-1 predictor, and level2.var21 is a level-2 predictor.
For example, let's say that the level-2 units are schools, and the level-1 units are students in these schools. (I use the notation used by Raudenbush and Byrk in the book Hierarchical Linear Models Second edition.)
Let's say the level-1 predictor is student GPA and the level-2 predictor is SECTOR that is whether the school is public or private. The response variable is 1 if a student repeats a class and 0 if the student does not repeat the class.
The combined model in this case is:
$\eta_{ij} = \gamma_{00} + \gamma_{10}Student\_GPA_{ij} + \gamma_{01}SECTOR_{j} + u_{0j}$
I have fixed intercept, $\gamma_{00}$, and fixed slopes, $\gamma_{10}$ and $\gamma_{01}$, and random effect (the random intercept) for each school, $u_{0j}$.
$\eta_{ij}$ is the log odds for student $i$ in school $j$ to repeat a class.
Using this model, I can predict the probability, $p_{ij}$, for each student repeating a class. (I can decide to use the random effects or not. Lets say I don't want to use the random effects.)
Now I want to know the probability $p_{ij}$ that a student belonging to school $j$ will repeat the class. My idea is to predict the probabilities for each student based on the model I created and then calculate the average probability for each school.
$\overline{p}_{.j} = \frac{\sum_{i = 1}^{n_{j}}p_{ij}}{n_j}$
I am not sure if this is the right approach. Am I missing something important?
I know that I can use the method predict
from the package lme4
for prediction at level-1 like this:
predict(data.model, newdata = data, REform = NA, type = "response", allow.new.levels = TRUE)
I wanna know how can I make predictions at level-2 using the model that I fitted with level-1 and level-2 predictors. Should I just average the level-1 prediction for each group or is there a better approach?
Best Answer
I'm not 100% sure I know what you mean by the levels: according to the usual way I've seen this terminology used, level 1 would be "above" level 2, meaning the level of the whole population, so I'm not sure how we can have a "level-1 predictor". Anyway, I'm not sure I need to know, since you can set the fixed effects however you like within
newdata
. I think the answer to your question is in the help:so
ReForm=NA
gives population-level predictions (i.e. predictions based on not knowing whatID
is being predicted); since you have only a single random effect, using eitherReForm=~ID
orReForm=NULL
will give predictions conditional on specifiedID
s. (I see you have setallow.new.levels=TRUE
; I'm not sure how that will work with predicting at theID
level ...)With the development version of
lme4
:(I'm not sure, but the capitalization of
ReForm
may have changed in the development version -- be careful.)update: OK, you want to know the average probability of a student at school $j$ repeating a class. I think your approach is reasonable (the answer should be similar to the observed value, although in general it should be a shrinkage estimator [i.e. closer to the overall average). You might also want to consider calculating the probability that an average student at school $j$ would repeat, in which case you would first average the predictors ...