I'm not 100% sure I know what you mean by the levels: in the usage I've usually seen, level 1 would be "above" level 2, meaning the level of the whole population, so I'm not sure how we can have a "level-1 predictor". Anyway, I'm not sure I need to know, since you can set the fixed effects however you like within newdata. I think the answer to your question is in the help:
ReForm: formula for random effects to condition on. If ‘NULL’,
include all random effects; if ‘NA’ or ‘~0’, include no
random effects
so ReForm=NA gives population-level predictions (i.e. predictions based on not knowing what ID is being predicted); since you have only a single random effect, using either ReForm=~ID or ReForm=NULL will give predictions conditional on the specified IDs. (I see you have set allow.new.levels=TRUE; I'm not sure how that will work with predicting at the ID level ...)
With the development version of lme4:
library(lme4)
## 20 groups (A-T), 30 observations each
d <- data.frame(f=factor(rep(LETTERS[1:20],each=30)))
## simulate a response from a random-intercept model with the given parameters
d$y <- simulate(~1+(1|f),family="gaussian",newdata=d,
                newparams=list(beta=0,theta=1,sigma=0.1),seed=102)[[1]]
m <- lmer(y~1+(1|f),data=d)
newdata <- data.frame(f=factor(LETTERS[1:20]))
predict(m,newdata=newdata,ReForm=NA) ## all identical
predict(m,newdata=newdata,ReForm=NULL) ## different by f
(Be careful with the capitalization of ReForm: it has changed between versions, and released versions of lme4 spell the argument re.form, with the older capitalized spellings deprecated.)
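On the open question about allow.new.levels: the following is a minimal sketch, assuming a released version of lme4 where the argument is spelled re.form. The data are simulated purely for illustration, and the names nd, p_pop and p_cond are hypothetical. It suggests that a previously unseen level simply receives the population-level prediction.

```r
library(lme4)

set.seed(102)
# Simulated data (illustrative only): 20 groups, 30 observations each
d <- data.frame(f = factor(rep(LETTERS[1:20], each = 30)))
b <- rnorm(20, sd = 1)                               # true random intercepts
d$y <- b[as.integer(d$f)] + rnorm(nrow(d), sd = 0.1) # add residual noise
m <- lmer(y ~ 1 + (1 | f), data = d)

# Two observed levels plus one level the model has never seen
nd <- data.frame(f = factor(c("A", "B", "new")))

# Population-level predictions: random effects ignored
p_pop <- predict(m, newdata = nd, re.form = NA)

# Conditional predictions: the unseen level is only accepted
# because allow.new.levels = TRUE
p_cond <- predict(m, newdata = nd, re.form = NULL, allow.new.levels = TRUE)

print(p_pop)
print(p_cond)
```

With allow.new.levels=FALSE (the default) the second predict call would stop with an error for the unseen level instead.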
update: OK, you want to know the average probability of a student at school $j$ repeating a class. I think your approach is reasonable (the answer should be similar to the observed value, although in general it should be a shrinkage estimator [i.e. closer to the overall average]). You might also want to consider calculating the probability that an average student at school $j$ would repeat, in which case you would first average the predictors ...
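To make the distinction concrete, here is a rough sketch on simulated data. It assumes that the approach in question is averaging the fitted probabilities over the students in each school; the variable names (school, x, repeated) and all parameter values are hypothetical stand-ins for the real data.

```r
library(lme4)

set.seed(1)
n_school <- 20
n_per <- 50
# Hypothetical data: one student-level predictor x, binary outcome "repeated"
d <- data.frame(school = factor(rep(seq_len(n_school), each = n_per)),
                x = rnorm(n_school * n_per))
u <- rnorm(n_school, sd = 0.8)                    # school random intercepts
eta <- -1 + 0.7 * d$x + u[as.integer(d$school)]   # linear predictor
d$repeated <- rbinom(nrow(d), 1, plogis(eta))

m <- glmer(repeated ~ x + (1 | school), data = d, family = binomial)

# (a) Average probability of repeating at each school:
#     predict every student (conditional on school), then average
p_hat <- predict(m, type = "response")
avg_prob <- tapply(p_hat, d$school, mean)

# (b) Probability for an "average student" at each school:
#     average the predictor first, then predict once per school
nd <- data.frame(school = factor(levels(d$school), levels = levels(d$school)),
                 x = as.numeric(tapply(d$x, d$school, mean)))
prob_avg_student <- predict(m, newdata = nd, type = "response")

# Observed proportions, for comparison with (a)
observed <- tapply(d$repeated, d$school, mean)
print(round(cbind(observed, avg_prob, prob_avg_student), 3))
```

Because the inverse logit is nonlinear, (a) and (b) generally differ, so which one to report depends on which question is being asked.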
Concerning the display of the results, specify the option variance if you prefer variances over standard deviations.
Concerning the significance, you can run an OLS of the dependent variable on all independent variables with the exception of the level 2 identifier (i.e. schools), using the command regress.
Store the estimates you obtain through estimates store [name1].
Then estimate your multilevel model using xtmixed and again store the estimates by estimates store [name2].
The difference between these models is the random intercept you allowed in the multilevel estimation but not in the OLS estimation; hence testing whether the unconstrained model performs better is equivalent to testing significance of the random intercept.
lrtest [name1] [name2], force will do this for you. You will need to specify the force option; otherwise Stata deems the test invalid.
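For readers working in R rather than Stata, the same comparison can be sketched with lm and lme4. This is an analogue of the steps above, not the Stata procedure itself; the data are simulated, and the names school, x and y are hypothetical. The mixed model is fitted by maximum likelihood (REML=FALSE) so that its log-likelihood is comparable with that of the OLS fit.

```r
library(lme4)

set.seed(42)
# Hypothetical data: 30 schools with 20 students each
d <- data.frame(school = factor(rep(1:30, each = 20)),
                x = rnorm(600))
u <- rnorm(30, sd = 0.5)                          # school random intercepts
d$y <- 1 + 0.5 * d$x + u[as.integer(d$school)] + rnorm(600)

# Constrained model: OLS without the school identifier
m0 <- lm(y ~ x, data = d)

# Unconstrained model: random intercept for school, fitted by ML
m1 <- lmer(y ~ x + (1 | school), data = d, REML = FALSE)

# Likelihood-ratio statistic for the random intercept
lr <- as.numeric(2 * (logLik(m1) - logLik(m0)))

# Naive chi-square(1) p-value; it tends to be conservative because
# a zero variance lies on the boundary of the parameter space
p_naive <- pchisq(lr, df = 1, lower.tail = FALSE)

print(c(LR = lr, p = p_naive))
```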
Best Answer
I found the solution here:
http://www.iub.edu/~statmath/stat/all/hlm/hlm.pdf
The answer is:
So, adding a second-level variable to the model, like the school sector (Sector), can easily be done via the xtmixed command. For instance:
Where "MatGrade" is the outcome variable, "Intelligence" is a level 1 variable reflecting the intelligence of the scholars. Scholars are nested within schools.