Solved – Adding 2nd level variable into Multi-level Modelling in Stata

multilevel-analysis, stata

I'm used to the HLM 7 software and would now like to switch to Stata for multilevel modelling (xtmixed).

To give an example, imagine I have students (level 1) nested within schools (level 2).
In HLM I can easily add a level-2 variable (for instance, school beauty) by choosing the equation of a coefficient (or of the intercept) that sits at level 2.

Imagine I want to include the effect of school beauty in B0 (intercept).
How can I do that in Stata?

Best Answer

I found the solution here:

http://www.iub.edu/~statmath/stat/all/hlm/hlm.pdf

The answer is:

Whereas HLM requires two separate data files (one corresponding to each level), SPSS, Stata, SAS, and R rely on only a single file. The level-2 observations are common to each case within the same macro-unit, so that if there are 50 students in one school the corresponding school-level score appears 50 times. Each program also requires an id variable identifying the group membership of each individual.
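If the data still sit in two HLM-style files, the single-file layout described above can be built with a many-to-one merge. The following is only a minimal sketch: the tiny datasets, their values, and the variable StudentId are made up for illustration, and it is meant to be run as a do-file (the tempfile macro does not survive line-by-line execution).

```stata
* Hypothetical school-level (level-2) data: one row per school
clear
input SchoolId Sector
1 0
2 1
end
tempfile schools
save `schools'

* Hypothetical student-level (level-1) data: one row per student
clear
input StudentId SchoolId MatGrade Intelligence
1 1 6.5 100
2 1 7.0 110
3 2 5.5  95
4 2 8.0 120
end

* Many students match one school row, so Sector is repeated within each school
merge m:1 SchoolId using `schools', nogenerate
sort SchoolId StudentId
list, sepby(SchoolId)
```

After the merge every student row carries its school's Sector value, which is the layout xtmixed expects.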

So, adding a second level variable to the model, like the school sector (Sector), can be easily done via the xtmixed command. For instance:

xtmixed MatGrade Intelligence Sector || SchoolId:

Here "MatGrade" is the outcome variable, "Intelligence" is a level-1 variable reflecting the intelligence of the students, and "Sector" is the level-2 variable. Students are nested within schools, which are identified by "SchoolId".
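To see why simply listing Sector among the predictors corresponds to putting it in the intercept equation in HLM, it may help to write the model out. This is a sketch in the usual HLM-style notation; the symbols are my own labels, not taken from the linked PDF.

```latex
\begin{align*}
\text{Level 1:}\quad & \mathrm{MatGrade}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Intelligence}_{ij} + r_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Sector}_{j} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} \\
\text{Combined:}\quad & \mathrm{MatGrade}_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{Sector}_{j} + \gamma_{10}\,\mathrm{Intelligence}_{ij} + u_{0j} + r_{ij}
\end{align*}
```

Substituting the level-2 equations into level 1 gives the combined form, in which Sector is just another fixed-effect term and u_{0j} is the random intercept requested by "|| SchoolId:".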

Related Solutions

Solved – Level-2 predictions with lme4/glmer model

I'm not 100% sure I know what you mean by the levels: according to the usual way I've seen this terminology used, level 1 would be "above" level 2, meaning the level of the whole population, so I'm not sure how we can have a "level-1 predictor". Anyway, I'm not sure I need to know, since you can set the fixed effects however you like within newdata. I think the answer to your question is in the help:

ReForm: formula for random effects to condition on. If ‘NULL’, include all random effects; if ‘NA’ or ‘~0’, include no random effects

so ReForm=NA gives population-level predictions (i.e. predictions based on not knowing what ID is being predicted); since you have only a single random effect, using either ReForm=~ID or ReForm=NULL will give predictions conditional on specified IDs. (I see you have set allow.new.levels=TRUE; I'm not sure how that will work with predicting at the ID level ...)

With the development version of lme4:

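library(lme4)
## 20 groups (A-T) with 30 observations each
d <- data.frame(f=factor(rep(LETTERS[1:20],each=30)))
## simulate a response from a random-intercept model with known parameters
d$y <- simulate(~1+(1|f),family="gaussian",newdata=d,
    newparams=list(beta=0,theta=1,sigma=0.1),seed=102)[[1]]
## fit the random-intercept model
m <- lmer(y~1+(1|f),data=d)
## one row per group to predict for
newdata <- data.frame(f=factor(LETTERS[1:20]))
predict(m,newdata=newdata,ReForm=NA)  ## population level: all identical
predict(m,newdata=newdata,ReForm=NULL)  ## conditional on f: different by f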

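(I'm not sure, but the capitalization of ReForm may have changed in the development version -- be careful. Released versions of lme4 spell this argument re.form, so check ?predict.merMod for the version you have installed.)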

Update: OK, you want to know the average probability of a student at school $j$ repeating a class. I think your approach is reasonable (the answer should be similar to the observed value, although in general it should be a shrinkage estimator, i.e. closer to the overall average). You might also want to consider calculating the probability that an average student at school $j$ would repeat, in which case you would first average the predictors ...

Solved – Significance of variance components in Stata output

Concerning the display of the results, specify the option variance if you prefer variances over standard deviations.

Concerning the significance, you can run an OLS regression of the dependent variable on all independent variables, leaving out the level-2 identifier (i.e. schools), using the command regress. Store the estimates you obtain with estimates store [name1].

Then estimate your multilevel model using xtmixed and again store the estimates by estimates store [name2].

The only difference between these models is the random intercept, which the multilevel model allows and the OLS model does not; hence testing whether the unconstrained model fits better is equivalent to testing the significance of the random intercept. The command lrtest [name1] [name2], force will do this for you. You need to specify the force option; otherwise Stata deems the test invalid.
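The steps above could look roughly like this. It is a sketch on simulated data: the data-generating values, the seed, and the stored-estimate names ols and ri are arbitrary choices of mine, and the variable names are borrowed from the first question purely for illustration. I also request maximum likelihood explicitly (the mle option), on the assumption that the two log likelihoods should come from the same estimation method.

```stata
* Simulated data, purely illustrative: 20 schools with 30 students each
clear
set seed 12345
set obs 20
generate SchoolId = _n
generate Sector = mod(_n, 2)
generate u = rnormal(0, 1)            // school-level random intercept
expand 30                             // 30 student rows per school
generate Intelligence = rnormal(100, 15)
generate MatGrade = 2 + 0.05*Intelligence + 0.5*Sector + u + rnormal(0, 1)

* Model without the random intercept (plain OLS)
regress MatGrade Intelligence Sector
estimates store ols

* Random-intercept model, fit by ML; variance shows variances rather than SDs
xtmixed MatGrade Intelligence Sector || SchoolId:, mle variance
estimates store ri

* Likelihood-ratio test of the random intercept; force is needed because
* the two sets of estimates come from different commands
lrtest ols ri, force
```

Note that xtmixed also prints an "LR test vs. linear regression" line at the foot of its own output, which reports the same comparison. In Stata 13 and later the command is called mixed.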
