Solved – Confidence intervals on differences in choices in a GEE framework: methods and alternatives

confidence intervalestimationmixed modelpsychometricsrandom-effects-model

Say we have 5 items, and people are asked which item they like. Multiple answers are possible, but also no answer is possible. The people are categorized according to factors like gender, age, and so on. One possible approach to analyze the differences between genders, age groups and so on is making use of the Generalized Estimating Equations.

I thus construct a dataset with a binary variable indicating whether the item was liked or not, and as predictor variables I have the items, the person id, the age,… i.e. :

Liked   Item   Person ...
0       1      1
1       2      1
0       3      1
0       4      1
1       5      1
1       1      2
...

Then I apply a model with following form :
$$Liked = Item + Gender + Item*Gender + Age + Item*Age + … $$
with Person as random factor (called id in some applications) and a logit link function.

Now I like to give confidence intervals on the conditional fractions, i.e. the confidence interval of the fractions males and females that liked a particular item, corrected for age differences and the likes. I know I could use the estimated coefficients to get the results I want, but I'm a bit lost in how to do that. I can calculate the estimated odds, but I'm not sure on the standard error (SE) on those odds based on the SE of the coefficients. I'm not sure on how to deal with the random component of the variance for example.

So:

1) Any pointers on how to calculate that SE from the SE of the coefficients?

2) Any alternatives for an approach? I've been thinking about mixed models, but a colleague directed me to GEE as more appropriate for these data. Your ideas?


Edit : for practical pointers, I'm using geepack in R for this. I tried effect(), but the option se.fit=T is not implemented. In any case, that would give the SE for every observation, which is not what I'm interested in.

Best Answer

Well, the gee package includes facilities for fitting GEE and gee() return asymptotic and robust SE. I never used the geepack package. From what I saw in the online example, output seems to resemble more or less that of gee. To compute $100(1-\alpha)$ CIs for your main effects (e.g. gender), why not use the robust SE (in the following I will assume it is extracted from, say summary(gee.fit), and stored in a variable rob.se)? I suppose that

exp(coef(gee.fit)["gender"]+c(-1,1)*rob.se*qnorm(0.975))

should yield 95% CIs expressed on the odds scale.

Now, in fact I rarely use GEE except when I am working with binary endpoints in longitudinal studies, because it's easy to pass or estimate a given working correlation matrix. In the case you summarize here, I would rather rely on an IRT model for dichotomous items (see the psychometrics task view), or (it is quite the same in fact) a mixed-effects GLM such as the one that is proposed in the lme4 package, from Doug Bates. For study like yours, as you said, subjects will be considered as random effects, and your other covariates enter the model as fixed effects; the response is the 0/1 rating on each item (which enter the model as well). Then you will get 95% CI for fixed effects, either from the SE computed as sqrt(diag(vcov(glmm.fit))) or as read in summary(glmm.fit), or using confint() together with an lmList object. Doug Bates gave nice illustrations in the following two paper/handout:

There is also a discussion about profiling lmer fits (based on profile deviance) to investigate variability in fixed effects, but I didn't investigate that point. I think it is still in section 1.5 of Doug's draft on mixed models. There are a lot of discussion about computing SE and CI for GLMM as implemented in the lme4 package (whose interface differs from the previous nlme package), so that you will easily find other interesting threads after googling about that.

It's not clear to me why GEE would have to be preferred in this particular case. Maybe, look at the R translation of Agresti's book by Laura Thompson, R (and S-PLUS) Manual to Accompany Agresti's Categorical Data.


Update:

I just realized that the above solution would only work if you're interested in getting a confidence interval for the gender effect alone. If it is the interaction item*gender that is of concern, you have to model it explicitly in the GLMM (my second reference on Bates's has an example on how to do it with lmer).

Another solution is to use an explanatory IRT model, where you explicitly acknowledge the potential effect of person covariates, like gender or age, and consider fitting them within a Rasch model, for example. This is called a Latent Regression Rasch Model, and is fully described in de Boeck and Wilson's book, Explanatory item response models: a generalized linear and nonlinear approach (Springer, 2004), which you can read online on Google books (section 2.4). There are some facilities to fit this kind of model in Stata (see there). In R, we can mimic such model with a mixed-effects approach; a toy example would look something like

lmer(response ~ 0 + Age + Sex + item + (Sex|id), data=df, binomial)

if I remember correctly. I'm not sure whether the eRm allows to easily incorporate person covariates (because we need to construct a specific design matrix), but it may be worth checking out since it provides 95% CIs too.