Well, the gee package includes facilities for fitting GEE, and gee() returns both naive (asymptotic) and robust SEs. I have never used the geepack package, but from the online examples its output seems to resemble more or less that of gee. To compute $100(1-\alpha)\%$ CIs for your main effects (e.g. gender), why not use the robust SE? In what follows I will assume it is extracted from, say, summary(gee.fit) and stored in a variable rob.se. Then
exp(coef(gee.fit)["gender"] + c(-1, 1) * qnorm(0.975) * rob.se)
should yield a 95% CI expressed on the odds scale.
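As a self-contained sketch of the whole computation (data simulated and variable names hypothetical; this assumes the gee package is installed, and that the robust SE sits in the "Robust S.E." column of the coefficient table returned by summary()):

```r
library(gee)  # assumes the gee package is installed

## Simulate a small longitudinal binary dataset (purely illustrative)
set.seed(42)
df <- data.frame(subject = rep(1:50, each = 4),
                 gender  = rep(rbinom(50, 1, 0.5), each = 4))
df$y <- rbinom(nrow(df), 1, plogis(-0.5 + 0.8 * df$gender))

## Fit a GEE with an exchangeable working correlation matrix
gee.fit <- gee(y ~ gender, id = subject, data = df,
               family = binomial, corstr = "exchangeable")

## Robust SE for the gender effect, from the coefficient table
rob.se <- summary(gee.fit)$coefficients["gender", "Robust S.E."]

## 95% CI on the log-odds scale, exponentiated to the odds scale
ci <- exp(coef(gee.fit)["gender"] + c(-1, 1) * qnorm(0.975) * rob.se)
ci
```

The same recipe works for any other main effect by swapping the coefficient name.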
Now, in fact I rarely use GEE except when working with binary endpoints in longitudinal studies, because it is easy to pass or estimate a given working correlation matrix. In the case you summarize here, I would rather rely on an IRT model for dichotomous items (see the psychometrics task view) or, which is much the same thing, a mixed-effects GLM such as the one proposed in the lme4 package from Doug Bates. For a study like yours, as you said, subjects are treated as random effects, your other covariates enter the model as fixed effects, and the response is the 0/1 rating on each item (item entering the model as well). You will then get 95% CIs for the fixed effects, either from the SEs computed as sqrt(diag(vcov(glmm.fit))), as read from summary(glmm.fit), or by using confint() together with an lmList object. Doug Bates gives nice illustrations in the following two papers/handouts:
There is also a discussion about profiling lmer fits (based on the profile deviance) to investigate variability in fixed effects, but I haven't looked into that point; I think it is still in section 1.5 of Doug's draft on mixed models. There have been many discussions about computing SEs and CIs for GLMMs as implemented in the lme4 package (whose interface differs from that of the earlier nlme package), so you will easily find other interesting threads by googling.
It's not clear to me why GEE would have to be preferred in this particular case. Maybe look at the R translation of Agresti's book by Laura Thompson, R (and S-PLUS) Manual to Accompany Agresti's Categorical Data Analysis.
Update:
I just realized that the above solution only works if you are interested in a confidence interval for the gender effect alone. If it is the item &times; gender interaction that is of concern, you have to model it explicitly in the GLMM (my second reference to Bates's work has an example of how to do it with lmer).
Another solution is to use an explanatory IRT model, where you explicitly acknowledge the potential effect of person covariates, like gender or age, and consider fitting them within a Rasch model, for example. This is called a latent regression Rasch model and is fully described in De Boeck and Wilson's book, Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach (Springer, 2004), which you can read online on Google Books (section 2.4). There are some facilities to fit this kind of model in Stata (see there). In R, we can mimic such a model with a mixed-effects approach; a toy example would look something like
glmer(response ~ 0 + Age + Sex + item + (Sex | id), data = df, family = binomial)
if I remember correctly (with current lme4, binomial models are fitted with glmer() rather than lmer()). I am not sure whether the eRm package allows one to easily incorporate person covariates (because we would need to construct a specific design matrix), but it may be worth checking out since it provides 95% CIs too.
It sounds like the statement you are quoting is not about confidence intervals but rather about the probability content within $k$ standard deviations of the mean, for $k = 1, 2, 3$. This result does play a role in constructing confidence intervals for the mean of a normal sample, though: if the $X_i$ are iid $N(\mu, \sigma)$, the sample mean $\bar{X}$ is $N(\mu, \sigma/\sqrt{n})$.
So for known $\sigma$, $[\bar{X} - 2\sigma/\sqrt{n},\ \bar{X} + 2\sigma/\sqrt{n}]$ is approximately a 95% confidence interval for $\mu$ (actually 95.4%). When $\sigma$ is unknown, replacing $\sigma$ with the sample standard deviation $s$ still gives an approximate 95% confidence interval for $\mu$ when $n$ is large. For small $n$, the exact distribution used to construct the confidence interval is Student's $t$ with $n-1$ degrees of freedom, so 2 should be replaced by the (larger) appropriate percentile of the $t$ distribution to get 95% confidence.
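A small numerical illustration in R (data simulated for the example) comparing the "mean &plusmn; 2 SE" rule with the exact $t$-based interval:

```r
set.seed(1)
x <- rnorm(10, mean = 5, sd = 2)   # small sample, sigma unknown
n <- length(x)

## Normal-approximation interval: mean +/- 2 * s / sqrt(n)
approx.ci <- mean(x) + c(-1, 1) * 2 * sd(x) / sqrt(n)

## Exact interval from Student's t with n - 1 degrees of freedom
t.ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)

## qt(0.975, 9) is about 2.26 > 2, so the t interval is wider
qt(0.975, df = n - 1)
```

As $n$ grows, qt(0.975, n - 1) shrinks toward qnorm(0.975) &asymp; 1.96 and the two intervals converge.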
As the comments outline, you can't simply look for overlapping CIs, because that can be misleading. The better way, as you will soon learn in your classes, is to conduct a statistical hypothesis test:
You make a null hypothesis, $H_0$, which in this case would be that the mean birth weight of males and females is the same, and you calculate the probability, if $H_0$ were true, of the means of your two samples being at least as far apart as they actually are. That is the mythical p-value which, if small enough, allows you to more or less confidently reject the null hypothesis (and get your paper published).
Notice that you can never prove $H_0$, only fail to disprove it, which is not the same thing. In your case, you cannot say that males and females have the same mean birth weight, only that there is not enough evidence to say they are different...
For your case, you would probably use a Student's t-test for two independent samples.
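In R, that amounts to something like the following (birth weights simulated and variable names hypothetical):

```r
## Illustrative two-sample t-test on simulated birth weights (grams)
set.seed(123)
bw <- data.frame(weight = c(rnorm(60, 3300, 450), rnorm(60, 3200, 450)),
                 sex    = rep(c("male", "female"), each = 60))

res <- t.test(weight ~ sex, data = bw)  # Welch's t-test by default
res$p.value   # probability, under H0, of a difference at least this large
res$conf.int  # 95% CI for the difference in mean birth weight
```

Note that t.test() defaults to the Welch variant, which does not assume equal variances; add var.equal = TRUE for the classical pooled-variance test.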