I aim to estimate the annual proportion of patients (% of patients) that are smokers in a population whose age and sex must be taken into account. In other words, I want to calculate the adjusted prevalence (%) of smoking each year. I have repeated measurements on the same individuals and want to model the individual as a random effect, which is why I use the lme4 package, more precisely the glmer function. The variable of main interest is "year" (period 1996 to 2014), which I need to model as a fixed effect.
Aim: Obtain adjusted proportions (%) of smokers each year.
Suppose the data set is named "df" and the year variable is converted to a factor.
I tried this code (generated with a slightly different data set than the attached one) to fit the model:
> smoke <- glmer(smoker ~ biomarker + year + sex + age + (1 | id), data
> = df, family = binomial, nAGQ = 1)
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.201632 0.231582 -26.779 < 2e-16 ***
biomarker -0.015364 0.008299 -1.851 0.06413 .
yuar1997 0.648292 0.212400 3.052 0.00227 **
yuar1998 -0.586996 0.227217 -2.583 0.00978 **
yuar1999 -1.194309 0.216907 -5.506 3.67e-08 ***
yuar2000 -0.999889 0.217536 -4.596 4.30e-06 ***
yuar2001 -0.884453 0.203351 -4.349 1.37e-05 ***
yuar2002 -0.777464 0.199151 -3.904 9.47e-05 ***
yuar2003 -0.961869 0.194723 -4.940 7.83e-07 ***
yuar2004 -1.755470 0.197157 -8.904 < 2e-16 ***
yuar2005 -1.207833 0.189753 -6.365 1.95e-10 ***
yuar2006 -1.072532 0.187504 -5.720 1.07e-08 ***
yuar2007 -1.494477 0.189467 -7.888 3.08e-15 ***
yuar2008 -2.441916 0.191069 -12.780 < 2e-16 ***
yuar2009 -1.881562 0.187321 -10.045 < 2e-16 ***
yuar2010 -2.254924 0.187254 -12.042 < 2e-16 ***
yuar2011 -1.634935 0.184929 -8.841 < 2e-16 ***
yuar2012 -2.405588 0.187349 -12.840 < 2e-16 ***
yuar2013 -2.119775 0.186729 -11.352 < 2e-16 ***
yuar2014 -2.241768 0.210259 -10.662 < 2e-16 ***
sex -0.071377 0.115975 -0.615 0.53826
age -0.012897 0.008011 -1.610 0.10742
Using the predict function to obtain probability of being a smoker in 2005:
predict(smoke, data.frame(age=mean(df$age), year="2005", sex=mean(df$sex), biomarker=mean(df$biomarker, na.rm=T)), type="response", re.form = NA)
I obtain much too low probabilities of being a smoker a particular year:
0.0002233488
The same is true when using the lsmeans and effects package. Figures should be around 5–15% smokers.
In short, in the data set I'm aiming to obtain the proportions of smokers during different years, adjusted for differences in age, sex and the biomarker while accounting of repeated measurements.
I'd be extremely grateful for a solution to these problems.
Best Answer
predict() in lme4 does not work well unless the grouping factor specification is "realistic". If we use samples from the observed data, we get reasonable predictions. I think this is a bug in predict.merMod()
This is lme4 1.1-7
Compare that with what you get if you set id to a new value, not present in df when my.fit was evaluated.
If you want confidence intervals around the predicted means, I can recommend the
boot
packageUsing R = 2 in the call to
boot
, I got the following results: