I'm examining the effect of income (categorized into quintiles) on a response variable during different years (from 2003 to 2014). I adjust for some other covariates and have repeated measurements on the same individual.
The problem: To obtain the effect of income on the response variable each year, I include an interaction term between year and income quintile. This gives me implausible results, since those who are rich tend to have higher adjusted values, which is unlikely. So I redid the analysis, fitting one separate regression model each year, and that gave me plausible results; i.e the rich had lower adjusted values.
Shouldn't the two methods yield somewhat similar results?
My calculations:
> # START
>
> # Converting the year variable into a factor
> data$year <- as.factor(data$year)
>
> # Fitting a model with random effects for person
> require(lme4)
> fit <- lmer(response ~ income*year + sex + age + education + biomarker + (1 | id), data=data)
> # Using lsmeans to predict means (95% confidence intervals)
> require(lsmeans)
> results1 <- lsmeans(fit, ~ income*year)
> summary(results1)
income year lsmean SE df lower.CL upper.CL
Quintile 1 2003 64.95472 0.2826723 190216.2 64.40069 65.50875
Quintile 2 2003 65.49504 0.2716893 189962.8 64.96254 66.02755
Quintile 3 2003 65.39961 0.2713204 189648.3 64.86782 65.93139
Quintile 4 2003 65.51872 0.2715941 189734.3 64.98640 66.05103
Quintile 5 2003 65.47502 0.2744592 190425.5 64.93709 66.01296
Quintile 1 2004 64.03440 0.2541402 191982.7 63.53630 64.53251
Quintile 2 2004 65.04364 0.2472738 191720.0 64.55899 65.52829
Quintile 3 2004 64.98069 0.2425866 191644.4 64.50523 65.45616
Quintile 4 2004 65.14219 0.2459067 191477.1 64.66022 65.62416
Quintile 5 2004 65.25515 0.2496510 191947.6 64.76584 65.74446
Quintile 1 2005 63.62453 0.2345767 192988.5 63.16476 64.08429
Quintile 2 2005 64.18179 0.2294655 192853.2 63.73205 64.63154
Quintile 3 2005 64.00250 0.2308264 192458.4 63.55008 64.45491
Quintile 4 2005 63.79620 0.2276547 192775.7 63.35000 64.24240
Quintile 5 2005 64.70116 0.2344171 193212.2 64.24171 65.16062
Quintile 1 2006 64.17463 0.2228096 193482.8 63.73793 64.61134
Quintile 2 2006 64.37025 0.2158588 193425.3 63.94717 64.79333
Quintile 3 2006 64.30762 0.2141169 193366.8 63.88795 64.72728
Quintile 4 2006 64.38315 0.2168376 193329.6 63.95815 64.80814
Quintile 5 2006 64.22019 0.2198772 193526.5 63.78923 64.65114
Quintile 1 2007 63.84188 0.2185725 193602.5 63.41349 64.27028
Quintile 2 2007 63.92346 0.2104634 193527.6 63.51096 64.33597
Quintile 3 2007 63.67954 0.2094212 193511.7 63.26908 64.09000
Quintile 4 2007 64.08718 0.2103912 193478.5 63.67482 64.49955
Quintile 5 2007 64.03627 0.2113704 193649.9 63.62199 64.45055
Quintile 1 2008 64.65587 0.2043770 193372.1 64.25529 65.05644
Quintile 2 2008 64.30141 0.1957655 193499.2 63.91772 64.68511
Quintile 3 2008 65.04454 0.1955247 193589.4 64.66131 65.42776
Quintile 4 2008 64.94073 0.1947715 193488.8 64.55898 65.32247
Quintile 5 2008 65.00096 0.1979129 193119.2 64.61305 65.38886
Quintile 1 2009 64.74611 0.1941592 191979.5 64.36556 65.12666
Quintile 2 2009 64.68663 0.1872505 192801.9 64.31962 65.05363
Quintile 3 2009 64.89048 0.1858611 192919.0 64.52620 65.25476
Quintile 4 2009 65.19469 0.1848586 192734.9 64.83238 65.55701
Quintile 5 2009 65.18344 0.1896058 192025.8 64.81182 65.55506
Quintile 1 2010 64.99407 0.1863001 188874.1 64.62893 65.35922
Quintile 2 2010 65.14159 0.1758624 190272.3 64.79691 65.48628
Quintile 3 2010 65.21003 0.1740553 190655.7 64.86889 65.55118
Quintile 4 2010 65.65492 0.1731845 190157.8 65.31548 65.99435
Quintile 5 2010 65.42802 0.1774762 189156.3 65.08017 65.77587
Quintile 1 2011 65.77711 0.1807193 179035.8 65.42290 66.13131
Quintile 2 2011 65.81373 0.1719076 186787.1 65.47679 66.15066
Quintile 3 2011 66.25649 0.1692650 187174.9 65.92473 66.58824
Quintile 4 2011 66.24046 0.1672252 185968.9 65.91271 66.56822
Quintile 5 2011 66.31895 0.1717513 184631.5 65.98232 66.65558
Quintile 1 2012 66.45412 0.1815487 178604.9 66.09829 66.80995
Quintile 2 2012 66.39743 0.1691640 185466.1 66.06587 66.72899
Quintile 3 2012 66.61760 0.1674829 185934.4 66.28934 66.94586
Quintile 4 2012 66.69909 0.1672601 185774.9 66.37126 67.02692
Quintile 5 2012 66.74211 0.1697808 183081.3 66.40934 67.07487
Quintile 1 2013 66.20088 0.1804738 177511.5 65.84716 66.55461
Quintile 2 2013 66.05285 0.1710873 185354.9 65.71752 66.38817
Quintile 3 2013 65.57061 0.1667456 185103.9 65.24379 65.89742
Quintile 4 2013 65.96563 0.1669031 184335.3 65.63851 66.29276
Quintile 5 2013 66.19121 0.1723801 183746.1 65.85335 66.52907
Quintile 1 2014 65.37137 0.3060358 191891.8 64.77155 65.97120
Quintile 2 2014 66.37503 0.2882805 188873.8 65.81001 66.94006
Quintile 3 2014 65.49851 0.2876551 188265.5 64.93471 66.06231
Quintile 4 2014 66.08503 0.2867954 188729.4 65.52291 66.64714
Quintile 5 2014 65.98435 0.2901786 189169.5 65.41561 66.55309
Results are averaged over the levels of: sex, education
Confidence level used: 0.95
> # As you can see above, the richest (quintile 5) has higher values of a harmful biomarker, this is not likelybiomarker, this is not likely
>
> # Redo the analysis by stratification, i.e one regression each year (I'll give 4 examples, years 2005, 2008, 2010 and 2014)
> fit2005 <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2005",])
> fit2008 <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2008",])
> fit2010 <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2010",])
> fit2014 <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2014",])
>
> lsmeans(fit2005, "income")
income lsmean SE df lower.CL upper.CL
Quintile 1 64.57677 0.3644627 7104.87 63.86232 65.29123
Quintile 2 63.65426 0.3715929 7114.04 62.92582 64.38269
Quintile 3 63.97948 0.3773103 7119.28 63.23984 64.71912
Quintile 4 63.46368 0.3727073 7117.62 62.73306 64.19429
Quintile 5 63.57556 0.3853513 7117.67 62.82016 64.33096
Results are averaged over the levels of: sex, education
Confidence level used: 0.95
> lsmeans(fit2008, "income")
income lsmean SE df lower.CL upper.CL
Quintile 1 64.27986 0.3268165 9880.81 63.63924 64.92049
Quintile 2 65.07827 0.3288624 9866.53 64.43363 65.72291
Quintile 3 64.98781 0.3265577 9859.02 64.34769 65.62793
Quintile 4 64.89630 0.3305190 9868.49 64.24842 65.54419
Quintile 5 63.66509 0.3428045 9898.35 62.99313 64.33706
Results are averaged over the levels of: sex, education
Confidence level used: 0.95
> lsmeans(fit2010, "income")
income lsmean SE df lower.CL upper.CL
Quintile 1 65.39530 0.2961521 11897.22 64.81480 65.97581
Quintile 2 65.61791 0.2911321 11892.26 65.04724 66.18858
Quintile 3 65.48423 0.2947050 11892.60 64.90656 66.06190
Quintile 4 65.14303 0.2914349 11925.62 64.57177 65.71429
Quintile 5 63.89145 0.3030998 11935.15 63.29733 64.48558
Results are averaged over the levels of: sex, education
Confidence level used: 0.95
> lsmeans(fit2014, "income")
income lsmean SE df lower.CL upper.CL
Quintile 1 67.05015 0.4888236 4523.42 66.09182 68.00849
Quintile 2 65.41100 0.4801968 4523.32 64.46958 66.35242
Quintile 3 65.35658 0.4740396 4525.63 64.42723 66.28592
Quintile 4 65.03556 0.4793119 4526.56 64.09587 65.97524
Quintile 5 64.94690 0.5001898 4529.88 63.96628 65.92751
Results are averaged over the levels of: sex, education
Confidence level used: 0.95
I must have misunderstood something (important) here…
Any advice?
Best Answer
I didn't have the
id
variable at the time of preparing this answer so I do not reproduce your results. But the explanation can be made usinglm
fits on these data.First, your model without the
id
grouping. I'll show the model, LS means, and comparisons with the lowest income quintile for just 2005:(Note: I edited-out some annotations in the output.) Now here are the same results, using a model fitted to only the 2005 data:
Now, consider changing the overall model so that income interacts with every other predictor:
The LS means are slightly different from the ones for
fit.lm2005
, but the comparisons are identical. The only reason the LS means are shifted is because different reference values are used forage
andbiomarker
. In the code below, I show that we do get the same results (except fro sliht errors due to rounding) fromfit.lmint
if we use the same reference levels:Take-home message: When you fit models separately by some factor, the fits will differ from those for a combined model, unless the combined model includes all interactions with the factor you used for the separate fits.
Also, please note that
lsmeans
only summarizes model results. If you use a spec like~A*B
in thelsmeans
call, that does not add anA:B
interaction to the model, if that interaction was not present.Additional note added later:
With a mixed model, if the
id
variable is crossed withyear
, then you probably will not get identical results by just including the fixed-effects interactions. this is because you'd be estimating differentid
effects in each year too. You'd have to make the random part(1|id/year)
.