Solved – Examining trends with interactions and with stratification – obtaining discordant results

generalized linear modelmixed modelrregression

I'm examining the effect of income (categorized into quintiles) on a response variable during different years (from 2003 to 2014). I adjust for some other covariates and have repeated measurements on the same individual.

The problem: To obtain the effect of income on the response variable each year, I include an interaction term between year and income quintile. This gives me implausible results, since those who are rich tend to have higher adjusted values, which is unlikely. So I redid the analysis, fitting one separate regression model each year, and that gave me plausible results; i.e the rich had lower adjusted values.

Shouldn't the two methods yield somewhat similar results?

My calculations:

    > # START
    > 
    > # Converting the year variable into a factor
    > data$year <- as.factor(data$year)
    > 
    > # Fitting a model with random effects for person
> require(lme4)
> fit  <- lmer(response ~ income*year + sex + age + education + biomarker + (1 | id), data=data)
> # Using lsmeans to predict means (95% confidence intervals)
> require(lsmeans)
> results1 <- lsmeans(fit, ~ income*year)
> summary(results1)
 income     year   lsmean        SE       df lower.CL upper.CL
 Quintile 1 2003 64.95472 0.2826723 190216.2 64.40069 65.50875
 Quintile 2 2003 65.49504 0.2716893 189962.8 64.96254 66.02755
 Quintile 3 2003 65.39961 0.2713204 189648.3 64.86782 65.93139
 Quintile 4 2003 65.51872 0.2715941 189734.3 64.98640 66.05103
 Quintile 5 2003 65.47502 0.2744592 190425.5 64.93709 66.01296
 Quintile 1 2004 64.03440 0.2541402 191982.7 63.53630 64.53251
 Quintile 2 2004 65.04364 0.2472738 191720.0 64.55899 65.52829
 Quintile 3 2004 64.98069 0.2425866 191644.4 64.50523 65.45616
 Quintile 4 2004 65.14219 0.2459067 191477.1 64.66022 65.62416
 Quintile 5 2004 65.25515 0.2496510 191947.6 64.76584 65.74446
 Quintile 1 2005 63.62453 0.2345767 192988.5 63.16476 64.08429
 Quintile 2 2005 64.18179 0.2294655 192853.2 63.73205 64.63154
 Quintile 3 2005 64.00250 0.2308264 192458.4 63.55008 64.45491
 Quintile 4 2005 63.79620 0.2276547 192775.7 63.35000 64.24240
 Quintile 5 2005 64.70116 0.2344171 193212.2 64.24171 65.16062
 Quintile 1 2006 64.17463 0.2228096 193482.8 63.73793 64.61134
 Quintile 2 2006 64.37025 0.2158588 193425.3 63.94717 64.79333
 Quintile 3 2006 64.30762 0.2141169 193366.8 63.88795 64.72728
 Quintile 4 2006 64.38315 0.2168376 193329.6 63.95815 64.80814
 Quintile 5 2006 64.22019 0.2198772 193526.5 63.78923 64.65114
 Quintile 1 2007 63.84188 0.2185725 193602.5 63.41349 64.27028
 Quintile 2 2007 63.92346 0.2104634 193527.6 63.51096 64.33597
 Quintile 3 2007 63.67954 0.2094212 193511.7 63.26908 64.09000
 Quintile 4 2007 64.08718 0.2103912 193478.5 63.67482 64.49955
 Quintile 5 2007 64.03627 0.2113704 193649.9 63.62199 64.45055
 Quintile 1 2008 64.65587 0.2043770 193372.1 64.25529 65.05644
 Quintile 2 2008 64.30141 0.1957655 193499.2 63.91772 64.68511
 Quintile 3 2008 65.04454 0.1955247 193589.4 64.66131 65.42776
 Quintile 4 2008 64.94073 0.1947715 193488.8 64.55898 65.32247
 Quintile 5 2008 65.00096 0.1979129 193119.2 64.61305 65.38886
 Quintile 1 2009 64.74611 0.1941592 191979.5 64.36556 65.12666
 Quintile 2 2009 64.68663 0.1872505 192801.9 64.31962 65.05363
 Quintile 3 2009 64.89048 0.1858611 192919.0 64.52620 65.25476
 Quintile 4 2009 65.19469 0.1848586 192734.9 64.83238 65.55701
 Quintile 5 2009 65.18344 0.1896058 192025.8 64.81182 65.55506
 Quintile 1 2010 64.99407 0.1863001 188874.1 64.62893 65.35922
 Quintile 2 2010 65.14159 0.1758624 190272.3 64.79691 65.48628
 Quintile 3 2010 65.21003 0.1740553 190655.7 64.86889 65.55118
 Quintile 4 2010 65.65492 0.1731845 190157.8 65.31548 65.99435
 Quintile 5 2010 65.42802 0.1774762 189156.3 65.08017 65.77587
 Quintile 1 2011 65.77711 0.1807193 179035.8 65.42290 66.13131
 Quintile 2 2011 65.81373 0.1719076 186787.1 65.47679 66.15066
 Quintile 3 2011 66.25649 0.1692650 187174.9 65.92473 66.58824
 Quintile 4 2011 66.24046 0.1672252 185968.9 65.91271 66.56822
 Quintile 5 2011 66.31895 0.1717513 184631.5 65.98232 66.65558
 Quintile 1 2012 66.45412 0.1815487 178604.9 66.09829 66.80995
 Quintile 2 2012 66.39743 0.1691640 185466.1 66.06587 66.72899
 Quintile 3 2012 66.61760 0.1674829 185934.4 66.28934 66.94586
 Quintile 4 2012 66.69909 0.1672601 185774.9 66.37126 67.02692
 Quintile 5 2012 66.74211 0.1697808 183081.3 66.40934 67.07487
 Quintile 1 2013 66.20088 0.1804738 177511.5 65.84716 66.55461
 Quintile 2 2013 66.05285 0.1710873 185354.9 65.71752 66.38817
 Quintile 3 2013 65.57061 0.1667456 185103.9 65.24379 65.89742
 Quintile 4 2013 65.96563 0.1669031 184335.3 65.63851 66.29276
 Quintile 5 2013 66.19121 0.1723801 183746.1 65.85335 66.52907
 Quintile 1 2014 65.37137 0.3060358 191891.8 64.77155 65.97120
 Quintile 2 2014 66.37503 0.2882805 188873.8 65.81001 66.94006
 Quintile 3 2014 65.49851 0.2876551 188265.5 64.93471 66.06231
 Quintile 4 2014 66.08503 0.2867954 188729.4 65.52291 66.64714
 Quintile 5 2014 65.98435 0.2901786 189169.5 65.41561 66.55309

Results are averaged over the levels of: sex, education 
Confidence level used: 0.95 
> # As you can see above, the richest (quintile 5) has higher values of a harmful biomarker, this is not likelybiomarker, this is not likely
    > 
    > # Redo the analysis by stratification, i.e one regression each year (I'll give 4 examples, years 2005, 2008, 2010 and 2014)
    > fit2005  <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2005",])
        > fit2008  <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2008",])
    > fit2010  <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2010",])
        > fit2014  <- lmer(response ~ income + sex + age + education + biomarker + (1 | id), data=data[data$year=="2014",])
    > 
    > lsmeans(fit2005, "income")
     income       lsmean        SE      df lower.CL upper.CL
     Quintile 1 64.57677 0.3644627 7104.87 63.86232 65.29123
     Quintile 2 63.65426 0.3715929 7114.04 62.92582 64.38269
     Quintile 3 63.97948 0.3773103 7119.28 63.23984 64.71912
     Quintile 4 63.46368 0.3727073 7117.62 62.73306 64.19429
     Quintile 5 63.57556 0.3853513 7117.67 62.82016 64.33096

    Results are averaged over the levels of: sex, education 
    Confidence level used: 0.95 
    > lsmeans(fit2008, "income")
     income       lsmean        SE      df lower.CL upper.CL
     Quintile 1 64.27986 0.3268165 9880.81 63.63924 64.92049
     Quintile 2 65.07827 0.3288624 9866.53 64.43363 65.72291
     Quintile 3 64.98781 0.3265577 9859.02 64.34769 65.62793
     Quintile 4 64.89630 0.3305190 9868.49 64.24842 65.54419
     Quintile 5 63.66509 0.3428045 9898.35 62.99313 64.33706

    Results are averaged over the levels of: sex, education 
    Confidence level used: 0.95 
    > lsmeans(fit2010, "income")
     income       lsmean        SE       df lower.CL upper.CL
     Quintile 1 65.39530 0.2961521 11897.22 64.81480 65.97581
     Quintile 2 65.61791 0.2911321 11892.26 65.04724 66.18858
     Quintile 3 65.48423 0.2947050 11892.60 64.90656 66.06190
     Quintile 4 65.14303 0.2914349 11925.62 64.57177 65.71429
     Quintile 5 63.89145 0.3030998 11935.15 63.29733 64.48558

    Results are averaged over the levels of: sex, education 
    Confidence level used: 0.95 
    > lsmeans(fit2014, "income")
     income       lsmean        SE      df lower.CL upper.CL
     Quintile 1 67.05015 0.4888236 4523.42 66.09182 68.00849
     Quintile 2 65.41100 0.4801968 4523.32 64.46958 66.35242
     Quintile 3 65.35658 0.4740396 4525.63 64.42723 66.28592
     Quintile 4 65.03556 0.4793119 4526.56 64.09587 65.97524
     Quintile 5 64.94690 0.5001898 4529.88 63.96628 65.92751

    Results are averaged over the levels of: sex, education 
    Confidence level used: 0.95 

I must have misunderstood something (important) here…

Any advice?

Best Answer

I didn't have the id variable at the time of preparing this answer so I do not reproduce your results. But the explanation can be made using lm fits on these data.

First, your model without the id grouping. I'll show the model, LS means, and comparisons with the lowest income quintile for just 2005:

> data$year = factor(data$year)
> fit.lm  <- lm(response ~ income*year + sex + age + education + biomarker, 
    data=data)
> lsmeans(fit.lm, trt.vs.ctrl1 ~ income, at = list(year = "2005"))

$lsmeans
 income       lsmean        SE    df lower.CL upper.CL
 Quintile 1 64.20122 1.0008221 19262 62.23952 66.16292
 Quintile 2 64.45945 1.0060940 19262 62.48742 66.43148
 Quintile 3 64.37015 0.9685111 19262 62.47178 66.26851
 Quintile 4 63.88540 0.9971111 19262 61.93098 65.83983
 Quintile 5 63.36031 1.0319215 19262 61.33765 65.38297

$contrasts
 contrast                  estimate       SE    df t.ratio p.value
 Quintile 2 - Quintile 1  0.2582246 1.417633 19262   0.182  0.9937
 Quintile 3 - Quintile 1  0.1689240 1.390806 19262   0.121  0.9976
 Quintile 4 - Quintile 1 -0.3158223 1.411699 19262  -0.224  0.9898
 Quintile 5 - Quintile 1 -0.8409129 1.437786 19262  -0.585  0.9013

(Note: I edited-out some annotations in the output.) Now here are the same results, using a model fitted to only the 2005 data:

> fit.lm2005  <- lm(response ~ income + sex + age + education + biomarker, 
    data=data, subset = (year=="2005"))  
> lsmeans(fit.lm2005, trt.vs.ctrl1 ~ income)

$lsmeans
 income       lsmean        SE  df lower.CL upper.CL
 Quintile 1 64.37016 0.9884950 998 62.43039 66.30993
 Quintile 2 64.31077 1.0036776 998 62.34121 66.28033
 Quintile 3 64.35983 0.9747897 998 62.44696 66.27270
 Quintile 4 64.05203 1.0154760 998 62.05932 66.04474
 Quintile 5 63.43306 1.0753143 998 61.32292 65.54320

$contrasts
 contrast                   estimate       SE  df t.ratio p.value
 Quintile 2 - Quintile 1 -0.05939015 1.385639 998  -0.043  0.9998
 Quintile 3 - Quintile 1 -0.01033274 1.357894 998  -0.008  1.0000
 Quintile 4 - Quintile 1 -0.31813180 1.400778 998  -0.227  0.9894
 Quintile 5 - Quintile 1 -0.93710083 1.465919 998  -0.639  0.8790

Now, consider changing the overall model so that income interacts with every other predictor:

> fit.lmint  <- lm(response ~ year*(income + sex + age + education + biomarker), 
    data=data)
> lsmeans(fit.lmint, trt.vs.ctrl1 ~ income, at = list(year = "2005"))

$lsmeans
 income       lsmean       SE    df lower.CL upper.CL
 Quintile 1 64.33859 1.027770 19172 62.32408 66.35311
 Quintile 2 64.27920 1.034344 19172 62.25180 66.30661
 Quintile 3 64.32826 1.007009 19172 62.35444 66.30209
 Quintile 4 64.02046 1.047664 19172 61.96695 66.07398
 Quintile 5 63.40149 1.106308 19172 61.23303 65.56996

$contrasts
 contrast                   estimate       SE    df t.ratio p.value
 Quintile 2 - Quintile 1 -0.05939015 1.432060 19172  -0.041  0.9998
 Quintile 3 - Quintile 1 -0.01033274 1.403386 19172  -0.007  1.0000
 Quintile 4 - Quintile 1 -0.31813180 1.447706 19172  -0.220  0.9902
 Quintile 5 - Quintile 1 -0.93710083 1.515029 19172  -0.619  0.8878

The LS means are slightly different from the ones for fit.lm2005, but the comparisons are identical. The only reason the LS means are shifted is because different reference values are used for age and biomarker. In the code below, I show that we do get the same results (except fro sliht errors due to rounding) from fit.lmint if we use the same reference levels:

> ref.grid(fit.lm2005)
'ref.grid' object with variables:
    income = Quintile 1, Quintile 2, Quintile 3, Quintile 4, Quintile 5
    sex = Female, Male
    age = 41.511
    education = <10 years education, 10 to 12 years education, College/University
    biomarker = 26.133

> lsmeans(fit.lmint, ~ income, at = list(year = "2005", 
    age = 41.511, biomarker = 26.133))
NOTE: Results may be misleading due to involvement in interactions
 income       lsmean       SE    df lower.CL upper.CL
 Quintile 1 64.37009 1.021611 19172 62.36765 66.37254
 Quintile 2 64.31070 1.037303 19172 62.27750 66.34391
 Quintile 3 64.35976 1.007447 19172 62.38508 66.33445
 Quintile 4 64.05196 1.049493 19172 61.99486 66.10906
 Quintile 5 63.43299 1.111337 19172 61.25468 65.61131

Take-home message: When you fit models separately by some factor, the fits will differ from those for a combined model, unless the combined model includes all interactions with the factor you used for the separate fits.

Also, please note that lsmeans only summarizes model results. If you use a spec like ~A*B in the lsmeans call, that does not add an A:B interaction to the model, if that interaction was not present.

Additional note added later:

With a mixed model, if the id variable is crossed with year, then you probably will not get identical results by just including the fixed-effects interactions. this is because you'd be estimating different id effects in each year too. You'd have to make the random part (1|id/year).

Related Question