Solved – Same estimates but different p-values in Tukey post hoc test (lmer)

multiple-comparisons, r, statistical-significance, tukey-hsd-test

I'm fairly new to lmer and am confused by getting different p-values in Tukey post hoc tests for exactly the same estimates. I built a linear mixed model with the monetary contributions of human subjects as the response variable and their wealth and number of children as explanatory variables. The experiment was designed so that subjects contribute for future generations. I don't have repeated measurements of the same individual, but some individuals played within the same group. There are several subsets and additional random factors, but here I only want to consider the following model, where totalcontSubject is a subject's contribution over the entire game, poverty is a factor with 2 levels (rich and poor), and children is a factor with 2 levels (child or noChild). I'm particularly interested in understanding the fixed-effects part of the model.

> summary(TC1)
Linear mixed model fit by REML ['lmerMod']
Formula: totalcontSubject ~ poverty * children + (1 | group_2)
   Data: data

REML criterion at convergence: 414.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.42955 -0.45554 -0.09361  0.45228  2.33159 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 group_2  (Intercept) 8.611e-15 9.280e-08
 Residual             1.042e+02 1.021e+01
Number of obs: 58, groups:  group_2, 10

Fixed effects:
                             Estimate Std. Error t value
(Intercept)                    16.200      3.228   5.019
povertyrich                     8.600      3.953   2.175
childrennoChild                 2.800      4.565   0.613
povertyrich:childrennoChild    -4.489      5.642  -0.796

Correlation of Fixed Effects:
            (Intr) pvrty chldrC
povertyrch  -0.816              
chldrnnChld -0.707  0.577       
pvrtyrch:C   0.572 -0.701 -0.809

If I interpret the fixed effects in the summary table correctly, the intercept denotes poor people with children; its estimate also corresponds to the mean value of this combination in my data. According to my calculations, the difference for rich people (shown as povertyrich) is actually the difference between the intercept and rich people with children, even though children isn't explicitly mentioned in the label povertyrich. This is the first point I'm confused about. A reduced model with only the fixed factor poverty is significantly better according to anova(), but it seems the data on children are still used in this evaluation.
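This interpretation matches how R's default treatment contrasts work: in a model with an interaction, the coefficient for povertyrich is the rich-minus-poor difference at the reference level of children ("child"), not a main effect averaged over children. A minimal sketch with simulated data (not the original dataset, and using lm for simplicity, since the fixed-effects coding is the same as in lmer):

```r
# Simulated 2x2 example: under treatment contrasts, the 'povertyrich'
# coefficient of an interaction model equals the rich - poor difference
# of cell means at the reference level children = "child".
set.seed(1)
d <- expand.grid(poverty  = c("poor", "rich"),
                 children = c("child", "noChild"),
                 rep      = 1:10)
d$y <- rnorm(nrow(d),
             mean = 10 + 5 * (d$poverty == "rich") + 2 * (d$children == "noChild"))

fit <- lm(y ~ poverty * children, data = d)

# Cell means of the raw data
m <- tapply(d$y, list(d$poverty, d$children), mean)

# The coefficient reproduces the rich - poor difference at children = "child"
all.equal(unname(coef(fit)["povertyrich"]),
          m["rich", "child"] - m["poor", "child"])
```

So the label povertyrich is implicitly "povertyrich at children = child", exactly as described above.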

If I run a Tukey post hoc test on my TC1 model, I get a significant difference between rich and poor. But the estimates in the summary actually involve children: the intercept and slope estimates are the mean for poor people with children and the difference to rich people with children. They don't correspond to the means of the poor or rich data irrespective of parenthood.

summary(glht(TC1, linfct = mcp(poverty = "Tukey")))

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lmer(formula = totalcontSubject ~ poverty * children + (1 | 
    group_2), data = data)

Linear Hypotheses:
                 Estimate Std. Error z value Pr(>|z|)  
rich - poor == 0    8.600      3.953   2.175   0.0296 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

I get even more confused when I run a Tukey post hoc test on a model where I coded the interactions as a single column (Interak), with levels such as poorChild and richChild. In this output I get exactly the same estimates and standard errors for these categories as in the summary shown before for rich vs. poor, but the p-values are different. A visual check suggests there is a significant difference between richChild and poorChild, but the glht output for Interak tells me there isn't one. Also, a model comparison with anova() between the fixed factor poverty alone vs. poverty plus children indicates that I could drop the variable children from my model. Before I do so, I would like to understand the outputs better. I also worry about the high residual variance and the correlations in the summary table.

> summary(glht(TC1_2, linfct=mcp(Interak="Tukey")))

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lmer(formula = totalcontSubject ~ Interak + (1 | group_2), data = data)

Linear Hypotheses:
                               Estimate Std. Error z value Pr(>|z|)
poorNoChild - poorChild == 0      2.800      4.565   0.613    0.927
richChild - poorChild == 0        8.600      3.953   2.175    0.129
richNoChild - poorChild == 0      6.911      4.026   1.717    0.312
richChild - poorNoChild == 0      5.800      3.953   1.467    0.454
richNoChild - poorNoChild == 0    4.111      4.026   1.021    0.735
richNoChild - richChild == 0     -1.689      3.316  -0.509    0.956
(Adjusted p values reported -- single-step method)

Best Answer

I agree that the models have the same fitted values, but your two glht calls test different hypotheses. The first is a single rich-minus-poor contrast (with the interaction in the model, glht builds it from the povertyrich coefficient alone, i.e. the difference at the reference level of children); the second is the set of all pairwise comparisons among the four poverty*children combinations.
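That difference in family size is why identical estimates and z values carry different adjusted p-values: the single-step method adjusts for the number of simultaneous tests. A rough hand check with base R, using the z value from your output (the Bonferroni bound shown here is more conservative than glht's multivariate-t adjustment, which accounts for correlations among the tests, so glht's 0.129 lands between the two values below):

```r
# Same z statistic, different adjusted p depending on the family of tests
z <- 2.175

# A family of one test: the ordinary two-sided normal p-value
p_single <- 2 * pnorm(-abs(z))    # matches the 0.0296 for rich - poor

# A family of six tests: conservative Bonferroni upper bound
p_bonf <- min(1, 6 * p_single)    # ~0.178; glht's single-step mvt gives 0.129

round(c(single = p_single, six_tests = p_bonf), 4)
```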

The lsmeans package is helpful in providing some insight, because it makes it relatively easy to construct the needed contrasts. In particular, try this:

library(lsmeans)
library(multcomp)
summary(glht(TC1, linfct = lsm(pairwise ~ poverty*children)))

This will generate all six pairwise comparisons of the four poverty*children means, and should give the same results as you show for the Interak comparisons from TC1_2. The $p$ values may vary slightly because they are computed by a simulation method based on the multivariate $t$ distribution.

Alternatively, the lsmeans package can provide the same results without needing multcomp or glht:

pairs(lsmeans(TC1, ~ poverty * children), adjust = "mvt")
pairs(lsmeans(TC1_2, ~ Interak), adjust = "mvt")
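As a side note, the rich-minus-poor difference averaged over children can be computed by hand from the summary(TC1) estimates; note that it differs from the 8.600 tested in the first glht call, which is the difference at the reference level children = "child":

```r
# Hand check using the fixed-effect estimates printed in summary(TC1)
b_poverty     <- 8.600    # povertyrich: rich - poor at children = "child"
b_interaction <- -4.489   # povertyrich:childrennoChild

# rich - poor averaged over the two levels of children
marginal_diff <- b_poverty + b_interaction / 2
marginal_diff   # 6.3555, not the 8.600 from mcp(poverty = "Tukey")
```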