Solved – post-hoc test on linear mixed effect model

interactionmixed modelpost-hoc

I am fitting a linear mixed effect model to study the interaction of two independent variables, a covariate time and a factor m (levels "R" and "V"), on the outcome variable var. Data are grouped by the variable id, and I consider the intercept for each group as a random variable. In particular, I am interested in whether var assumes different values depending on the levels of m at different time points.

Here is how I am fitting the model:

> options( contrasts = c( factor = "contr.treatment",ordered = "contr.poly"))
> var.lm <- lme(var ~ m*time, random = ~1|id, data=dat, na.action=na.omit, method = "ML")
> summary(var.lm)

Linear mixed-effects model fit by maximum likelihood
Data: dat 
    AIC      BIC   logLik
    2091.779 2117.895 -1039.89

Random effects:
Formula: ~1 | id
         (Intercept) Residual
StdDev: 4.534834e-05 1.480997

Fixed effects: var ~ m * time 
                 Value  Std.Error  DF   t-value p-value
(Intercept)  2.5819962 0.10587243 570 24.387806  0.0000
mV          -0.8477854 0.14972621 570 -5.662238  0.0000
time         0.0103348 0.00490927 570  2.105150  0.0357
mV:time     -0.0161672 0.00694276 570 -2.328636  0.0202
 Correlation: 
        (Intr) mV     time  
mV      -0.707              
time    -0.560  0.396       
mV:time  0.396 -0.560 -0.707

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.6617004 -0.6537707 -0.2195518  0.4517715  3.9370134 

Number of Observations: 574
Number of Groups: 1 

So there seems to be a significant interaction between m and time!

The data are normalized with respect to their mean at time=-1 for each level of m, such that the expected value $$E(var|time=-1,m=R) = E(var|time=-1,m=V) = 1$$

This is confirmed as follows:

> mean(dat[dat$time==-1 & dat$m=="R",]$var, na.rm=TRUE)
[1] 1

> mean(dat[dat$time==-1 & dat$m=="V",]$var, na.rm=TRUE)
[1] 1

> boxplot(var~time*m, data=dat[dat$time %in% c(-1),])

enter image description here

I therefore expect, by construction, that the post-hoc analysis to test the null hypothesis E(var|time=-1,m=R) – E(var|time=-1,m=V) = 0 is not statistical significant. However, it turns out to be, and I do not understand why. Most probably I am doing something wrong.

Since I am using dummy coding for the factor m with the level "R" as the reference value (meaning that mV=1 if m==V), the null hypothesis should translate into the formula mV-mV:time=0. Therefore, I perform this tests as follows:

> ph_conditional <- c("mV - mV:time  = 0");  
> var.ph <- glht(var.lm, linfct = ph_conditional);
> summary(var.ph)

Simultaneous Tests for General Linear Hypotheses

Fit: lme.formula(fixed = var ~ m * time, data = dat, random = ~1 | 
id, method = "ML", na.action = na.omit)

Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)    
mV - mV:time == 0  -0.8316     0.1532  -5.429 5.67e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

The null-hypothesis is rejected. There must be a problem and I cannot find it.

To make sure that my expectations are correct, I reduced the dataset to only the data at time=-1, and fit a model with no interactions. I expect the factor mV to be non statistically significant, meaning that the values of var at time=-1 and m=R are not statistically significantly different than those at time=-1 and m=V.

> dat_small <- dat[dat$time==-1,]
> options( contrasts = c( factor = "contr.treatment",ordered = "contr.poly"))
> var_small.lm <- lme(var ~ m, random = ~1|id, data=dat_small, na.action=na.omit, method = "ML")
> summary(var_small.lm)

Linear mixed-effects model fit by maximum likelihood
  Data: dat_small 
      AIC      BIC    logLik
  306.861 319.2114 -149.4305

Random effects:
 Formula: ~1 | id
         (Intercept)  Residual
StdDev: 1.242641e-05 0.6086403

Fixed effects: var ~ m 
            Value  Std.Error  DF t-value p-value
(Intercept)     1 0.06804805 160 14.6955       0
mV              0 0.09623447 160  0.0000       1
 Correlation: 
   (Intr)
mV -0.707

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.4611461 -0.7789445 -0.2552357  0.6026049  4.6896472 

Number of Observations: 162
Number of Groups: 1 

As expected, mV is not significant. Therefore, I am doing some mistakes during the post-hoc analysis of the model with interaction. Any help is very much appreciated.

You can find the dataset as well as the code in this link. This is the dataset from which I have computed the results in this post, but it is not the complete dataset of my project. That's why I am using a random effect on the variable id even if in this reduced dataset there is only one group. I get very similar results by using the complete dataset. Thanks a lot!

Best Answer

Just the error message doesn't tell the whole story:

> lsmeans(var.lm, pairwise ~ m | time, at = list(time = -1))
$lsmeans
time = -1:
 m   lsmean        SE df lower.CL upper.CL
 R 2.571661 0.1086969  0      NaN      NaN
 V 1.740043 0.1086969  0      NaN      NaN

Confidence level used: 0.95 

$contrasts
 Show Traceback

 Rerun with Debug
 Error in rep("", nrow(m)) : invalid 'times' argument In addition: Warning message:
In qt((1 - level)/adiv, df) : NaNs produced

From the output of the means, you do have results, and among them is the report that the degrees of freedom is zero. That make it impossible to do $t$ tests or confidence intervals. Looking further, I find:

> summary(dat$id)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       1       1       1       1       1 

You have 0 d.f. because you have only one subject. I have no idea why lme didn't complain about this, but that's the problem.

If you really have only one subject, then you can fit a model to the data on that one subject, but your inferences apply only to that one subject:

> var.notmixed = lm(var ~ m*time, data = dat)
> lsmeans(var.notmixed, pairwise ~ m*time, at = list(time = -1))
$lsmeans
 m time   lsmean        SE  df lower.CL upper.CL
 R   -1 2.571661 0.1086969 570 2.358166 2.785157
 V   -1 1.740043 0.1086969 570 1.526548 1.953539

Confidence level used: 0.95 

$contrasts
 contrast     estimate        SE  df t.ratio p.value
 R,-1 - V,-1 0.8316182 0.1537207 570    5.41  <.0001

It may be true that there is no difference in means of the data at time -1, but your model is fitting two straight lines with different slopes, and the times vary from -1 to +45. The lsmeans are based on predictions from the model at time -1. I emphasized model to make it clear that they are not predictions from the data. Here are the fitted lines:

> lsmip(var.notmixed, m ~ time, at = list(time=c(-1,45)))

enter image description here

Related Question