Solved – Paired t-test as a special case of linear mixed-effect modeling

Tags: lme4-nlme, mixed model, r, repeated measures, t-test

We know that a paired t-test is just a special case of one-way repeated-measures (or within-subject) ANOVA, as well as of a linear mixed-effect model, which can be demonstrated with the lme() function from the nlme package in R, as shown below.

# response data from 10 subjects under two conditions
x1 <- rnorm(10)
x2 <- 1 + rnorm(10)

# now create a data frame for lme
myDat <- data.frame(c(x1, x2), c(rep("x1", 10), rep("x2", 10)),
                    rep(paste("S", seq(1, 10), sep = ""), 2))
names(myDat) <- c("y", "x", "subj")

When I run the following paired t-test:

t.test(x1, x2, paired = TRUE)

I got this result (yours will differ because no random seed was set):

t = -2.3056, df = 9, p-value = 0.04657

With the ANOVA approach we can get the same result:

summary(aov(y ~ x + Error(subj/x), myDat))

# the F-value below is just the square of the t-value from the paired t-test:
          Df  F value Pr(>F)
x          1  5.3158  0.04657
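As a quick check, the squared paired-t statistic can be compared with the ANOVA F value directly; a minimal sketch using the x1 and x2 vectors defined above:

tt <- t.test(x1, x2, paired = TRUE)
unname(tt$statistic^2)  # the squared t statistic, ~5.3158, equals the F value above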

Now I can obtain the same result in lme with the following model, which assumes a general positive-definite symmetric variance-covariance matrix for the random effects of the two conditions:

summary(fm1 <- lme(y ~ x, random=list(subj=pdSymm(form=~x-1)), data=myDat))

# the 2nd row in the following agrees with the paired t-test
# (Intercept) -0.2488202 0.3142115  9 -0.7918878  0.4488
# xx2          1.3325786 0.5779727  9  2.3056084  0.0466

Or another model, assuming compound symmetry for the variance-covariance matrix of the two conditions:

summary(fm2 <- lme(y ~ x, random=list(subj=pdCompSymm(form=~x-1)), data=myDat))

# the 2nd row in the following agrees with the paired t-test
# (Intercept) -0.2488202 0.4023431  9 -0.618428  0.5516
# xx2          1.3325786 0.5779727  9  2.305608  0.0466
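Since the two lme fits share the same fixed effects and are both fit by REML, differing only in the random-effects structure (compound symmetry is a special case of the general symmetric matrix), they can also be compared directly; this sketch assumes the fm1 and fm2 objects created above:

anova(fm1, fm2)  # likelihood-based comparison of the pdSymm and pdCompSymm structures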

With the paired t-test and one-way repeated-measures ANOVA, I can write down the traditional cell mean model as

$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}, \quad i = 1, 2;\; j = 1, \ldots, 10$

where $i$ indexes condition, $j$ indexes subject, $Y_{ij}$ is the response variable, $\mu$ is a constant for the fixed effect of the overall mean, $\alpha_i$ is the fixed effect for condition, $\beta_j$ is the random effect for subject with $\beta_j \sim N(0, \sigma_p^2)$ ($\sigma_p^2$ is the between-subject variance), and $\epsilon_{ij}$ is the residual with $\epsilon_{ij} \sim N(0, \sigma^2)$ ($\sigma^2$ is the within-subject variance).

I thought that the cell mean model above would not be appropriate for the lme models, but the trouble is that I can't come up with a reasonable model for the two lme() approaches with their correlation-structure assumptions: the lme model seems to have more parameters for the random components than the cell mean model above offers. Still, the lme model provides exactly the same F-value, degrees of freedom, and p-value, which gls cannot; gls gives much inflated DFs because it does not account for the fact that each subject contributes two observations. The lme model is most likely overparameterized in its random-effects specification, but I don't know what the model is or what the parameters are, so the issue is still unresolved for me.
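For reference, here is one way the gls fit alluded to above might be specified; this is only a sketch, since the exact call is not shown, using a general symmetric correlation with condition-specific variances:

library(nlme)
fm3 <- gls(y ~ x,
           correlation = corSymm(form = ~ 1 | subj),
           weights = varIdent(form = ~ 1 | x),
           data = myDat)
summary(fm3)  # gls reports no per-term DF; its p-values use n - p residual DF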

Best Answer

The equivalence of the models can be observed by calculating the correlation between two observations from the same individual, as follows:

As in your notation, let $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where $\beta_j \sim N(0, \sigma_p^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$. Then, for two conditions $i \neq j$ on the same subject $k$, $Cov(Y_{ik}, Y_{jk}) = Cov(\mu + \alpha_i + \beta_k + \epsilon_{ik}, \mu + \alpha_j + \beta_k + \epsilon_{jk}) = Cov(\beta_k, \beta_k) = \sigma_p^2$, because all other terms are independent or fixed, and $Var(Y_{ik}) = Var(Y_{jk}) = \sigma_p^2 + \sigma^2$, so the correlation is $\sigma_p^2/(\sigma_p^2 + \sigma^2)$.
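In matrix form, the implied covariance of the two observations on subject $j$ is exactly the compound-symmetric structure:

$$Cov\begin{pmatrix} Y_{1j} \\ Y_{2j} \end{pmatrix} = \begin{pmatrix} \sigma_p^2 + \sigma^2 & \sigma_p^2 \\ \sigma_p^2 & \sigma_p^2 + \sigma^2 \end{pmatrix} = (\sigma_p^2 + \sigma^2) \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \rho = \frac{\sigma_p^2}{\sigma_p^2 + \sigma^2}.$$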

Note, however, that the models are not quite equivalent: the random-effect model forces the correlation to be positive, while the CS model and the t-test/ANOVA model do not.

EDIT: There are two other differences as well. First, the CS and random-effect models assume normality for the random effect, but the t-test/ANOVA model does not. Second, the CS and random-effect models are fit using maximum likelihood, while the ANOVA is fit using mean squares; when everything is balanced they will agree, but not necessarily in more complex situations. Finally, I'd be wary of using F/df/p values from the various fits as measures of how much the models agree; see Doug Bates's famous screed on df's for more details. (END EDIT)

The problem with your R code is that you're not specifying the correlation structure properly. You need to use gls with the corCompSymm correlation structure.

Generate data so that there is a subject effect:

set.seed(5)
x <- rnorm(10)           # per-subject effect shared by both conditions
x1 <- x + rnorm(10)
x2 <- x + 1 + rnorm(10)
myDat <- data.frame(c(x1, x2), c(rep("x1", 10), rep("x2", 10)),
                    rep(paste("S", seq(1, 10), sep = ""), 2))
names(myDat) <- c("y", "x", "subj")

Then here's how you'd fit the random-effects and compound-symmetry models:

library(nlme)
fm1 <- lme(y ~ x, random=~1|subj, data=myDat)
fm2 <- gls(y ~ x, correlation=corCompSymm(form=~1|subj), data=myDat)

The variance components from the random-effects model, computed from its reported standard deviations, are:

m1.varp <- 0.5453527^2   # subject (between-subject) variance
m1.vare <- 1.084408^2    # residual (within-subject) variance

And the correlation and residual variance from the CS model are:

m2.rho <- 0.2018595      # estimated common within-subject correlation
m2.var <- 1.213816^2     # total residual variance

And they match what the formula above predicts:

> m1.varp/(m1.varp+m1.vare)
[1] 0.2018594
> sqrt(m1.varp + m1.vare)
[1] 1.213816
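These quantities can also be pulled out programmatically rather than copied from the printed summaries; a sketch relying on the standard nlme accessors, assuming the fm1 and fm2 objects above:

vc <- as.numeric(VarCorr(fm1)[, "StdDev"])              # c(subject SD, residual SD)
vc[1]^2 / sum(vc^2)                                     # implied correlation, ~0.2019
coef(fm2$modelStruct$corStruct, unconstrained = FALSE)  # rho from the CS fit
fm2$sigma^2                                             # total variance from the CS fit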

Other correlation structures are usually fit not with random effects but simply by specifying the desired structure directly; one common exception is the AR(1) + random effect model, which combines a random effect with AR(1) correlation between observations on the same subject, as sketched below.
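Purely for illustration, here is what such a model might look like in nlme; the longitudinal dataset, its five time points, and the effect sizes are invented for the sketch (the two-condition data above would not support an AR(1) structure):

library(nlme)
set.seed(1)
longDat <- data.frame(subj = rep(paste0("S", 1:10), each = 5),
                      time = rep(1:5, 10))
longDat$y <- rep(rnorm(10), each = 5) + 0.3 * longDat$time + rnorm(50)
fm.ar1 <- lme(y ~ time, random = ~ 1 | subj,
              correlation = corAR1(form = ~ time | subj),
              data = longDat)
summary(fm.ar1)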

EDIT2: When I fit the three options, I get exactly the same results except that gls doesn't try to guess the df for the term of interest.

> summary(fm1)
...
Fixed effects: y ~ x 
                 Value Std.Error DF   t-value p-value
(Intercept) -0.5611156 0.3838423  9 -1.461839  0.1778
xx2          2.0772757 0.4849618  9  4.283380  0.0020

> summary(fm2)
...
                 Value Std.Error   t-value p-value
(Intercept) -0.5611156 0.3838423 -1.461839  0.1610
xx2          2.0772757 0.4849618  4.283380  0.0004

> m1 <- lm(y~ x + subj, data=myDat)
> summary(m1)
...
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.3154     0.8042  -0.392  0.70403   
xx2           2.0773     0.4850   4.283  0.00204 **

(The intercept is different here because with the default coding, it's not the mean of all subjects but instead the mean of the first subject.)
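If the grand-mean-style intercept is wanted from the fixed-effect fit, sum-to-zero contrasts for subj recover it; a sketch (m1s is a name introduced here for illustration):

m1s <- lm(y ~ x + subj, data = myDat,
          contrasts = list(subj = contr.sum))
coef(summary(m1s))["(Intercept)", ]  # intercept estimate now matches fm1's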

It's also of interest to note that the newer lme4 package gives the same results but doesn't even try to compute a p-value.

> library(lme4)
> mm1 <- lmer(y ~ x + (1|subj), data=myDat)
> summary(mm1)
...
            Estimate Std. Error t value
(Intercept)  -0.5611     0.3838  -1.462
xx2           2.0773     0.4850   4.283