Solved – Is it OK to use GLS (generalized least squares) to analyze repeated data? What about the wrong degrees of freedom?

generalized-least-squares, mixed-model, paired-data, repeated-measures

There are three commonly used model-based ways of analysing repeated observations: the linear model fitted via GLS estimation, the generalized linear model fitted via GEE estimation, and mixed models, (G)LMM.

Let's forget for a second that LMMs are conditional models while GLS/GEE are marginal ones, and focus only on the general linear model, where the two approaches are equivalent.

I have noticed that people in the biosciences make heavy use of the so-called MMRM – the mixed-effect model for repeated measures. Despite its name, this is not actually a "true" mixed model. It is what the SAS PROC MIXED procedure fits when the REPEATED statement is specified without a RANDOM statement (i.e., with no random effects). I have also noticed that the corresponding analysis in R is often said to be GLS – nlme::gls().

When I tried to mimic the simplest paired t-test, it turned out that the mixed model handled the degrees of freedom correctly, "understanding" that the same subject was examined multiple times. At the same time, the gls() procedure counted all the observations as independent, which amounts to "fake replication" (pseudoreplication). I had to switch to analysing the paired differences to halve the DFs.

When I started analysing data with more than two time points, the discrepancy between mixed models (which correctly reported the DFs, recognizing that each subject is measured multiple times) and gls() only grew.

GLS still used all the DFs, as if it were an ordinary linear model taking every observation into account, merely allowing for different variances at each time point (relaxing the homoscedasticity assumption). Well, that is what GLS does.
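To make the "only different variances at each time point" case concrete: when the covariance matrix is purely diagonal (no within-subject correlation modelled at all), GLS reduces to weighted least squares with weights 1/variance, equivalently OLS on variance-scaled data. A minimal sketch with invented data — the variances, seed, and design are all assumptions, not from the question:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8                                   # subjects, two time points each
x = np.repeat([0.0, 1.0], n)            # time indicator for 16 observations
X = np.column_stack([np.ones(2 * n), x])

sd = np.where(x == 0, 1.0, 3.0)         # assumed larger variance at time 2
y = 10 + 1.5 * x + rng.normal(0, sd)

# Heteroscedastic-only GLS (diagonal Sigma) is just weighted least squares
# with weights 1/variance -- no within-subject dependence is modelled.
W = np.diag(1.0 / sd**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalently: OLS after dividing each row by its standard deviation
beta_scaled = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)[0]
print(np.allclose(beta_wls, beta_scaled))  # True
```

Nothing in this reweighting reduces the observation count, which is why the residual df is unchanged from the ordinary linear model.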

But then – how can we use GLS to analyse repeated observations? The model effectively ignores the fact that the data come from the same subjects, inflating the DFs and thus affecting the p-values.

Could anyone tell me how it can be justified to analyse repeated data both with an LMM with, say, random intercepts (which only partially mimics compound symmetry, since the implied within-subject correlation cannot be negative), where the DFs are correctly reported, and with GLS (with compound symmetry, for example), where the DFs are two (or three, four, …) times larger than in the LMM?
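The sense in which random intercepts only partially mimic compound symmetry can be spelled out: the marginal covariance implied by a random-intercept LMM is exactly compound-symmetric, but its intra-class correlation σ_b²/(σ_b² + σ_e²) is necessarily nonnegative, whereas a directly specified compound-symmetry structure (as in gls with corCompSymm) also permits negative correlation. A small numpy sketch with assumed variance components:

```python
import numpy as np

sigma_b2 = 4.0   # assumed random-intercept variance
sigma_e2 = 1.0   # assumed residual variance
k = 3            # time points per subject

# Marginal covariance of one subject's k observations under a
# random-intercept LMM: sigma_b2 + sigma_e2 on the diagonal,
# sigma_b2 everywhere off-diagonal -> compound symmetry.
Sigma = sigma_b2 * np.ones((k, k)) + sigma_e2 * np.eye(k)

rho = sigma_b2 / (sigma_b2 + sigma_e2)  # implied intra-class correlation
print(rho)  # 0.8 -- nonnegative for any choice of variance components
```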

If we clearly know that GLS cannot replicate even the simplest case, the paired t-test (without switching to change scores), but the LMM can, how can GLS be called a suitable tool for handling repeated data?

Linked topics I started:
Is there a way to force nlme::gls to use the same degrees of freedom as the nlme::lme or lme4::lmer?

Is there any way to get correct degrees of freedom in gls, matching those of paired t-test?

Best Answer

The degrees of freedom for a statistical test represent the number of observations corrected for the number of parameter values estimated from the data. From that perspective, you shouldn't expect the degrees of freedom to be the same for an LME model and a corresponding GLS model. Furthermore, what the degrees of freedom should be for an LME model is far from settled, so you perhaps shouldn't take too much solace in the agreement between an LME model and a corresponding paired t-test.

With your example data on this page, your paired t-test has effectively cut down the number of observations from 16 to 8, and with 1 df set aside for the mean difference you have 7 df left for evaluating the significance of that difference from the null value of 0.
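That df arithmetic can be checked with a toy sketch — the data below are invented (8 subjects, two time points, an assumed subject effect), purely to contrast the two df counts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # subjects, each measured at two time points -> 16 observations

subject = rng.normal(0, 2, n)             # effect shared by both measurements
pre  = 10 + subject + rng.normal(0, 1, n)
post = 11 + subject + rng.normal(0, 1, n)

# Paired t-test: analyse the 8 within-subject differences
d = post - pre
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))
df_paired = n - 1          # 8 differences - 1 for the mean difference = 7

# Naive analysis treating all 16 values as independent observations
df_pooled = 2 * n - 2      # 14 -- the "fake replication" df

print(df_paired, df_pooled)
```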

Yes, if you fit an LME model with the lme function in the nlme package you will also get 7 df. But you will not get a df value from the lmer function in the newer lme4 package; see near the end of this answer for examples very closely related to yours. That's because of the issues discussed here. The correct number of df to associate with a fixed effect in an LME model is a matter of some dispute.

A GLS model, as you note, uses the within-subject structure only to define the form of a covariance matrix, which is taken to be known. Thereafter the analysis proceeds like a linear regression, and, as the Wikipedia page notes, the GLS model can be thought of as a standard linear regression on linearly transformed observations. So your GLS model starts with 16 observations, takes one away for each of the intercept and the slope, and has 14 df left.
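The "linear regression on linearly transformed observations" view can be sketched numerically: whitening the data with the Cholesky factor of the (assumed known) covariance matrix and running OLS gives the same estimate as the usual GLS formula. Everything below — the design, covariance matrix, and seed — is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + slope
beta_true = np.array([2.0, 0.5])

# Assumed known covariance: 8 subjects, correlation 0.5 within each pair
Sigma = 0.5 * np.eye(n) + 0.5 * np.kron(np.eye(8), np.ones((2, 2)))
L = np.linalg.cholesky(Sigma)
y = X @ beta_true + L @ rng.normal(size=n)

# GLS estimator: (X' S^-1 X)^-1 X' S^-1 y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Identical result: OLS on the whitened (linearly transformed) data
Li = np.linalg.inv(L)
beta_ols = np.linalg.lstsq(Li @ X, Li @ y, rcond=None)[0]
print(np.allclose(beta_gls, beta_ols))  # True

df_gls = n - X.shape[1]  # 16 observations - 2 parameters = 14
```

The transformation never changes the number of rows, which is why the GLS residual df is simply the observation count minus the number of fixed-effect parameters.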

Is that correct? You certainly can make an argument that it's the number of independent observations rather than the total number of observations that should matter for calculating the df. Some aspect of the intra-subject correlations is captured in the form of the covariance matrix assumed in the GLS. After that is taken into account, just how much are the repeated observations on the same individuals dependent versus independent? I think we are back to some of the same problems that arise with defining the df for a LME model.

Note that the F-test statistic value reported by the GLS model in your example is, correctly, the square of the corresponding t-statistic value. Thus if you want to use the GLS structure and think that the df value it reports represents "fake replication," you could simply use the statistic reported by GLS and adjust the number of degrees of freedom appropriately when you do the significance testing. I am not sufficiently familiar with this to know how much to adjust the df, however.
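The F = t² identity is not specific to gls; it holds for any two-group comparison with a pooled-variance t-test versus a one-way ANOVA. A quick scipy check on invented data (the groups and seed are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 8)
b = rng.normal(0.7, 1.0, 8)

t, _ = stats.ttest_ind(a, b)   # pooled-variance two-sample t (equal_var=True)
F, _ = stats.f_oneway(a, b)    # one-way ANOVA F for the same comparison

print(np.isclose(t**2, F))  # True: F(1, df) is the square of t(df)
```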

Your choice between LME and GLS modeling should be based on what you understand about the structure of your data and your primary interest in modeling. As Pinheiro and Bates put it on pages 254-255 of Mixed-Effects Models in S and S-PLUS after comparing LME and GLS models with the same fixed-effect structure:

A mixed-effects model has a hierarchical structure which, in many applications, provides a more intuitive way of accounting for within-group dependency than the direct modeling of the marginal variance–covariance structure of the response in the gls approach ... The gls model focuses on marginal inference and is more appealing when a hierarchical structure for the data is not believed to be present, or is not relevant in the analysis, and one is more interested in parameters associated with the error variance–covariance structure, as in time-series analysis and spatial statistics.
