Solved – Is it OK to use GLS (generalized least squares) to analyze repeated data? What about the wrong degrees of freedom?

generalized-least-squares, mixed-model, paired-data, repeated-measures

There are three commonly used model-based ways of analysing repeated observations: the linear model fitted via GLS estimation, the generalized linear model fitted via GEE estimation, and mixed models, (G)LMM.

Let's forget for a second that LMMs are conditional models while GLS/GEE are marginal ones, and focus only on the general linear model, where the two approaches are equivalent.

I have noticed that people in the biosciences make heavy use of the so-called MMRM – the mixed-effect model for repeated measures. Despite its name, this is not actually a "true" mixed model. It is what the SAS PROC MIXED procedure fits when the REPEATED statement is specified without a RANDOM statement (i.e., with no random effects). I have also noticed that the corresponding analysis in R is often said to be GLS – nlme::gls().

When I tried to mimic the simplest paired t-test, it turned out that the mixed model handled the degrees of freedom correctly, "understanding" that the same subject was examined multiple times. At the same time, the gls() procedure counted all the observations as independent, which amounts to "fake replication" (pseudoreplication). I had to switch to analysing the paired differences to halve the DFs.

When I started analysing data with more than two time points, the discrepancy between mixed models (which correctly reported the DFs, recognizing that each subject is measured multiple times) and gls() only grew.

GLS still used all the DFs, as if it were an ordinary linear model taking every observation into account, merely allowing for different variances at each time point (relaxing the homoscedasticity assumption). Well, that is what GLS does.
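To make the "only different variances at each time point" case concrete: when the covariance matrix is purely diagonal (no within-subject correlation modelled at all), GLS reduces to weighted least squares with weights 1/variance, equivalently OLS on variance-scaled data. A minimal sketch with invented data — the variances, seed, and design are all assumptions, not from the question:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8                                   # subjects, two time points each
x = np.repeat([0.0, 1.0], n)            # time indicator for 16 observations
X = np.column_stack([np.ones(2 * n), x])

sd = np.where(x == 0, 1.0, 3.0)         # assumed larger variance at time 2
y = 10 + 1.5 * x + rng.normal(0, sd)

# Heteroscedastic-only GLS (diagonal Sigma) is just weighted least squares
# with weights 1/variance -- no within-subject dependence is modelled.
W = np.diag(1.0 / sd**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalently: OLS after dividing each row by its standard deviation
beta_scaled = np.linalg.lstsq(X / sd[:, None], y / sd, rcond=None)[0]
print(np.allclose(beta_wls, beta_scaled))  # True
```

Nothing in this reweighting reduces the observation count, which is why the residual df is unchanged from the ordinary linear model.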

But then – how can we use GLS to analyse repeated observations? The model effectively ignores the fact that the data come from the same subjects, inflating the DFs and thus affecting the p-values.

Could anyone tell me how it can be justified to analyse repeated data both with an LMM with, say, random intercepts (which only partially mimics compound symmetry, since the implied within-subject correlation cannot be negative), where the DFs are correctly reported, and with GLS (with compound symmetry, for example), where the DFs are two (or three, four, …) times larger than in the LMM?
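The sense in which random intercepts only partially mimic compound symmetry can be spelled out: the marginal covariance implied by a random-intercept LMM is exactly compound-symmetric, but its intra-class correlation σ_b²/(σ_b² + σ_e²) is necessarily nonnegative, whereas a directly specified compound-symmetry structure (as in gls with corCompSymm) also permits negative correlation. A small numpy sketch with assumed variance components:

```python
import numpy as np

sigma_b2 = 4.0   # assumed random-intercept variance
sigma_e2 = 1.0   # assumed residual variance
k = 3            # time points per subject

# Marginal covariance of one subject's k observations under a
# random-intercept LMM: sigma_b2 + sigma_e2 on the diagonal,
# sigma_b2 everywhere off-diagonal -> compound symmetry.
Sigma = sigma_b2 * np.ones((k, k)) + sigma_e2 * np.eye(k)

rho = sigma_b2 / (sigma_b2 + sigma_e2)  # implied intra-class correlation
print(rho)  # 0.8 -- nonnegative for any choice of variance components
```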

If we clearly know that GLS cannot replicate even the simplest case, the paired t-test (without switching to change scores), but the LMM can, how can GLS be called a suitable tool for handling repeated data?

Linked topics I started:
Is there a way to force nlme::gls to use the same degrees of freedom as the nlme::lme or lme4::lmer?

Is there any way to get correct degrees of freedom in gls, matching those of paired t-test?

Best Answer

The degrees of freedom for a statistical test represent the number of observations corrected for the number of parameter values estimated from the data. From that perspective, you shouldn't expect the degrees of freedom to be the same for an LME model and a corresponding GLS model. Furthermore, what the degrees of freedom should be for an LME model is far from settled, so you perhaps shouldn't take too much solace in the agreement between an LME model and a corresponding paired t-test.

With your example data on this page, your paired t-test has effectively cut down the number of observations from 16 to 8, and with 1 df set aside for the mean difference you have 7 df left for evaluating the significance of that difference from the null value of 0.
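That df arithmetic can be checked with a toy sketch — the data below are invented (8 subjects, two time points, an assumed subject effect), purely to contrast the two df counts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # subjects, each measured at two time points -> 16 observations

subject = rng.normal(0, 2, n)             # effect shared by both measurements
pre  = 10 + subject + rng.normal(0, 1, n)
post = 11 + subject + rng.normal(0, 1, n)

# Paired t-test: analyse the 8 within-subject differences
d = post - pre
t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))
df_paired = n - 1          # 8 differences - 1 for the mean difference = 7

# Naive analysis treating all 16 values as independent observations
df_pooled = 2 * n - 2      # 14 -- the "fake replication" df

print(df_paired, df_pooled)
```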

Yes, if you fit an LME model with the lme function in the nlme package you will also get 7 df. But you will not get a df value from the lmer function in the newer lme4 package; see near the end of this answer for examples very closely related to yours. That's because of the issues discussed here. The correct number of df to associate with a fixed effect in an LME model is a matter of some dispute.

A GLS model, as you note, uses the within-subject structure only to define the form of a covariance matrix, which is taken to be known. Thereafter the analysis proceeds like a linear regression, and, as the Wikipedia page notes, the GLS model can be thought of as a standard linear regression on linearly transformed observations. So your GLS model starts with 16 observations, takes one away for each of the intercept and the slope, and has 14 df left.
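The "linear regression on linearly transformed observations" view can be sketched numerically: whitening the data with the Cholesky factor of the (assumed known) covariance matrix and running OLS gives the same estimate as the usual GLS formula. Everything below — the design, covariance matrix, and seed — is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + slope
beta_true = np.array([2.0, 0.5])

# Assumed known covariance: 8 subjects, correlation 0.5 within each pair
Sigma = 0.5 * np.eye(n) + 0.5 * np.kron(np.eye(8), np.ones((2, 2)))
L = np.linalg.cholesky(Sigma)
y = X @ beta_true + L @ rng.normal(size=n)

# GLS estimator: (X' S^-1 X)^-1 X' S^-1 y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Identical result: OLS on the whitened (linearly transformed) data
Li = np.linalg.inv(L)
beta_ols = np.linalg.lstsq(Li @ X, Li @ y, rcond=None)[0]
print(np.allclose(beta_gls, beta_ols))  # True

df_gls = n - X.shape[1]  # 16 observations - 2 parameters = 14
```

The transformation never changes the number of rows, which is why the GLS residual df is simply the observation count minus the number of fixed-effect parameters.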

Is that correct? You certainly can make an argument that it's the number of independent observations rather than the total number of observations that should matter for calculating the df. Some aspect of the intra-subject correlations is captured in the form of the covariance matrix assumed in the GLS. After that is taken into account, just how much are the repeated observations on the same individuals dependent versus independent? I think we are back to some of the same problems that arise with defining the df for a LME model.

Note that the F-test statistic value reported by the GLS model in your example is, correctly, the square of the corresponding t-statistic value. Thus if you want to use the GLS structure and think that the df value it reports represents "fake replication," you could simply use the statistic reported by GLS and adjust the number of degrees of freedom appropriately when you do the significance testing. I am not sufficiently familiar with this to know how much to adjust the df, however.
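The F = t² identity is not specific to gls; it holds for any two-group comparison with a pooled-variance t-test versus a one-way ANOVA. A quick scipy check on invented data (the groups and seed are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 8)
b = rng.normal(0.7, 1.0, 8)

t, _ = stats.ttest_ind(a, b)   # pooled-variance two-sample t (equal_var=True)
F, _ = stats.f_oneway(a, b)    # one-way ANOVA F for the same comparison

print(np.isclose(t**2, F))  # True: F(1, df) is the square of t(df)
```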

Your choice between LME and GLS modeling should be based on what you understand about the structure of your data and your primary interest in modeling. As Pinheiro and Bates put it on pages 254-255 of Mixed-Effects Models in S and S-PLUS after comparing LME and GLS models with the same fixed-effect structure:

A mixed-effects model has a hierarchical structure which, in many applications, provides a more intuitive way of accounting for within-group dependency than the direct modeling of the marginal variance–covariance structure of the response in the gls approach ... The gls model focuses on marginal inference and is more appealing when a hierarchical structure for the data is not believed to be present, or is not relevant in the analysis, and one is more interested in parameters associated with the error variance–covariance structure, as in time-series analysis and spatial statistics.
