The degrees of freedom for a statistical test represent the number of observations corrected for the number of parameters that have been estimated from the data. From that perspective, you shouldn't expect the degrees of freedom to be the same for an LME model and a corresponding GLS model. Furthermore, what the degrees of freedom should be for an LME model is far from agreed upon, so perhaps you shouldn't take too much solace in the agreement between an LME model and a corresponding paired t-test.
With your example data on this page, your paired t-test has effectively cut the number of observations down from 16 to 8 (one difference per pair), and with 1 df set aside for estimating the mean difference you have 7 df left for evaluating whether that difference differs significantly from the null value of 0.
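To make that accounting concrete, here is a minimal sketch in base R with made-up stand-in data (your actual values aren't reproduced here, so the numbers are hypothetical; only the df is the point):

```r
# Hypothetical stand-in data: 8 subjects measured before and after
set.seed(1)
before <- rnorm(8, mean = 10)
after  <- before + rnorm(8, mean = 0.5)

fit <- t.test(after, before, paired = TRUE)
fit$parameter  # df = 7: 8 paired differences minus 1 for the estimated mean difference
```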
Yes, if you fit an LME model with the lme function in the nlme package you will also get 7 df. But you will not get a df value at all from the lmer function in the newer lme4 package; that's because of issues discussed here: the correct number of df to associate with a fixed effect in an LME model is a matter of some dispute. See near the end of this answer for examples very closely related to yours.
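A sketch of that comparison, using simulated stand-in data (nlme ships with R; the data-generating choices below are assumptions, not your data):

```r
library(nlme)

# Hypothetical stand-in data: 8 subjects, each measured in two conditions
set.seed(1)
subj <- rep(1:8, each = 2)
cond <- rep(c("A", "B"), times = 8)
d <- data.frame(
  subject   = factor(subj),
  condition = factor(cond),
  y         = 10 + 0.5 * (cond == "B") + rnorm(8)[subj] + rnorm(16, sd = 0.5)
)

m <- lme(y ~ condition, random = ~ 1 | subject, data = d)
anova(m)  # the condition row reports denDF = 7, matching the paired t-test

# By contrast, summary(lmer(y ~ condition + (1 | subject), data = d)) in
# lme4 reports t values for the fixed effects but no df column at all.
```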
A GLS model, as you note, uses the within-subject structure only to define the form of a covariance matrix, which is taken to be known. Thereafter the analysis proceeds similarly to a linear regression, and as the Wikipedia page notes the GLS model can be thought of as a standard linear regression on linearly transformed observations. So your GLS model starts with 16 observations, takes one away for each of the intercept and the slope, and has 14 df left.
Is that correct? You can certainly argue that it's the number of independent observations, rather than the total number of observations, that should matter for calculating the df. Some aspect of the intra-subject correlations is captured in the form of the covariance matrix assumed in the GLS. Once that is taken into account, just how dependent versus independent are the repeated observations on the same individuals? I think we are back to some of the same problems that arise in defining the df for an LME model.
Note that the F-test statistic value reported by the GLS model in your example is, correctly, the square of the corresponding t-statistic value. Thus if you want to use the GLS structure and think that the df value it reports represents "fake replication," you could simply use the statistic reported by GLS and adjust the number of degrees of freedom appropriately when you do the significance testing. I am not sufficiently familiar with this to know how much to adjust the df, however.
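To see both the 16 − 2 = 14 df accounting and the F = t² relation, here is a hedged sketch with simulated stand-in data; the compound-symmetric correlation structure is an assumption, and yours may differ:

```r
library(nlme)

# Hypothetical stand-in data: 8 subjects, each measured in two conditions
set.seed(1)
subj <- rep(1:8, each = 2)
cond <- rep(c("A", "B"), times = 8)
d <- data.frame(
  subject   = factor(subj),
  condition = factor(cond),
  y         = 10 + 0.5 * (cond == "B") + rnorm(8)[subj] + rnorm(16, sd = 0.5)
)

# GLS with a compound-symmetric within-subject correlation (an assumption)
g <- gls(y ~ condition, data = d,
         correlation = corCompSymm(form = ~ 1 | subject))

g$dims$N - g$dims$p  # residual df: 16 observations minus 2 coefficients = 14

# The reported F statistic is the square of the corresponding t statistic
tab <- summary(g)$tTable
anova(g)["condition", "F-value"]
tab["conditionB", "t-value"]^2
```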
Your choice between LME and GLS modeling should be based on what you understand about the structure of your data and your primary interest in modeling. As Pinheiro and Bates put it on pages 254-255 of Mixed-Effects Models in S and S-PLUS after comparing LME and GLS models with the same fixed-effect structure:
A mixed-effects model has a hierarchical structure which, in many applications, provides a more intuitive way of accounting for within-group dependency than the direct modeling of the marginal variance–covariance structure of the response in the gls approach ... The gls model focuses on marginal inference and is more appealing when a hierarchical structure for the data is not believed to be present, or is not relevant in the analysis, and one is more interested in parameters associated with the error variance–covariance structure, as in time-series analysis and spatial statistics.
Best Answer
By GLS do you mean GLM? A GLM is fit by iteratively reweighted least squares, which takes the mean-variance relationship into account when estimating the model parameters. With a binary outcome, generalized least squares will still suffer either from an overfitting issue (infinite weights as fitted probabilities approach 0 or 1) or from overprediction (fitted probabilities greater than 1 or less than 0). The logistic regression model is what's commonly used to test for associations with binary outcomes. You can go further and use generalized linear mixed models (GLMMs), conditional logistic regression, or generalized estimating equations (GEEs) to account for certain correlation structures in the data. The lme4 package fits GLMMs (glmer), the survival package has clogit, and the geepack package (geese) handles GEEs.
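A quick base-R sketch of the overprediction point, with simulated data (the data-generating curve is made up for illustration): a least-squares fit to a binary outcome can return fitted "probabilities" outside [0, 1], while the logistic GLM cannot.

```r
# Hypothetical binary data generated from a logistic curve
set.seed(1)
x <- seq(-3, 3, length.out = 100)
y <- rbinom(100, size = 1, prob = plogis(2 * x))

ols     <- lm(y ~ x)                      # least-squares view of the problem
glm_fit <- glm(y ~ x, family = binomial)  # logistic regression via IRLS

range(fitted(ols))      # strays outside [0, 1]
range(fitted(glm_fit))  # stays strictly within (0, 1)
```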