Mixed-Model Analysis – Ignoring Dependencies Between Observations Inflates Standard Errors

clustered-standard-errors, generalized-estimating-equations, mixed-model, repeated-measures

In the introductions to books and review papers about modeling dependent (a.k.a. within-subjects, repeated-measures, clustered, longitudinal, hierarchical) data, it is very common to say that ignoring these dependencies increases the Type I error rate and underestimates the standard errors:

When researchers apply standard statistical methods to multilevel data… the assumption of independent errors is violated. For example, if we have achievement test scores from a sample of students who attend several different schools, it would be reasonable to believe that those attending the same school will have scores that are more highly correlated with one another than they are with scores from students at other schools. This within-school correlation would be due, for example, to a community, a common set of teachers, a common teaching curriculum, a single set of administrative policies, and other factors. The within-school correlation will in turn result in an inappropriate estimate of the standard errors for the model parameters, which will lead to errors of statistical inference, such as p-values smaller than they should be and the resulting rejection of null effects above the stated Type I error rate for the parameters… An under-estimation of the standard error will cause an overestimation of the test statistic, and thus the statistical significance for the parameter in cases where it should not be, that is, Type I errors at a higher rate than specified. Indeed, the underestimation of the standard error will occur unless $\tau^2$ is equal to 0.

Multilevel Modeling Using R. Finch, Bolin, & Kelley (2014)

But consider a more specific quotation:

The modified sandwich estimate of variance is robust to any type of correlation within panels. Many people believe that the adjective robust means that sandwich estimates of variance are larger than naive estimates of variance. This is not the case. A robust variance estimate may result in smaller or larger estimators depending on the nature of the within-panel correlation. The calculation of the modified sandwich estimate of variance uses the sums of the residuals from each panel. If the residuals are negatively correlated and the sums are small, the modified sandwich estimate of variance produces smaller standard errors than the naive estimator.

Generalized Estimating Equations. Hardin & Hilbe (2013)
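To check that last claim, I also tried a small sketch of my own (not from either book): a Gaussian outcome, two observations per panel, a panel-level covariate x, and a strongly negative within-panel error correlation induced by the +e / -e construction; all the specific numbers here are arbitrary choices. The robust (sandwich) standard error does indeed come out smaller than the naive one:

library(geepack)

set.seed(2025)
n <- 500
x <- rbinom(n, 1, 0.5)   # panel-level covariate
e <- rnorm(n)            # error component, sign-flipped within each panel

dat <- data.frame(id = rep(seq_len(n), each = 2),
                  x  = rep(x, each = 2))
# +e on the first observation, -e on the second: the within-panel residuals
# nearly cancel, so their panel sums are tiny
dat$y <- 1 + 0.5 * dat$x + c(rbind(e, -e)) + rnorm(2 * n, sd = 0.2)

# sandwich (GEE, independence working correlation) vs. naive (GLM) SE for x
summary(geeglm(y ~ x, data = dat, id = id))$coef[2, 2]
summary(glm(y ~ x, data = dat))$coef[2, 2]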

However, when I simulate data where the correlation between observations is strong and positive, ignoring the dependency results in a larger standard error, not the smaller one these quotations lead me to expect. Why is this? See the simulation below.


library(simstudy)
library(geepack)

set.seed(1839)
results <- lapply(1:5000, function(zzz) {
  # two binary measurements per subject, strong positive within-subject
  # correlation (rho = .80, compound symmetry), long format
  dat <- genCorGen(n = 1000, nvars = 2, params1 = c(.50, .53), 
                   dist = "binary", rho = .80, corstr = "cs", wide = FALSE)
  # GEE with exchangeable working correlation vs. GLM that ignores the clustering
  gee_mod <- geeglm(X ~ period, binomial, dat, id = id, corstr = "exchangeable")
  glm_mod <- glm(X ~ period, binomial, dat)
  c(
    se_gee = summary(gee_mod)$coef[2, 2],
    se_glm = summary(glm_mod)$coef[2, 2]
  )
})
results <- as.data.frame(do.call(rbind, results))
sapply(results, mean)

This returns:

    se_gee     se_glm 
0.05741054 0.08956997 

Best Answer

This depends on the type of effects you're looking at. That is, when you ignore the correlations,

  • the standard errors of between-subjects effects are underestimated, and
  • the standard errors of within-subjects effects are overestimated.
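A back-of-the-envelope calculation (my addition; take the simplest balanced case of two measurements per subject with error variance $\sigma^2$ and within-subject correlation $\rho$) shows why the two biases point in opposite directions:

$$\mathrm{Var}(\bar{y}_i) = \frac{\sigma^2 (1 + \rho)}{2}, \qquad \mathrm{Var}(y_{i2} - y_{i1}) = 2\sigma^2 (1 - \rho).$$

Between-subjects contrasts are built from subject means, whose true variance is inflated by the factor $(1 + \rho)$ relative to the independence formula; within-subjects contrasts are built from within-subject differences, whose true variance is deflated by $(1 - \rho)$. A model that assumes $\rho = 0$ therefore understates the first and overstates the second whenever $\rho > 0$.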

To see this more clearly, check the following example:

set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrix for the fixed effects
X <- model.matrix(~ sex + time, data = DF)

betas <- c(10, 2, 3) # fixed effects coefficients
sigma_b <- 4 # sd random intercepts
sigma <- 1 # sd error terms

# we simulate random effects
b <- rnorm(n, 0, sigma_b)
# linear predictor
eta_y <- drop(X %*% betas + b[DF$id])
# we simulate continuous (normal) longitudinal data
DF$y <- rnorm(n * K, mean = eta_y, sd = sigma)

##########################################################################################

library("lme4")
#> Loading required package: Matrix

coef(summary(lmer(y ~ sex + time + (1 | id), data = DF)))
#>             Estimate  Std. Error    t value
#> (Intercept) 9.969814 0.625009140  15.951469
#> sexfemale   1.681844 0.881108533   1.908782
#> time        2.982257 0.007682251 388.200903

coef(summary(lm(y ~ sex + time, data = DF)))
#>              Estimate Std. Error   t value      Pr(>|t|)
#> (Intercept) 10.185091 0.31010280 32.844242 2.768582e-150
#> sexfemale    1.699771 0.31608051  5.377652  9.922407e-08
#> time         2.948972 0.03329383 88.574153  0.000000e+00

The standard error for the between-subjects effect sex is underestimated by the lm() fit that ignores the correlations, whereas the standard error for the within-subjects time effect is overestimated. This also explains the simulation in the question: period is a within-subjects effect, so the GLM that ignores the strong positive correlation overestimates its standard error (0.0896 versus 0.0574 for the GEE).
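Conversely, a quick companion to the simulation in the question (this sketch is mine; the cluster count, effect sizes, and random-intercept SD are arbitrary choices) shows the naive GLM underestimating the standard error of a cluster-level, i.e., between-subjects, covariate when the within-cluster correlation is positive:

library(geepack)

set.seed(1839)
n_clus <- 500
grp <- rbinom(n_clus, 1, 0.5)       # cluster-level (between-subjects) covariate
b <- rnorm(n_clus, 0, 1.5)          # random intercepts -> positive within-cluster correlation
p <- plogis(-0.25 + 0.5 * grp + b)  # cluster-specific response probability

dat <- data.frame(id  = rep(seq_len(n_clus), each = 2),
                  grp = rep(grp, each = 2))
dat$y <- rbinom(nrow(dat), 1, p[dat$id])

# robust (GEE) vs. naive (GLM) standard error for the between-cluster effect:
# with positive correlation the naive one is now the smaller of the two
summary(geeglm(y ~ grp, binomial, dat, id = id, corstr = "exchangeable"))$coef[2, 2]
summary(glm(y ~ grp, binomial, dat))$coef[2, 2]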