Repeated measures is kind of an overloaded term. To some people it refers to a particular statistical analysis method; to others it refers to the structure of the design.
This is a variant on a three-period, three-treatment crossover design.
It is a variant because usually in a crossover design you randomize subjects to sequences. In this case the sequence is determined randomly for each subject. Since there are six possible sequences, it might be that some sequences are not observed, especially with 10 subjects. Maybe this is formally the same as randomizing subjects to sequences, but I haven't looked at that yet.
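To put a number on that: if each of 10 subjects is independently assigned one of the 6 sequences uniformly at random, the chance that all six sequences appear at least once can be computed by inclusion-exclusion. A quick sketch (in Python, just for the counting; the rest of this thread uses R):

```python
from math import comb

def prob_all_sequences(n_subjects=10, n_sequences=6):
    """P(every sequence is used at least once) when each subject
    independently gets a uniformly random sequence, via
    inclusion-exclusion over which sequences go unobserved."""
    return sum((-1) ** k * comb(n_sequences, k)
               * ((n_sequences - k) / n_sequences) ** n_subjects
               for k in range(n_sequences + 1))

print(round(prob_all_sequences(), 3))  # about 0.272
```

So with 10 subjects there is roughly a 73% chance that at least one sequence goes unobserved, which is the practical difference from randomizing subjects to a prespecified balanced set of sequences.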
The considerations for crossover designs are:
Carryover effects: Also known as residual effects, where prior treatment may affect response to current treatment. The goal of the washout periods is to remove this from consideration. You could also have (in theory) second-order residual effects, where the treatment given in the first period potentially affects the response to treatment given in the third period.
Period effects: Response to treatment(s) may change as the study goes on for a given subject.
Autocorrelation: Serial correlation in the errors is usually an issue with more closely spaced measurements. In simple balanced designs, having a random effect for subject implies equal correlation among the errors from each subject.
Subject effects: Subjects may differ in mean response from each other regardless of treatment. You could also conceive of a situation where measurement error is serially correlated separately from a random subject effect.
Sequence effect: In cases where you randomize subjects to sequences, subjects are considered nested in sequence.
A minimal analysis for this would be the suggested randomized complete block design. That is, a fixed effect for treatment and a random effect for subject. With a skimpy sample size that might be all you can really do.
I would argue for a bit more structure to the analysis, if possible. Assuming no carryover effects on scientific grounds, it seems like a good idea to have fixed effects for treatment, period, and treatment $\times$ period interaction, and a random effect for subjects. For small data sets, if this model can't be fit, I would drop the treatment $\times$ period interaction first.
Period should be included because it represents a restriction on the randomization. You cannot "randomize" periods --- they always happen in the same order. Treatment $\times$ period interaction might be indicative of some sort of carryover effect.
With tons of data one could work up terms that would allow estimation of various specific carryover effects. My notes on this are gone, though I know I've seen it handled in some texts.
The strategy of additionally modelling the correlation structure on the R-side seems reasonable to me. It lets one claim to be handling the possible dependence structure induced by repeated measures on the same subject (a claim I would also probably make for the random subject effect alone, if the analysis devolved to that level). It is also reassuring when the various analysis strategies give broadly similar results.
For implementation, I'd use PROC MIXED in SAS, and likely nlme or lme4 in R.
I'll punt on the compound symmetry question, since that seems more like a holdover from the days where MANOVA was the only "correct" analysis for repeated measures.
The equivalence of the models can be observed by calculating the correlation between two observations from the same individual, as follows:
As in your notation, let $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$, where
$\beta_j \sim N(0, \sigma_p^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$.
Then
$Cov(Y_{ik}, Y_{jk}) = Cov(\mu + \alpha_i + \beta_k + \epsilon_{ik},\ \mu + \alpha_j + \beta_k + \epsilon_{jk}) = Cov(\beta_k, \beta_k) = \sigma_p^2$, because all other terms are independent or fixed,
and $Var(Y_{ik}) = Var(Y_{jk}) = \sigma_p^2 + \sigma^2$,
so the correlation is $\sigma_p^2/(\sigma_p^2 + \sigma^2)$.
Note, however, that the models are not quite equivalent: the random effect model forces the correlation to be non-negative, while the CS model and the t-test/ANOVA model do not.
EDIT: There are a few other differences as well. First, the CS and random effect models assume normality for the random effect, but the t-test/ANOVA model does not. Second, the CS and random effect models are fit using maximum likelihood, while the ANOVA is fit using mean squares; when everything is balanced they will agree, but not necessarily in more complex situations. Finally, I'd be wary of using F/df/p values from the various fits as measures of how much the models agree; see Doug Bates's famous screed on degrees of freedom for more details. (END EDIT)
The problem with your R code is that you're not specifying the correlation structure properly. You need to use gls with the corCompSymm correlation structure.
Generate data so that there is a subject effect:
set.seed(5)
x <- rnorm(10)
x1 <- x + rnorm(10)
x2 <- x + 1 + rnorm(10)
myDat <- data.frame(c(x1, x2), c(rep("x1", 10), rep("x2", 10)),
                    rep(paste("S", seq(1, 10), sep = ""), 2))
names(myDat) <- c("y", "x", "subj")
Then here's how you'd fit the random effects and the compound symmetry models.
library(nlme)
fm1 <- lme(y ~ x, random=~1|subj, data=myDat)
fm2 <- gls(y ~ x, correlation=corCompSymm(form=~1|subj), data=myDat)
The variance components from the random effects model (squaring the reported standard deviations) are:
m1.varp <- 0.5453527^2
m1.vare <- 1.084408^2
And the correlation and residual variance from the CS model are:
m2.rho <- 0.2018595
m2.var <- 1.213816^2
And they're equal to what is expected:
> m1.varp/(m1.varp+m1.vare)
[1] 0.2018594
> sqrt(m1.varp + m1.vare)
[1] 1.213816
Other correlation structures are usually not fit with random effects but simply by specifying the desired structure; one common exception is the AR(1) + random effect model, which has a random effect and AR(1) correlation between observations on the same random effect.
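For concreteness, the marginal covariance that an AR(1) + random intercept model implies between observations at times $t$ and $s$ on the same subject is $\sigma^2_\gamma + \sigma^2_\varepsilon \rho^{|t-s|}$. A small sketch, with made-up variance components, just to display the resulting matrix (Python here, for brevity):

```python
import numpy as np

sigma_g2, sigma_e2, rho = 1.0, 2.0, 0.5  # illustrative variance components
T = 4                                     # repeated measures per subject

t = np.arange(T)
# random intercept adds sigma_g2 everywhere; AR(1) errors decay with lag
cov = sigma_g2 + sigma_e2 * rho ** np.abs(t[:, None] - t[None, :])
print(cov)
```

Unlike plain compound symmetry, the off-diagonal entries are no longer constant, but they are bounded below by the random-intercept variance $\sigma^2_\gamma$.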
EDIT2: When I fit the three options, I get exactly the same results except that gls doesn't try to guess the df for the term of interest.
> summary(fm1)
...
Fixed effects: y ~ x
Value Std.Error DF t-value p-value
(Intercept) -0.5611156 0.3838423 9 -1.461839 0.1778
xx2 2.0772757 0.4849618 9 4.283380 0.0020
> summary(fm2)
...
Value Std.Error t-value p-value
(Intercept) -0.5611156 0.3838423 -1.461839 0.1610
xx2 2.0772757 0.4849618 4.283380 0.0004
> m1 <- lm(y~ x + subj, data=myDat)
> summary(m1)
...
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.3154 0.8042 -0.392 0.70403
xx2 2.0773 0.4850 4.283 0.00204 **
(The intercept is different here because with the default coding, it's not the mean of all subjects but instead the mean of the first subject.)
It's also of interest to note that the newer lme4 package gives the same results but doesn't even try to compute a p-value.
> mm1 <- lmer(y ~ x + (1|subj), data=myDat)
> summary(mm1)
...
Estimate Std. Error t value
(Intercept) -0.5611 0.3838 -1.462
xx2 2.0773 0.4850 4.283
Compound symmetry is essentially the "exchangeable" correlation structure, except with a specific decomposition of the total variance. For example, suppose you have a mixed model for $Y_{ij}$, the response of subject $i$ in cluster $j$, with only a random intercept by cluster
$$ Y_{ij} = \alpha + \gamma_{j} + \varepsilon_{ij} $$
where $\gamma_{j}$ is the cluster $j$ random effect with variance $\sigma^{2}_{\gamma}$, $\varepsilon_{ij}$ is the subject $i$ in cluster $j$ "measurement error" with variance $\sigma^{2}_{\varepsilon}$, and $\gamma_{j}$ and $\varepsilon_{ij}$ are independent. This model implicitly specifies the compound symmetry covariance matrix between observations in the same cluster:
$$ {\rm cov}(Y_{ij}, Y_{kj}) = \sigma^{2}_{\gamma} + \sigma^{2}_{\varepsilon} \cdot \mathcal{I}(k = i) $$
Note that the compound symmetry assumption implies that the correlation between distinct members of a cluster is $\sigma^{2}_{\gamma}/(\sigma^{2}_{\gamma} + \sigma^{2}_{\varepsilon})$.
In "plain english" you might say this covariance structure implies that all distinct members of a cluster are equally correlated with each other and the total variation, $\sigma^{2} = \sigma^{2}_{\gamma} + \sigma^{2}_{\varepsilon}$, can be partitioned into the "shared" (within a cluster) component, $\sigma^{2}_{\gamma}$ and the "unshared" component, $\sigma^{2}_{\varepsilon}$.
Edit: To aid understanding in the "plain english" sense, consider an example where individuals are clustered within families so that $Y_{ij}$ denotes the subject $i$ in family $j$ response. In this case the compound symmetry assumption means that the total variation in $Y_{ij}$ can be partitioned into the variation within a family, $\sigma^{2}_{\varepsilon}$, and the variation between families, $\sigma^{2}_{\gamma}$.
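This partition is easy to check by simulation. A minimal sketch (in Python; the variance values and cluster size are made up for illustration) that generates pairs of family members sharing a family effect and recovers the implied correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_g, sigma_e = 1.0, 2.0    # between-family and within-family SDs (illustrative)
n_clusters, n_per = 20000, 2   # many families of two members each

gamma = rng.normal(0, sigma_g, size=(n_clusters, 1))          # shared family effect
y = gamma + rng.normal(0, sigma_e, size=(n_clusters, n_per))  # plus measurement error

# empirical correlation between the two members of each family
r = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
icc = sigma_g**2 / (sigma_g**2 + sigma_e**2)  # theoretical value here: 0.2
print(round(r, 3), icc)
```

The empirical correlation r should land close to the theoretical intraclass correlation $\sigma^{2}_{\gamma}/(\sigma^{2}_{\gamma} + \sigma^{2}_{\varepsilon})$.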