Repeated-measures ANOVA – degrees of freedom of residuals

Tags: anova, degrees-of-freedom, r

I am having trouble figuring out the correct degrees of freedom of residuals for an experiment I performed:

  • within-subject experiment
  • 2×2 study design: let's call the conditions cond_A and cond_B for simplicity.
  • each condition only has two values (yes or no).
  • one dependent variable (time t)
  • 24 participants
  • each experienced every combination of cond_A and cond_B several times (how often each combination was experienced differs).
  • 487 measurements total.

I was following http://ron.dotsch.org/degrees-of-freedom/, which explains that

df2 = df_total – df_subjects – df_factor

which in my case would be 487 observations – (24 participants – 1) – (4 levels – 1) = 461.

However, if I run my calculations in R I get:

> summary(aov(t ~ (cond_A * cond_B) + Error(participant/(cond_A * cond_B)), data=anova_data))
Error: Within
                 Df Sum Sq Mean Sq F value  Pr(>F)   
cond_A            1    832     832   0.550 0.45863   
cond_B            1  11540   11540   7.629 0.00596 **
cond_A:cond_B     1    757     757   0.501 0.47954   
Residuals        479 724556    1513

Which tells me the DOF of residuals is 479.

The data set contains ALL measurements, not just averages per participant/condition.

I read here http://sherifsoliman.com/2014/12/10/ANOVA_in_R/ that aov may not always be trusted, so I also used ezANOVA to double-check my results:

> ezANOVA(anova_data, dv=t, wid=participant, within= .(cond_A, cond_B), detailed=TRUE)
          Effect DFn DFd          SSn       SSd            F            p p<.05          ges
1    (Intercept)   1  23 580516.15076 72972.543 182.97116929 1.954366e-12     * 0.8395005806
2        cond_A    1  23   3574.12976  7452.345  11.03075343 2.973947e-03     * 0.0311988224
3        cond_B    1  23     45.81809 22679.055   0.04646649 8.312302e-01       0.0004126587
4 cond_A:cond_B    1  23    218.33558  7881.692   0.63713709 4.329143e-01       0.0019633793

Now, not only are the F and p values different (which is bad enough, and I'd welcome opinions on which test to trust), but it also suggests a DOF of 23.

What is the correct DOF of residuals and why are they calculated differently in these cases?

Thank you very much!

Best Answer

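In a balanced, fully between-subjects factorial design, the df for the residuals is the product of the numbers of levels of all factors times (the number of replicates per cell minus 1). In a repeated-measures design, each within-subject effect is instead tested against its own error term.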
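For the fully within-subject 2×2 design described in the question, the error df can be checked by hand. This is a minimal sketch, assuming one value per participant per cell (i.e. cell means); the variable names are illustrative and not taken from the question's code.

```r
# Illustrative names; the numbers come from the question.
n_subjects <- 24
levels_A   <- 2
levels_B   <- 2

# Each within-subject effect is tested against its own
# participant-by-effect error term.
df_error_A  <- (n_subjects - 1) * (levels_A - 1)
df_error_B  <- (n_subjects - 1) * (levels_B - 1)
df_error_AB <- (n_subjects - 1) * (levels_A - 1) * (levels_B - 1)

print(c(A = df_error_A, B = df_error_B, AB = df_error_AB))
```

With two levels per factor, every term reduces to the number of participants minus 1.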

Your problem, I suspect, is that your design is unbalanced (you mention an unequal number of observations per condition per subject), whereas aov is designed for balanced designs. You might be better off fitting a mixed model with lme4. Whichever route you take, be careful to test your treatment effects against the right error term, or you risk alpha inflation. The denominator df of 23 reported by ezANOVA is correct given your sample size of 24: each within-subject effect here has 1 numerator df and is tested against its interaction with participant, which has (24 − 1) × 1 = 23 df. A denominator df of 479 treats the individual measurements as if they were independent observations and is far too high. If you want more precise instructions, a reproducible example would help.
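One way to see where the 23 comes from is to collapse the data to one mean per participant and cell before calling aov, which, as far as I know, is also what ezANOVA does when it finds several observations per cell. The sketch below uses simulated data, since the original anova_data is not available; the column names mirror the question, everything else is an assumption.

```r
set.seed(1)  # simulated data, NOT the asker's measurements

# Hypothetical unbalanced data: 24 participants, 2x2 within-subject design,
# with a varying number of trials per participant and cell.
cells <- expand.grid(participant = factor(1:24),
                     cond_A = factor(c("no", "yes")),
                     cond_B = factor(c("no", "yes")))
n_trials <- sample(3:7, nrow(cells), replace = TRUE)
sim <- cells[rep(seq_len(nrow(cells)), n_trials), ]
sim$t <- rnorm(nrow(sim), mean = 100, sd = 30)

# Collapse to one mean per participant and cell so the design is balanced.
cell_means <- aggregate(t ~ participant + cond_A + cond_B,
                        data = sim, FUN = mean)

# Same model formula as in the question, now fitted to the cell means.
fit <- aov(t ~ cond_A * cond_B + Error(participant / (cond_A * cond_B)),
           data = cell_means)
print(summary(fit))
```

On the collapsed data each effect appears in its own error stratum, and each stratum should report 23 residual df. If you would rather keep all individual measurements, a mixed model such as lme4's lmer with a random intercept per participant is the usual alternative, but that is a different model and its df are handled differently.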