Solved – Degrees of freedom in ANOVA


Is there an algorithm to calculate the degrees of freedom for any given effect or interaction, as well as for the 'error', in any ANOVA design with or without repeated measurements, so that I don't have to look them up every time or memorize them?

Edit: Of course, I know that df are calculated and printed by statistical analysis software; the question refers to situations in which I want to calculate the df in my head.

Best Answer

Yes!

Important technical note: For the rules stated below, define the "number of levels" for each factor to be the number of levels of that factor per level of any/all factors that it is nested in. For example, if factor A is nested in factor B, then the "number of levels" for B is just the total number of levels for B, but the "number of levels" for A is the number of levels of A per level of B, or equivalently, the total number of levels for A divided by the total number of levels for B. This is a standard convention in the ANOVA literature.

The rules are:

1. For main effects that are not nested in any other factors, the DF is the number of levels minus 1.
2. For main effects that are nested in other factors, the DF is the number of levels minus 1, times the product of the numbers of levels of all factors this one is nested in.
3. For interactions, the DF is the product of the DFs of the factors comprising the interaction.
4. For the error variance, the DF is the product of the numbers of levels of all factors, times (the number of replicates minus 1). (This can be viewed as an extension of rule #2, where we view the errors as being an additional "factor" that is nested in all the other factors.)
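The four rules translate directly into a few lines of R. This is a minimal sketch: the function names `df_main`, `df_interaction` and `df_error` are hypothetical helpers made up for illustration, and `nested_in` takes the numbers of levels of the containing factors under the "levels per level of the containing factor" convention from the note above.

```r
## Hypothetical helpers implementing the four DF rules (illustration only).

# Rules 1 and 2: a main effect with `levels` levels, nested in factors whose
# numbers of levels are given in `nested_in` (empty for a non-nested factor).
df_main <- function(levels, nested_in = integer(0)) {
  (levels - 1) * prod(nested_in)  # prod(integer(0)) is 1, so rule 1 falls out
}

# Rule 3: an interaction's DF is the product of its components' DFs.
df_interaction <- function(...) {
  prod(c(...))
}

# Rule 4: error DF = product of all factors' numbers of levels times (r - 1),
# where r is the number of replicates per cell.
df_error <- function(all_levels, r) {
  prod(all_levels) * (r - 1)
}

# A factor with 4 levels per level of a 3-level factor: (4 - 1) * 3 = 9
df_main(4, nested_in = 3)
```

For the Worker/Machine/Type example that follows, Machine would be `df_main(m, nested_in = t)` and Error would be `df_error(c(w, m, t), r)`.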

Here is an example (the names of factors are capitalized). We have an industrial experiment where a bunch of Workers each use a set of Machines, and the Machines are one of several Types. Machine is nested in Type, Worker is crossed with Machine, and therefore Worker is also crossed with Type. Let the lower-case version of each factor name represent the number of levels for that factor per level of any containing factors. So $w$ is the number of Workers, $m$ is the number of Machines per Type, and $t$ is the number of Types. Also let $r$ be the number of replicates, i.e., the number of observations in each individual cell of the design. Then applying the rules above, the DFs are:

Worker: $w-1$ (by rule #1)
Machine: $t(m-1)$ (by rule #2)
Type: $t-1$ (by rule #1)
Worker*Machine: $t(m-1)(w-1)$ (by rule #3)
Worker*Type: $(w-1)(t-1)$ (by rule #3)
Error: $wmt(r-1)$ (by rule #4)
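One way to double-check this table is to fit an ordinary fixed-effects linear model to simulated balanced data and read the Df column of `anova()`. The sizes below (w = 5, m = 3, t = 2, r = 2) and the random response are made up; only the DF bookkeeping matters. The nesting is expressed through the terms: Machine appears only inside `Type:Machine`, and the Worker*Machine interaction is written as `Worker:Type:Machine`.

```r
## Made-up sizes; y is pure noise because only the DFs are of interest.
w <- 5; m <- 3; t <- 2; r <- 2
d <- expand.grid(Rep = seq_len(r), Worker = factor(seq_len(w)),
                 Machine = factor(seq_len(m)), Type = factor(seq_len(t)))
set.seed(1)
d$y <- rnorm(nrow(d))

# Machine (numbered 1..m within each Type) enters only via Type:Machine,
# so R codes it as m - 1 contrasts within each of the t Types.
fit <- lm(y ~ Worker * Type + Type:Machine + Worker:Type:Machine, data = d)
tab <- anova(fit)
observed <- setNames(tab$Df, rownames(tab))

# The DFs predicted by rules 1-4
expected <- c(Worker = w - 1,
              Type = t - 1,
              "Worker:Type" = (w - 1) * (t - 1),
              "Type:Machine" = t * (m - 1),
              "Worker:Type:Machine" = t * (m - 1) * (w - 1),
              Residuals = w * m * t * (r - 1))
print(cbind(observed, expected = expected[names(observed)]))
```

The DFs also add up to the total of N - 1 = wmtr - 1, which is a quick sanity check when working them out by hand.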

Note that in this experiment, if there is only a single replicate $(r=1)$, the DF for error is 0. In this case the Error and Worker*Machine effects are confounded and cannot be separately estimated. So we would usually just consider the Worker*Machine effects to be Error.
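The single-replicate case can be seen directly with simulated data (made-up sizes, random response). With r = 1 there are only wmt observations, so the model that includes the Worker*Machine term has nothing left over, while the model without it has exactly t(m-1)(w-1) residual DF, i.e. the Worker*Machine DF become the error DF.

```r
## One observation per cell (r = 1); sizes are made up for illustration.
w <- 5; m <- 3; t <- 2
d <- expand.grid(Worker = factor(seq_len(w)), Machine = factor(seq_len(m)),
                 Type = factor(seq_len(t)))
set.seed(2)
d$y <- rnorm(nrow(d))

# With the Worker*Machine term (written Worker:Type:Machine, since Machine
# is nested in Type) every degree of freedom is used up.
full <- lm(y ~ Worker * Type + Type:Machine + Worker:Type:Machine, data = d)
# Without it, that term's DF become the residual (error) DF.
reduced <- lm(y ~ Worker * Type + Type:Machine, data = d)

df.residual(full)     # 0
df.residual(reduced)  # t * (m - 1) * (w - 1) = 16
```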

Related Solutions

Solved – How to assign degrees of freedom for two-way ANOVA with two within-subjects factors

I'm not sure I understand the question exactly, but if you are asking about the df for the two-way, factorial, within-subjects ANOVA, here they are:

A = a - 1, where a = number of levels of A
B = b - 1, where b = number of levels of B
A x B = (a - 1)(b - 1)
S = s - 1, where s = number of levels of S (i.e., number of subjects)
A x S = (a - 1)(s - 1)
B x S = (b - 1)(s - 1)
A x B x S = (a - 1)(b - 1)(s - 1)

E.g.:

A = cond (a = 3); B = rnd (b = 6); S (s = 44)
- df_A = 2
- df_B = 5
- df_{A x B} = 10
- df_S = 43
- df_{A x S} = 86
- df_{B x S} = 215
- df_{A x B x S} = 430
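Assuming one observation per subject in each A x B cell, the seven terms use up all N - 1 = abs - 1 degrees of freedom, which is a handy check on the arithmetic. A sketch with a hypothetical helper `within_df()` that reproduces the example:

```r
## Hypothetical helper: DFs for a two-way within-subjects design with
## a levels of A, b levels of B and s subjects.
within_df <- function(a, b, s) {
  c(A = a - 1,
    B = b - 1,
    "A x B" = (a - 1) * (b - 1),
    S = s - 1,
    "A x S" = (a - 1) * (s - 1),
    "B x S" = (b - 1) * (s - 1),
    "A x B x S" = (a - 1) * (b - 1) * (s - 1))
}

dfs <- within_df(a = 3, b = 6, s = 44)
dfs
sum(dfs) == 3 * 6 * 44 - 1  # TRUE: the terms partition the 791 total DF
```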

Solved – Are degrees of freedom in lmerTest::anova correct? They are very different from RM-ANOVA

I think that lmerTest is getting it right and ezANOVA is getting it wrong in this case, because:

- the results from lmerTest agree with my intuition/understanding;
- two different computations in lmerTest (Satterthwaite and Kenward-Roger) agree;
- they also agree with nlme::lme;
- when I run it, ezANOVA gives a warning that I don't entirely understand but that should not be disregarded.

Re-running example:

library(ez); library(lmerTest); library(nlme)
data(ANT)
ANT.2 <- subset(ANT, !error)
set.seed(101)  ## for reproducibility
baseline.shift <- rnorm(length(unique(ANT.2$subnum)), 0, 50)
ANT.2$rt <- ANT.2$rt + baseline.shift[as.numeric(ANT.2$subnum)]

Figure out the experimental design:

with(ANT.2,table(subnum,group,direction))

So it looks like individuals (subnum) are placed in either the control or the treatment group, and each is tested in both directions. That is, direction and group:direction can be tested within individuals (so their denominator df are large), but group can only be tested among individuals.
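A quick way to see which factors vary within and which between subjects is to count how many distinct levels each factor takes per subject: 1 means between-subject, more than 1 means within-subject. The sketch below uses a small made-up stand-in for `ANT.2` (same column names, hypothetical values) so that it runs without the ez package; the same `sapply()` call can be pointed at `ANT.2` instead of `toy`.

```r
## Made-up stand-in for ANT.2: 6 subjects, 2 groups, 2 directions, 5 trials.
toy <- expand.grid(trial = 1:5, direction = c("left", "right"),
                   subnum = factor(1:6))
toy$group <- ifelse(as.integer(toy$subnum) <= 3, "control", "treatment")

# Distinct levels of each factor within each subject (rows = subjects).
levels_per_subject <- sapply(c("group", "direction"), function(v)
  tapply(toy[[v]], toy$subnum, function(x) length(unique(x))))
levels_per_subject
```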

(anova.ez <- ezANOVA(data = ANT.2, dv = .(rt), wid = .(subnum), 
    within = .(direction), between = .(group)))
## $ANOVA
##            Effect DFn DFd         F          p p<.05          ges
## 2           group   1  18 2.4290721 0.13651174       0.1183150147
## 3       direction   1  18 0.9160571 0.35119193       0.0002852171
## 4 group:direction   1  18 4.9169156 0.03970473     * 0.0015289914

Here I get the warning "collapsing data to cell means. *IF* the requested effects are a subset of the full design, you must use the "within_full" argument, else results may be inaccurate." The denominator DF look a little funky (all equal to 18): I think they should be larger for direction and group:direction, which can be tested within individuals (but would they be smaller if you added (direction|subnum) to the model?).
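Part of the explanation for the uniform 18 is the collapsing step: after averaging to cell means, each subject contributes one value per direction, so the trial-level replication is gone. A DFd of 18 for group with 2 groups implies 20 subjects, giving 20 - 2 = 18 between-subject error DF, and the direction-by-subject error then has (2 - 1)(20 - 2) = 18 DF as well. The sketch below reproduces this with simulated data (made-up values, 10 subjects per group assumed) using base R's `aov()` with an `Error()` term rather than ez itself.

```r
## Simulated trial-level data: 20 subjects, 2 groups, 2 directions, 10 trials.
set.seed(4)
d <- expand.grid(trial = 1:10, direction = factor(c("left", "right")),
                 subnum = factor(1:20))
d$group <- factor(ifelse(as.integer(d$subnum) <= 10, "control", "treatment"))
d$rt <- rnorm(nrow(d), mean = 500, sd = 50)

# Collapse to one mean per subject x direction cell (40 rows)
cell <- aggregate(rt ~ subnum + group + direction, data = d, FUN = mean)

# Classical repeated-measures ANOVA on the cell means
fit <- aov(rt ~ group * direction + Error(subnum / direction), data = cell)
s <- summary(fit)
s[["Error: subnum"]][[1]]$Df            # group, Residuals: 1 18
s[["Error: subnum:direction"]][[1]]$Df  # direction, group:direction, Residuals: 1 1 18
```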

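# similarly with lmer (the lmerTest version, which masks lme4::lmer) and anova();
# current lmerTest versions do not export anova itself, and they label the
# output columns NumDF and DenDF instead of Df and Denom
model <- lmer(rt ~ group * direction + (1 | subnum), data = ANT.2)
anova(model)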
##                 Df  Sum Sq Mean Sq F value Denom Pr(>F)
## group            1 12065.7 12065.7  2.4310    18 0.1364
## direction        1  1952.2  1952.2  0.3948  5169 0.5298
## group:direction  1 11552.2 11552.2  2.3299  5169 0.1270

The Df column here is the numerator df, and Denom (the second-to-last column) is the estimated denominator df; these agree with the classical intuition. More importantly, the F values for direction and group:direction also differ from ezANOVA's (0.39 vs. 0.92 and 2.33 vs. 4.92).
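The Pr(>F) column is the upper tail of an F distribution with Df numerator and Denom denominator degrees of freedom, so the printed p-values can be reproduced from the F values with base R's `pf()`. This sketch just re-checks the numbers in the table above; it also makes plain that the same F value gives a different p-value when the denominator df change.

```r
## Recompute Pr(>F) from the printed F values and (numerator, denominator) df.
tab <- data.frame(
  effect  = c("group", "direction", "group:direction"),
  F_value = c(2.4310, 0.3948, 2.3299),
  df1     = c(1, 1, 1),
  df2     = c(18, 5169, 5169)
)
tab$p <- pf(tab$F_value, tab$df1, tab$df2, lower.tail = FALSE)
tab
```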

We can also double-check with Kenward-Roger (very slow for this many observations):

anova(model, ddf = "Kenward-Roger")

The results are identical.

For this example, lme (from the nlme package) actually does a perfectly good job of guessing the appropriate denominator df (the F and p-values differ very slightly):

model3 <- lme(rt ~ group * direction, random=~1|subnum, data = ANT.2)
anova(model3)[-1,]
##                 numDF denDF   F-value p-value
## group               1    18 2.4334314  0.1362
## direction           1  5169 0.3937316  0.5304
## group:direction     1  5169 2.3298847  0.1270

If I fit an interaction between direction and subnum (a random slope for direction, (direction | subnum)), the denominator df for direction and group:direction are much smaller (I would have thought they would be 18, but maybe I'm getting something wrong):

model2 <- lmer(rt ~ group * direction + (direction | subnum), data = ANT.2)
anova(model2)
##                 Df  Sum Sq Mean Sq F value   Denom Pr(>F)
## group            1 20334.7 20334.7  2.4302  17.995 0.1364
## direction        1  1804.3  1804.3  0.3649 124.784 0.5469
## group:direction  1 10616.6 10616.6  2.1418 124.784 0.1459
