Solved – Degrees of freedom in ANOVA

anova, degrees of freedom

Is there an algorithm to calculate the degrees of freedom for any given effect or interaction, as well as for the 'error', in any ANOVA design with or without repeated measurements, so that I don't have to look them up every time or memorize them?

Edit: Of course, I know the df are calculated and printed by statistical analysis software; the question refers to situations in which I want to calculate the df in my head.

Best Answer

Yes!

Important technical note: For the rules stated below, define the "number of levels" of each factor to be the number of levels of that factor within each level of the factors it is nested in. For example, if factor A is nested in factor B, then the "number of levels" for B is just the total number of levels of B, but the "number of levels" for A is the number of levels of A within each level of B, or equivalently, the total number of levels of A divided by the total number of levels of B. This is a standard convention in the ANOVA literature.

The rules are:

  1. For main effects that are not nested in any other factors, the DF is the number of levels minus 1.
  2. For main effects that are nested in other factors, the DF is the number of levels minus 1, times the product of the numbers of levels of all factors this one is nested in.
  3. For interactions, the DF is the product of the DFs of the factors comprising the interaction.
  4. For the error variance, the DF is the product of the numbers of levels of all factors, multiplied by (the number of replicates minus 1). (This can be viewed as an extension of rule #2, where we view the errors as being an additional "factor" that is nested in all the other factors.)
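
The four rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `levels` and `nested_in` dictionaries, and the numbers in the usage lines are all hypothetical. It also assumes a balanced design, level counts given in the "per containing level" convention from the technical note, and that `nested_in` lists every factor a given factor is nested in.

```python
from math import prod


def main_effect_df(name, levels, nested_in):
    """Rules 1 and 2: (levels - 1) times the level counts of all containing factors.

    For a factor that is not nested in anything, the product over an empty
    list is 1, so this reduces to rule 1.
    """
    containers = prod(levels[parent] for parent in nested_in.get(name, ()))
    return (levels[name] - 1) * containers


def interaction_df(names, levels, nested_in):
    """Rule 3: product of the DFs of the factors in the interaction."""
    return prod(main_effect_df(n, levels, nested_in) for n in names)


def error_df(levels, replicates):
    """Rule 4: product of all level counts times (replicates - 1)."""
    return prod(levels.values()) * (replicates - 1)


# Hypothetical design: B has 3 levels, A has 2 levels within each level of B,
# C is crossed with both, and there are 4 replicates per cell.
levels = {"A": 2, "B": 3, "C": 5}
nested_in = {"A": ["B"]}

print(main_effect_df("A", levels, nested_in))         # rule 2
print(main_effect_df("B", levels, nested_in))         # rule 1
print(interaction_df(["A", "C"], levels, nested_in))  # rule 3
print(error_df(levels, replicates=4))                 # rule 4
```

Treating rule 1 as the special case of rule 2 with no containing factors keeps the sketch to a single main-effect function.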

Here is an example (the names of factors are capitalized). We have an industrial experiment where a bunch of Workers each use a set of Machines, and the Machines are one of several Types. Machine is nested in Type, Worker is crossed with Machine, and therefore Worker is also crossed with Type. Let the lower-case version of each factor name represent the number of levels for that factor per level of any containing factors. So $w$ is the number of Workers, $m$ is the number of Machines per Type, and $t$ is the number of Types. Also let $r$ be the number of replicates, i.e., the number of observations in each individual cell of the design. Then applying the rules above, the DFs are:

  • Worker: $w-1$ (by rule #1)
  • Machine: $t(m-1)$ (by rule #2)
  • Type: $t-1$ (by rule #1)
  • Worker*Machine: $t(m-1)(w-1)$ (by rule #3)
  • Worker*Type: $(w-1)(t-1)$ (by rule #3)
  • Error: $wmt(r-1)$ (by rule #4)
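
A quick sanity check, assuming a balanced design with $r$ observations in every cell: the DFs listed above should add up to the total number of observations minus 1, that is, $wmtr - 1$. Summing the six terms:

$$
\begin{aligned}
&(w-1) + t(m-1) + (t-1) + t(m-1)(w-1) + (w-1)(t-1) + wmt(r-1) \\
&\quad= \bigl[t(m-1) + (t-1)\bigr] + (w-1)\bigl[1 + t(m-1) + (t-1)\bigr] + wmt(r-1) \\
&\quad= (tm - 1) + (w-1)\,tm + wmt(r-1) \\
&\quad= (wmt - 1) + wmt(r-1) \\
&\quad= wmtr - 1 .
\end{aligned}
$$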

Note that in this experiment, if there is only a single replicate ($r=1$), the DF for error is 0. In this case the Error and Worker*Machine effects are confounded and cannot be separately estimated, so we would usually just treat the Worker*Machine effects as Error.