Solved – Why are all regression predictors in a balanced factorial ANOVA orthogonal

anova, interaction, linear model, multiple regression, regression

In a balanced factorial ANOVA, when it is understood as a linear model, all categorical predictors (all main effects and all interactions of all orders) are mutually orthogonal, which leads to many nice properties like unique decomposition of variance. Orthogonality is completely obvious for a one-way ANOVA (from the way dummy coding works: each column in the design matrix has ones where other columns have zeros), and sort of makes sense for factorial ANOVA as well, but I would like to see either a formal proof, or some intuition, or, ideally, both.

I consulted Rutherford, 2001, Introducing Anova and Ancova, GLM approach (pdf), but weirdly could not find this explicitly discussed.


Update

Turns out, I don't completely understand how the dummy coding for ANOVA interactions works. Below is an explicit example from Christensen's book Plane Answers to Complex Questions. Five ($3+2$) columns corresponding to main effects are mutually orthogonal (this is obvious within the first $3$ columns and within the next $2$ columns, each block corresponding to one factor, but not so obvious across the two blocks). But the six ($3\cdot2$) further columns corresponding to interactions are actually not orthogonal to them. E.g. columns $2$ and $7$ are not orthogonal. On the other hand, I know that in a balanced design everything should be orthogonal, hence my confusion.

[Figure: ANOVA coding from Christensen's book]
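As a sanity check, here is a small numerical sketch (the balanced $3\times2$ layout with two replications per cell and the particular indicator and sum-to-zero codings are my own illustration, not the matrix reproduced from the book): a raw indicator column for an interaction is not orthogonal to a raw indicator column for a main effect, whereas under sum-to-zero coding the intercept, A, B and A:B blocks are mutually orthogonal in the balanced design.

```python
import numpy as np

# Balanced 3x2 layout with 2 observations per cell (my own illustration,
# not the exact matrix reproduced from Christensen's book).
I_lev, J_lev, K_rep = 3, 2, 2
cells = [(a, b) for a in range(I_lev) for b in range(J_lev) for _ in range(K_rep)]

# Plain indicator (dummy) columns for the main effects and the interaction.
A_ind  = np.array([[float(a == i) for i in range(I_lev)] for a, b in cells])
B_ind  = np.array([[float(b == j) for j in range(J_lev)] for a, b in cells])
AB_ind = np.array([[float(a == i and b == j)
                    for i in range(I_lev) for j in range(J_lev)]
                   for a, b in cells])

# An interaction indicator is NOT orthogonal to a main-effect indicator:
print(A_ind[:, 1] @ AB_ind[:, 2])     # = 2 (the cell count), not 0

# Sum-to-zero ("effects") coding: I-1 columns for A, J-1 for B,
# (I-1)(J-1) for A:B, plus an intercept column.
def eff(level, n):
    """Sum-to-zero contrast row for one level of an n-level factor."""
    return np.eye(n - 1)[level] if level < n - 1 else -np.ones(n - 1)

A_eff  = np.array([eff(a, I_lev) for a, b in cells])
B_eff  = np.array([eff(b, J_lev) for a, b in cells])
AB_eff = np.array([np.kron(eff(a, I_lev), eff(b, J_lev)) for a, b in cells])
X = np.hstack([np.ones((len(cells), 1)), A_eff, B_eff, AB_eff])

# In the balanced design the Gram matrix is block-diagonal: the intercept,
# A, B and A:B blocks are mutually orthogonal.
print((X.T @ X).astype(int))
```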

Best Answer

I have started to write an article about the case of the balanced one-way ANOVA model. This article is still under construction.

I'll try to explain the ideas (not all the details) below. The idea is to use the tensor product as a convenient language to treat balanced one-way or multi-way ANOVA models. This could seem complicated at first glance to someone who doesn't know the tensor product, but after a bit of effort it becomes very mechanical. I refer to my blog for the definition of the tensor product.

As you have seen, with the matrix approach orthogonality only appears as the result of technical calculations. It is crystal clear with the vector-space approach based on the tensor product.

One-way ANOVA

Assume for the sake of simplicity that you have $I=2$ groups and $J=3$ observations in each group. Thus you have a rectangular dataset: $$ y = \begin{pmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{pmatrix} $$ and you assume a model $$ \begin{pmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{pmatrix} = \begin{pmatrix} \mu_1 & \mu_1 & \mu_1 \\ \mu_2 & \mu_2 & \mu_2 \end{pmatrix} + \sigma \begin{pmatrix} \epsilon_{11} & \epsilon_{12} & \epsilon_{13} \\ \epsilon_{21} & \epsilon_{22} & \epsilon_{23} \end{pmatrix} $$ with $\epsilon_{ij} \sim_{\text{iid}} {\cal N}(0,1)$.

The general form of a linear Gaussian model is usually written in stacked form $\boxed{y=\mu + \sigma \epsilon}$, where $y$ is vector-valued (say in $\mathbb{R}^n$), $\mu$ is assumed to lie in a linear subspace $W$ of $\mathbb{R}^n$, and $\epsilon$ is a vector of $\epsilon_{k} \sim_{\text{iid}} {\cal N}(0,1)$. Here this would be $n=IJ=6$, for example $$\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix} = \begin{pmatrix} \mu_{1} \\ \mu_{1} \\ \mu_{1} \\ \mu_{2} \\ \mu_{2} \\ \mu_{2} \end{pmatrix} + \sigma \begin{pmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \end{pmatrix}$$ but then the rectangular structure is lost, and this is not the way to go.
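As a small illustration (a sketch with made-up values for $\mu_1$, $\mu_2$ and $\sigma$), the rectangular and the stacked forms contain exactly the same numbers; stacking merely hides the $I\times J$ structure:

```python
import numpy as np

rng = np.random.default_rng(0)
nI, nJ = 2, 3
mu = np.array([1.0, 3.0])                # hypothetical group means mu_1, mu_2
sigma = 0.5

# Rectangular form: the mean matrix has constant rows, plus iid N(0,1) noise.
M = np.repeat(mu[:, None], nJ, axis=1)   # [[mu_1]*3, [mu_2]*3]
y = M + sigma * rng.standard_normal((nI, nJ))

# Stacked form y = mu + sigma*eps in R^(I*J): same numbers, structure hidden.
print(y)
print(y.ravel())                         # (y_11, y_12, y_13, y_21, y_22, y_23)
print(M.ravel())                         # (mu_1, mu_1, mu_1, mu_2, mu_2, mu_2)
```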

Actually the cleanest way to treat the balanced one-way ANOVA model is to use the tensor product $\mathbb{R}^I \otimes \mathbb{R}^J$ instead of $\mathbb{R}^n$. The notion of tensor product is not usually addressed in elementary courses on linear algebra, but it is not complicated, and we will use it only as a convenient language to address the problem.

First of all, the linear space $W$ in which $\mu$ is assumed to lie has a very convenient form in the tensor product language. The tensor product $x \otimes y$ is also defined for two vectors $x$ and $y$, and with this elementary operation one has $$ \mu = \begin{pmatrix} \mu_1 & \mu_1 & \mu_1 \\ \mu_2 & \mu_2 & \mu_2 \end{pmatrix} = (\mu_1, \mu_2) \otimes (1,1,1) \in \boxed{W:= \mathbb{R}^I \otimes [(1,1,1)]}, $$ where $[(1,1,1)]$ denotes the vector space spanned by $(1,1,1) \in \mathbb{R}^J$.
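Concretely (a sketch, identifying the tensor product of two vectors with their outer product when written as an $I\times J$ matrix, or with the Kronecker product in stacked form):

```python
import numpy as np

mu_vec = np.array([1.0, 3.0])       # (mu_1, mu_2), made-up values
ones_J = np.ones(3)                 # the vector (1,1,1)

# (mu_1, mu_2) (x) (1,1,1) written as a 2x3 matrix is the outer product:
print(np.outer(mu_vec, ones_J))     # rows (mu_1,mu_1,mu_1) and (mu_2,mu_2,mu_2)

# The same element of R^I (x) R^J in stacked form is a Kronecker product:
print(np.kron(mu_vec, ones_J))      # (mu_1, mu_1, mu_1, mu_2, mu_2, mu_2)

# W = R^I (x) [(1,1,1)] is I-dimensional, spanned by e_1 (x) 1_J and e_2 (x) 1_J:
basis_W = np.column_stack([np.kron(e, ones_J) for e in np.eye(2)])
print(basis_W.T)
```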

Now let me denote ${\bf 1}_J = (1,1,1) \in \mathbb{R}^J$ and ${\bf 1}_I = (1,1) \in \mathbb{R}^I$. The orthogonal parameters you're talking about are $m$ and the $\alpha_i$ defined by $$\boxed{\mu_i = m + \alpha_i} \quad \text{with } \sum_{i=1}^I\alpha_i=0.$$ The vector space $\mathbb{R}^I$ has the orthogonal decomposition $\mathbb{R}^I=[{\bf 1}_I]\oplus{[{\bf 1}_I]}^\perp$, therefore $W$ has the orthogonal decomposition $$\boxed{W= \mathbb{R}^I \otimes [{\bf 1}_J] = \Bigl([{\bf 1}_I] \otimes [{\bf 1}_J]\Bigr) \oplus \Bigl({[{\bf 1}_I]}^\perp \otimes [{\bf 1}_J]\Bigr)}.$$ (We use the distributivity rule $(A \oplus B) \otimes C= (A \otimes C) \oplus (B \otimes C)$, which follows directly from the definition of the tensor product.)

Then the parameters $m$ and $\alpha_i$ appear in the orthogonal decomposition of $\mu$: $$\begin{align*} \mu = (\mu_1, \ldots, \mu_I) \otimes {\bf 1}_J & = \begin{pmatrix} m & m & m \\ m & m & m \end{pmatrix} + \begin{pmatrix} \alpha_1 & \alpha_1 & \alpha_1 \\ \alpha_2 & \alpha_2 & \alpha_2 \end{pmatrix} \\ & = \underset{\in \bigl([{\bf 1}_I]\otimes[{\bf 1}_J]\bigr)}{\underbrace{m({\bf 1}_I\otimes{\bf 1}_J)}} + \underset{\in \bigl([{\bf 1}_I]^{\perp}\otimes[{\bf 1}_J] \bigr)}{\underbrace{(\alpha_1,\ldots,\alpha_I)\otimes{\bf 1}_J}} \end{align*}$$ and this is why orthogonality occurs: the least-squares estimates are obtained by projecting $y$ onto $W$, and these two components of $W$ are orthogonal.
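Here is a numerical sketch of this decomposition (with made-up balanced $2\times3$ data): projecting $y$ onto $W$ gives the row means, and the fit splits into a grand-mean piece and a zero-sum effect piece which are orthogonal to each other:

```python
import numpy as np

y = np.array([[2.1, 1.7, 2.4],                 # made-up balanced 2x3 data
              [3.9, 4.2, 3.6]])
nI, nJ = y.shape

# Least-squares fit = projection of y onto W: replace each row by its mean.
fit = np.repeat(y.mean(axis=1, keepdims=True), nJ, axis=1)

# Split the fit along the orthogonal decomposition of W:
grand  = np.full_like(y, y.mean())             # component in [1_I] (x) [1_J]
effect = fit - grand                           # component in [1_I]^perp (x) [1_J]

print(np.allclose(np.sum(grand * effect), 0))  # True: the two pieces are orthogonal
print(np.allclose(grand + effect, fit))        # True: they add back up to the fit
print(effect[:, 0], round(effect[:, 0].sum(), 12))  # the alpha_i, summing to zero
```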

Two-way ANOVA without interaction

For the two-way ANOVA without interaction, one assumes $y_{ij} \sim {\cal N}(\mu_{ij}, \sigma^2)$ and the $\mu_{ij}$ have the form $$\mu_{ij} = m + \alpha_i + \beta_j, \quad \sum_{i=1}^I\alpha_i=0, \quad \sum_{j=1}^J\beta_j=0.$$ Consider the orthogonal decompositions $\mathbb{R}^I =[{\bf 1}_I]\oplus{[{\bf 1}_I]}^\perp$ and $\mathbb{R}^J =[{\bf 1}_J]\oplus{[{\bf 1}_J]}^\perp$. Then we get the orthogonal decomposition $$\mathbb{R}^I\otimes \mathbb{R}^J = \underset{=:W}{\underbrace{\Bigl([{\bf 1}_I] \otimes [{\bf 1}_J]\Bigr) \oplus \Bigl({[{\bf 1}_I]}^\perp \otimes [{\bf 1}_J]\Bigr) \oplus \Bigl([{\bf 1}_I] \otimes {[{\bf 1}_J]}^\perp\Bigr)}} \oplus \underset{=W^\perp}{\underbrace{\Bigl({[{\bf 1}_I]}^\perp \otimes {[{\bf 1}_J]}^\perp\Bigr)}}. $$

This is the origin of orthogonality, similarly to the case of the balanced one-way ANOVA model.
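In matrix terms (a sketch; the projectors act on vectors stacked in row-major order, as in the one-way example), the four subspaces correspond to Kronecker products of the projectors onto $[{\bf 1}_n]$ and onto its orthogonal complement, and these are mutually orthogonal projectors that sum to the identity:

```python
import numpy as np

def P1(n):                              # orthogonal projector onto [1_n]
    return np.ones((n, n)) / n

nI, nJ = 2, 3
Q_I, Q_J = np.eye(nI) - P1(nI), np.eye(nJ) - P1(nJ)

# Projectors onto the four tensor-product subspaces of R^I (x) R^J:
P_mean  = np.kron(P1(nI), P1(nJ))       # [1_I]      (x) [1_J]
P_alpha = np.kron(Q_I,    P1(nJ))       # [1_I]^perp (x) [1_J]
P_beta  = np.kron(P1(nI), Q_J)          # [1_I]      (x) [1_J]^perp
P_resid = np.kron(Q_I,    Q_J)          # [1_I]^perp (x) [1_J]^perp  (= W^perp here)

blocks = [P_mean, P_alpha, P_beta, P_resid]
print(all(np.allclose(A @ B, 0) for A in blocks for B in blocks if A is not B))
print(np.allclose(sum(blocks), np.eye(nI * nJ)))   # they add up to the identity
```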

Two-way ANOVA with interaction (and replication)

Here, $y_{ijk} \sim {\cal N}(\mu_{ij}, \sigma^2)$ and the $\mu_{ij}$ have the form $$\mu_{ij} = m + \alpha_i + \beta_j + \gamma_{ij}, \quad \sum_{i=1}^I\alpha_i=0, \quad \sum_{j=1}^J\beta_j=0, \\ \sum_{i=1}^I\gamma_{ij}=0 \text{ for every $j$}, \quad \sum_{j=1}^J\gamma_{ij}=0 \text{ for every $i$}.$$

Here we have to orthogonally decompose $\mathbb{R}^I\otimes \mathbb{R}^J \otimes \mathbb{R}^K$ by distributing the three orthogonal decompositions $$\mathbb{R}^I =[{\bf 1}_I]\oplus{[{\bf 1}_I]}^\perp, \quad \mathbb{R}^J =[{\bf 1}_J]\oplus{[{\bf 1}_J]}^\perp, \quad \mathbb{R}^K =[{\bf 1}_K]\oplus{[{\bf 1}_K]}^\perp.$$ Doing so, we find an orthogonal decomposition of $W=\mathbb{R}^I\otimes\mathbb{R}^J\otimes [{\bf 1}_K]$ into the following parts (a numerical check follows the list):

  • $[{\bf 1}_I] \otimes [{\bf 1}_J] \otimes [{\bf 1}_K]$ corresponding to $m$

  • ${[{\bf 1}_I]}^\perp \otimes [{\bf 1}_J] \otimes [{\bf 1}_K]$ corresponding to the $\alpha_i$

  • $[{\bf 1}_I] \otimes {[{\bf 1}_J]}^\perp \otimes [{\bf 1}_K]$ corresponding to the $\beta_j$

  • ${[{\bf 1}_I]}^\perp \otimes {[{\bf 1}_J]}^\perp \otimes [{\bf 1}_K]$ corresponding to the $\gamma_{ij}$
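Here is the numerical check announced above (a sketch with arbitrary sizes $I=2$, $J=3$, $K=4$, acting on vectors stacked in row-major order): the projectors onto these four subspaces are Kronecker products, they sum to the projector onto $W$, and their ranks are the usual ANOVA degrees of freedom.

```python
import numpy as np
from functools import reduce

def P1(n):                                    # projector onto [1_n]
    return np.ones((n, n)) / n

def Q(n):                                     # projector onto [1_n]^perp
    return np.eye(n) - P1(n)

def kron(*Ms):                                # n-fold Kronecker product
    return reduce(np.kron, Ms)

nI, nJ, nK = 2, 3, 4                          # illustrative sizes

# Projectors onto the four components of W = R^I (x) R^J (x) [1_K]:
P_m     = kron(P1(nI), P1(nJ), P1(nK))        # m
P_alpha = kron(Q(nI),  P1(nJ), P1(nK))        # the alpha_i
P_beta  = kron(P1(nI), Q(nJ),  P1(nK))        # the beta_j
P_gamma = kron(Q(nI),  Q(nJ),  P1(nK))        # the gamma_ij

# Together they reconstruct the projector onto W itself:
P_W = kron(np.eye(nI), np.eye(nJ), P1(nK))
print(np.allclose(P_m + P_alpha + P_beta + P_gamma, P_W))

# Their ranks are the usual ANOVA degrees of freedom 1, I-1, J-1, (I-1)(J-1):
print([int(round(np.trace(P))) for P in (P_m, P_alpha, P_beta, P_gamma)])
```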
