Solved – Design Matrix of One-Way ANOVA

anova, self-study

Just a note: this is a homework question, so feel free to prod me towards the answer if you want 🙂 Also, I'm pretty bad at statistics so sorry in advance if I'm stupid :/

I'm asked to write the "differential effects" version of a one-way ANOVA, that is:

$ Y_{i,j} = \mu + \alpha_j + \epsilon_{i,j} $

where $\mu$ is the overall mean, $\sum_{j=1}^{k}\alpha_j = 0 $, and $ \epsilon_{i,j} \sim Normal(0, \sigma^2) $,

as a linear model:

$ Y = A\beta + \epsilon $

Also, there are $ k = 4 $ levels, 2 observations per level, and the design matrix can only contain elements from $ \{ -1, 0, 1 \} $.

This Wikipedia article gives something that looks like what I'm looking for, but from what I can tell, it doesn't satisfy the constraint $\sum_{j=1}^{k}\alpha_j = 0 $.

I want to say the answer is:

$$
\begin{bmatrix}
y_{1,1} \\
y_{1,2} \\
y_{2,1} \\
y_{2,2} \\
y_{3,1} \\
y_{3,2} \\
y_{4,1} \\
y_{4,2}
\end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix}
\mu + \alpha_1 \\
\mu + \alpha_2 \\
\mu + \alpha_3 \\
\mu + \alpha_4 \\
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{1,1} \\
\epsilon_{1,2} \\
\epsilon_{2,1} \\
\epsilon_{2,2} \\
\epsilon_{3,1} \\
\epsilon_{3,2} \\
\epsilon_{4,1} \\
\epsilon_{4,2}
\end{bmatrix}
$$

But that seems too straightforward… I haven't really done anything with the design matrix, and I haven't used any $-1$'s (although I'm not really sure when you'd have to).

Is there something else it could be? Is there some other thing they could be asking for?
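For what it's worth, a matrix like the one above can be sanity-checked numerically. Here's a small numpy sketch (the values of $\mu$ and the $\alpha_j$ are made up purely for illustration):

```python
import numpy as np

# Cell-means design from the question: one 0/1 indicator column per group,
# two observations per group (rows stacked group by group).
X = np.kron(np.eye(4), np.ones((2, 1)))    # 8 x 4 matrix of indicators

mu = 10.0
alpha = np.array([2.0, -1.0, 0.5, -1.5])   # made-up effects that sum to zero
beta = mu + alpha                          # parameters here are mu + alpha_j

# Every fitted value should be its group's mean, mu + alpha_j:
assert np.allclose(X @ beta, np.repeat(mu + alpha, 2))

# Full column rank, so the four group means are estimable:
print(np.linalg.matrix_rank(X))
```

This confirms the matrix reproduces the group means, though it bundles $\mu$ and $\alpha_j$ together rather than separating them.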

Best Answer

You are seeking an encoding of the design matrix called "sum-to-zero" (or "effects") coding. With it, each coefficient measures how a group differs from the grand mean. This is in contrast with the usual "dummy" coding (the coding in the Wikipedia link), whose coefficients measure differences from a base treatment group.

$$
\begin{bmatrix}
y_{1,1} \\ y_{1,2} \\ y_{2,1} \\ y_{2,2} \\ y_{3,1} \\ y_{3,2} \\ y_{4,1} \\ y_{4,2}
\end{bmatrix} =
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1 \\
1 & -1 & -1 & -1
\end{bmatrix}
\times
\begin{bmatrix}
\mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{1,1} \\ \epsilon_{1,2} \\ \epsilon_{2,1} \\ \epsilon_{2,2} \\ \epsilon_{3,1} \\ \epsilon_{3,2} \\ \epsilon_{4,1} \\ \epsilon_{4,2}
\end{bmatrix}
$$

Note that the last two rows fit $\mu - \alpha_1 - \alpha_2 - \alpha_3$, which equals $\mu + \alpha_4$ precisely because of the constraint $\sum_{j=1}^{k}\alpha_j = 0$, so the fourth effect never needs its own column.

The constraint $\sum^k_{j=1} \alpha_j = 0$ plays the same role as $\alpha_{\text{base}} = 0$ does in "dummy" coding: it makes the coefficient estimates identifiable. If we dropped the constraint and kept a separate column for every group alongside an intercept column, there would be infinitely many solutions for $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$. See Sum-to-zero constraint in one-way ANOVA for a clear mathematical explanation by example. A different constraint leads to a different encoding.
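To make the identifiability point concrete, here is a small numpy sketch (the values of $\mu$ and the $\alpha_j$ are made up). It writes the sum-to-zero design with an explicit intercept column and parameter vector $(\mu, \alpha_1, \alpha_2, \alpha_3)$, then shows that the unconstrained alternative is rank-deficient:

```python
import numpy as np

# Sum-to-zero design with an explicit intercept column:
# beta = (mu, alpha_1, alpha_2, alpha_3); the last group's rows carry -1s
# so they fit mu - alpha_1 - alpha_2 - alpha_3 = mu + alpha_4.
X = np.array([
    [1,  1,  0,  0],
    [1,  1,  0,  0],
    [1,  0,  1,  0],
    [1,  0,  1,  0],
    [1,  0,  0,  1],
    [1,  0,  0,  1],
    [1, -1, -1, -1],
    [1, -1, -1, -1],
])

mu = 10.0
alpha = np.array([2.0, -1.0, 0.5, -1.5])   # made-up effects; they sum to zero
beta = np.concatenate([[mu], alpha[:3]])

# Every observation's fitted value is its group mean, mu + alpha_j,
# including group 4, whose effect is recovered via the -1 rows:
assert np.allclose(X @ beta, np.repeat(mu + alpha, 2))

# Without any constraint (intercept plus all four group indicators), the
# design has 5 columns but rank 4, hence infinitely many least-squares
# solutions -- the identifiability problem described above:
X_over = np.column_stack([np.ones(8), np.kron(np.eye(4), np.ones((2, 1)))])
print(np.linalg.matrix_rank(X_over))
```

The rank deficiency arises because the intercept column is exactly the sum of the four indicator columns; either dropping a column (dummy coding) or imposing the sum-to-zero constraint (effects coding) restores full rank.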
