Solved – One-way unbalanced ANOVA and the sum-to-zero constraint

anova, unbalanced-classes

I am trying to understand the unbalanced case of one-way ANOVA. Suppose there are 3 groups of different sizes for a single factor, say A. Then the overall mean is the WEIGHTED average of the means of the 3 groups. The one-way ANOVA then tests whether the between-group sum of squares, in which the mean of each group is compared with the WEIGHTED mean (= overall mean of all data points), is significant compared to the within-group sum of squares.
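To make the quantities in this setup concrete, here is a minimal sketch in Python with NumPy. The data values are invented purely for illustration, and the variable names are assumptions, not anything from a particular package. It shows that the mean of all data points is the size-weighted average of the group means, and how the between-group and within-group sums of squares combine into the F statistic.

```python
import numpy as np

# Hypothetical data: three groups of unequal size (values invented for illustration).
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 9.0]),
    np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
]

n = np.array([len(g) for g in groups])        # group sizes
means = np.array([g.mean() for g in groups])  # group means

# Mean of all data points, and the two candidate "overall" means.
grand_mean = np.concatenate(groups).mean()
weighted_mean = np.sum(n * means) / n.sum()   # weights = group sizes
unweighted_mean = means.mean()                # every group counts equally

# Between-group SS compares each group mean with the grand (weighted) mean.
ss_between = np.sum(n * (means - grand_mean) ** 2)
# Within-group SS compares each observation with its own group mean.
ss_within = sum(np.sum((g - m) ** 2) for g, m in zip(groups, means))

k, N = len(groups), n.sum()
f_stat = (ss_between / (k - 1)) / (ss_within / (N - k))

print("grand mean      :", grand_mean)
print("weighted mean   :", weighted_mean)
print("unweighted mean :", unweighted_mean)
print("SS between, SS within, F:", ss_between, ss_within, f_stat)
```

With unequal group sizes the weighted and unweighted means differ, which is exactly the distinction the rest of this question is about.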

So the one-way ANOVA (if it is significant) tells us that at least one of the group means is higher or lower than the grand (= weighted) mean of the entire sample. Given the weighted mean and all the group means, we can tell that at least one of them is above or below the weighted mean.

Now suppose we try to estimate $\mu + a_i$ for $i = 1, 2, 3$.

We impose a constraint, say $\sum_i a_i = 0$.

This makes $\mu$ the UNWEIGHTED mean of the group means $\mu + a_i$. My question is: how does this help? Sure, we can estimate the 4 parameters $\mu$ and the $a_i$'s, but this $\mu$ does not match $\mu_{grand}$, the grand mean, which is the WEIGHTED mean of the group means.
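The mismatch can be written out explicitly. In the sketch below, $n_i$ denotes the number of observations in group $i$; that symbol is an assumption introduced here for illustration.

$$
\sum_{i=1}^{3} a_i = 0 \;\Rightarrow\; \frac{1}{3}\sum_{i=1}^{3} (\mu + a_i) = \mu,
\qquad
\mu_{grand} = \frac{\sum_{i} n_i (\mu + a_i)}{\sum_{i} n_i} = \mu + \frac{\sum_{i} n_i a_i}{\sum_{i} n_i}
$$

So under the sum-to-zero constraint, $\mu$ and $\mu_{grand}$ coincide only when $\sum_i n_i a_i = 0$, which holds automatically when all the $n_i$ are equal but not in general for unbalanced groups.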

Given the weighted mean, we can compare the $a_i$'s to 0 to know whether at least one group mean differs from the weighted mean. But we can't compare with the unweighted mean, because that is not what the one-way ANOVA is testing.

So how does imposing the sum-to-zero constraint help? What intuition does it convey?

Best Answer

Actually, the one-way ANOVA tests whether the group means are all equal to each other, i.e. in your example whether $\mu + \alpha_1 = \mu + \alpha_2 = \mu + \alpha_3$. Obviously, if they are all equal then they equal the grand mean, i.e. each $\mu + \alpha_i$ would be the grand mean. But $\mu$ could be anything (1, 2, 50000, etc.) and the $\alpha$'s would just change to compensate. The constraint only changes the parameterisation of the model. The ANOVA table, sums of squares, F-test, etc. will be the same irrespective of the constraint used. If $\mu$ were the grand mean, then the null hypothesis you are testing would be the same as testing whether all the $\alpha$'s are zero.

There are many other possible constraints we could use on the model. We could set one of the treatment parameters to zero, say $\alpha_1 = 0$. That may seem daft, but then the $\mu$ in your model is the mean for that treatment (here treatment 1), $\alpha_2$ is the difference in mean between treatment 2 and treatment 1, and $\alpha_3$ is the difference in mean between treatment 3 and treatment 1. This is a common version of the model in drug trials, where treatment 1 is e.g. the placebo and we want to measure how a new drug differs from the placebo.

To make $\mu$ the grand mean, the constraint would have to be $\sum_{i} r_i \alpha_i = 0$ and not $\sum_{i} \alpha_i = 0$, where $r_i$ is the number of observations in group $i$; i.e. the weighted sum of the treatment effects is zero. However, as said above, the constraint and parameterisation you use do not alter the ANOVA results.
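As a rough numerical check of this point, the sketch below fits the same one-way model by least squares under three constraints: $\alpha_1 = 0$, $\sum_i \alpha_i = 0$, and $\sum_i r_i \alpha_i = 0$. The data are invented for illustration, and the helper `fit` and the dictionary `designs` are hypothetical names, not part of any library. Each constraint is imposed by substituting it into the design matrix, which is one of several equivalent ways to do it.

```python
import numpy as np

# Hypothetical unbalanced data: group labels 0, 1, 2 with sizes 3, 2, 5.
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 1.0, 2.0, 3.0, 4.0, 5.0])
g = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
r = np.bincount(g)                                 # observations per group
ind = (g[:, None] == np.arange(3)).astype(float)   # N x 3 indicator columns


def fit(design):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), design])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)              # residual (within) SS
    return beta, rss


designs = {
    # alpha_1 = 0: drop the first indicator, so mu is the group-1 mean.
    "alpha_1 = 0": ind[:, [1, 2]],
    # sum alpha_i = 0: substitute alpha_3 = -(alpha_1 + alpha_2).
    "sum alpha_i = 0": ind[:, [0, 1]] - ind[:, [2]],
    # sum r_i alpha_i = 0: substitute alpha_3 = -(r_1 alpha_1 + r_2 alpha_2) / r_3.
    "sum r_i alpha_i = 0": ind[:, [0, 1]] - ind[:, [2]] * (r[[0, 1]] / r[2]),
}

tss = np.sum((y - y.mean()) ** 2)                  # total SS about the grand mean
k, N = 3, len(y)
results = {}
for name, design in designs.items():
    beta, rss = fit(design)
    f_stat = ((tss - rss) / (k - 1)) / (rss / (N - k))
    results[name] = (beta[0], rss, f_stat)
    print(f"{name:22s} mu = {beta[0]:.3f}  RSS = {rss:.3f}  F = {f_stat:.3f}")
```

On this toy data the estimated $\mu$ differs across the three fits (the group-1 mean, the unweighted mean of the group means, and the grand mean respectively), while the residual sum of squares and the F statistic are identical in all three.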