In the example we have $c = 5$ different groups (=columns) and each group consists of $n=5$ different data points. The $MS_{column}$ is given by $MS_{column} = n \cdot \sigma^2_{between}$. Thus, we have to divide $MS_{column}$ by the sample size per group to obtain the "between group variance".
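As a worked version of this relation (a sketch only: the symbols $SS_{column}$, $\bar{y}_j$ for the mean of group $j$ and $\bar{y}$ for the grand mean are notation introduced here, and the numbers are taken from the R output shown further down):

$$MS_{column} = \frac{SS_{column}}{c-1} = \frac{n\sum_{j=1}^{c}\left(\bar{y}_j - \bar{y}\right)^2}{c-1} = n \cdot \sigma^2_{between}, \qquad \sigma^2_{between} = \frac{MS_{column}}{n} = \frac{13.4307}{5} \approx 2.6861 .$$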
Here is the R code:
# Each row below holds one value for each of the 5 groups (columns)
data.val = c(1.5377, 0.6923, 1.6501, 3.7950, 5.6715,
             2.8339, 1.5664, 6.0349, 3.8759, 3.7925,
             -1.2588, 2.3426, 3.7254, 5.4897, 5.7172,
             1.8622, 5.5784, 2.9369, 5.4090, 6.6302,
             1.3188, 4.7694, 3.7147, 5.4172, 5.4889)
# Group labels cycle 1..5, so the i-th value in every row belongs to group i
data.grp = rep(1:5, 5)
df = data.frame(val = data.val,
                grp = factor(data.grp))
library(ggplot2)
gg = ggplot(df, aes(x = grp, y = val, fill = grp)) +
  geom_boxplot(alpha = 0.5) +
  xlab("Columns") +
  ylab("Values")
print(gg)
aov.out = aov(val ~ grp, df)
anova.out = anova(aov.out)
print(anova.out)
##
# Calc variance within group
nGrp = length(unique(data.grp)) # different groups
var.within = vector(mode = "numeric", length = nGrp)
mean.within = vector(mode = "numeric", length = nGrp)
for (i in 1:nGrp) {
  idx = data.grp == i
  var.within[[i]] = var(data.val[idx])
  mean.within[[i]] = mean(data.val[idx])
}
# With equal group sizes, the average within group variance is equal to the
# "Residuals" Mean Sq in the anova table:
# This is Var_within = E[ Var[grp1], Var[grp2], ..., Var[grp5] ]
var.withinAverage = mean(var.within)
print(paste0("var.within = ", var.withinAverage))
# The variance between groups is given by
# Var_between = Var[ E[grp1], E[grp2], ... E[grp5] ]
var.between = var(mean.within)
print(paste0("var.between = ", var.between))
##
# To obtain the MS_between term, we have to multiply by the sample size per group:
kSample = sum(data.grp == 1) # number of elements within each group
MS.between = kSample * var.between
print(paste0("MS.between = kSample * var.between = ", MS.between))
##
# Thus, if we invert the last expression we get:
# var.between = MS.between / kSample
This yields the following output:
Response: val
          Df Sum Sq Mean Sq F value   Pr(>F)
grp        4 53.723 13.4307  6.0488 0.002332 **
Residuals 20 44.408  2.2204
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "var.within = 2.2203876922"
[1] "var.between = 2.6861421478"
[1] "MS.between = kSample * var.between = 13.430710739"
The example above demonstrates
- that the anova table is constructed by decomposing the sum-of-squares (and not the variance) into two components,
- that the entries of the anova table are not themselves the within and between group variance components (the `grp` Mean Sq, for example, is $n$ times the variance of the group means), and
- how the anova table is constructed and how it is related to the within and between group variance.
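The sum-of-squares decomposition from the first point can be checked by hand. The following is a minimal sketch in base R that reuses the data from the example; the names `grp.mean.long`, `ss.total`, `ss.between`, `ss.within` and `f.value` are introduced here for illustration and do not appear in the code above.

```r
# Same data as in the example: 5 groups (columns), 5 values per group
data.val <- c(1.5377, 0.6923, 1.6501, 3.7950, 5.6715,
              2.8339, 1.5664, 6.0349, 3.8759, 3.7925,
              -1.2588, 2.3426, 3.7254, 5.4897, 5.7172,
              1.8622, 5.5784, 2.9369, 5.4090, 6.6302,
              1.3188, 4.7694, 3.7147, 5.4172, 5.4889)
data.grp <- factor(rep(1:5, 5))

grand.mean    <- mean(data.val)
grp.mean.long <- ave(data.val, data.grp)  # group mean, repeated for each observation

# Total sum of squares and its two components
ss.total   <- sum((data.val - grand.mean)^2)
ss.between <- sum((grp.mean.long - grand.mean)^2)  # "grp" Sum Sq
ss.within  <- sum((data.val - grp.mean.long)^2)    # "Residuals" Sum Sq

# Mean squares are sums of squares divided by their degrees of freedom
df.between <- nlevels(data.grp) - 1
df.within  <- length(data.val) - nlevels(data.grp)
f.value    <- (ss.between / df.between) / (ss.within / df.within)

print(c(ss.total = ss.total, ss.between = ss.between, ss.within = ss.within))
print(all.equal(ss.total, ss.between + ss.within))  # the sums of squares add up
print(f.value)
```

The two components should agree with the `Sum Sq` column of the table above, and `f.value` with its `F value` entry.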
Nevertheless, it is possible to decompose the total variance into a within group variance component and a between group variance component. However, these components are defined differently from the mean squares in the anova table. If you would like to learn more about this decomposition, you will have to read about variance components.
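As a rough illustration of what such a decomposition can look like, the sketch below applies the usual method-of-moments (ANOVA) estimator for a balanced one-way layout. It assumes that the groups are treated as random effects, which the example above does not state; the names `sigma2.within`, `sigma2.group` and `sigma2.total` are hypothetical and chosen here only for illustration.

```r
# Same data as in the example: 5 groups (columns), 5 values per group
data.val <- c(1.5377, 0.6923, 1.6501, 3.7950, 5.6715,
              2.8339, 1.5664, 6.0349, 3.8759, 3.7925,
              -1.2588, 2.3426, 3.7254, 5.4897, 5.7172,
              1.8622, 5.5784, 2.9369, 5.4090, 6.6302,
              1.3188, 4.7694, 3.7147, 5.4172, 5.4889)
df <- data.frame(val = data.val, grp = factor(rep(1:5, 5)))

# Mean squares from the anova table: first row is "grp", second is "Residuals"
ms <- anova(aov(val ~ grp, data = df))[["Mean Sq"]]
ms.between <- ms[1]
ms.within  <- ms[2]
n.per.grp  <- 5  # balanced design: 5 observations in every group

# Method-of-moments estimates (assumption: groups are random effects)
sigma2.within <- ms.within
sigma2.group  <- max(0, (ms.between - ms.within) / n.per.grp)  # truncated at zero
sigma2.total  <- sigma2.within + sigma2.group

print(c(within = sigma2.within, group = sigma2.group, total = sigma2.total))
```

Under these assumptions the group component is smaller than `var.between` from the example, because the variance of the group means also contains a share of the within group noise.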
Best Answer
In the first row of your table, you have superscripts $c, c, d, b, b, a$ for compositions $A, B, C, D, E, F,$ respectively.
The superscripts mean that $A$ and $B$ are not significantly different from each other in a statistical sense, and likewise that $D$ and $E$ are not significantly different. However, significant differences were found between $A$ and $C,$ between $B$ and $C,$ and between $E$ and $F$ (among other differences).
More generally, it is important to understand that "lack of significant difference" is not the same thing as "exact equality".
In an analysis of this kind with $A$ significantly different from $C,$ you may have $B$ with an intermediate value between $A$ and $C$ and yet not be able to decide from the data whether to "group" $B$ with $A,$ $B$ with $C,$ or neither. That's why some of the rows in your table have items with double (indecisive) superscripts.