Solved – Test if differences between frequencies are significant

Tags: anova, chi-squared-test, multiple-comparisons, statistical-significance

I have the following frequency table:

C1  C2  C3  C4  C5  C6  C7  C8
35  0   4   3   7   6   5   4
39  1   9   6   7   7   6   8
36  0   7   10  11  11  10  16
41  0   9   8   8   7   6   7
41  0   8   9   10  9   12  11
55  2   12  9   11  12  11  13
55  1   10  10  11  10  12  11
47  1   14  8   12  15  12  12
45  1   10  11  10  10  9   18
56  0   13  16  12  12  12  11

The Kruskal-Wallis ANOVA test returns:

Source     SS        df    MS        Chi-sq   Prob>Chi-sq
Columns    25306.8   7     3615.26   47.16    5.18783e-008
Error      17083.2   72    237.27
Total      42390     79
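The Chi-sq column can be checked by hand from the SS column. For the tie-corrected Kruskal-Wallis statistic, H equals the between-group sum of squares of the ranks divided by the total variance of the ranks. The Total row has 79 degrees of freedom, so N = 80 = 10 × 8 values were ranked, i.e., the individual cell frequencies themselves:

```latex
H = \frac{SS_{\text{Columns}}}{SS_{\text{Total}} / (N - 1)}
  = \frac{25306.8}{42390 / 79}
  \approx \frac{25306.8}{536.58}
  \approx 47.16,
\qquad
\Pr\!\left(\chi^2_{7} \ge 47.16\right) \approx 5.19 \times 10^{-8}
```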

According to a multiple comparison of mean ranks:

  • Six groups have mean ranks significantly different from that of group 1 (column 1)

  • Six groups have mean ranks significantly different from that of group 2 (column 2)
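Because this Kruskal-Wallis test ranks the 80 cell frequencies themselves, a group scores high or low simply according to how large its counts are. A quick plain-Python check of the table (a sketch; the variable names are illustrative) suggests why groups 1 and 2 are the ones that stand out: every frequency in column 1 is larger than any frequency in columns 3 to 8, and every frequency in column 2 is smaller.

```python
# Frequency table from the question: 10 rows (bins) x 8 columns (groups).
TABLE = [
    [35, 0, 4, 3, 7, 6, 5, 4],
    [39, 1, 9, 6, 7, 7, 6, 8],
    [36, 0, 7, 10, 11, 11, 10, 16],
    [41, 0, 9, 8, 8, 7, 6, 7],
    [41, 0, 8, 9, 10, 9, 12, 11],
    [55, 2, 12, 9, 11, 12, 11, 13],
    [55, 1, 10, 10, 11, 10, 12, 11],
    [47, 1, 14, 8, 12, 15, 12, 12],
    [45, 1, 10, 11, 10, 10, 9, 18],
    [56, 0, 13, 16, 12, 12, 12, 11],
]

columns = [list(col) for col in zip(*TABLE)]
col_totals = [sum(col) for col in columns]
# All frequencies from groups 3-8 pooled together.
others = [v for col in columns[2:] for v in col]

print(col_totals)                     # [450, 6, 96, 90, 99, 99, 95, 111]
print(min(columns[0]) > max(others))  # True
print(max(columns[1]) < min(others))  # True
```

In other words, column 1 supplies the 10 largest of the 80 ranked values and column 2 the 10 smallest.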


The Kruskal-Wallis and multiple comparison results make sense. However, the chi-square test returns a chi-square value of 31.377 and a p-value of 0.9997, which means we cannot reject the null hypothesis that the frequencies are independent. I understand that an assumption of ANOVA is independence, but…

I want to test whether the frequencies are statistically independent. Were the Kruskal-Wallis and multiple comparison tests the correct methodology? Note: I am not trying to be subjective, but for a given set of frequencies, how do you test whether the differences between groups are significant?
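For reference, here is a minimal sketch of the chi-square test of independence on the 10 × 8 table in plain Python. The p-value comes from the regularized incomplete gamma function (series and continued-fraction evaluation). `chi2_sf` and `chi_square_independence` are hypothetical helper names written for this example; with SciPy installed, `scipy.stats.chi2_contingency` performs the same test.

```python
import math

# Frequency table from the question: 10 rows (bins) x 8 columns (groups).
TABLE = [
    [35, 0, 4, 3, 7, 6, 5, 4],
    [39, 1, 9, 6, 7, 7, 6, 8],
    [36, 0, 7, 10, 11, 11, 10, 16],
    [41, 0, 9, 8, 8, 7, 6, 7],
    [41, 0, 8, 9, 10, 9, 12, 11],
    [55, 2, 12, 9, 11, 12, 11, 13],
    [55, 1, 10, 10, 11, 10, 12, 11],
    [47, 1, 14, 8, 12, 15, 12, 12],
    [45, 1, 10, 11, 10, 10, 9, 18],
    [56, 0, 13, 16, 12, 12, 12, 11],
]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the regularized lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 0
        while term > total * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_independence(table):
    """Pearson chi-square test of independence between rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi_square_independence(TABLE)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```

One caveat the code makes visible: column 2 has only 6 observations in total, so all of its expected counts are below 1 (at most 132 × 6 / 1046 ≈ 0.76), well under the usual rule of thumb of about 5 expected counts per cell for the chi-square approximation.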

Best Answer

Based on your additional explanation in the comments, it appears that you have 8 groups (each corresponding to a column) and a continuous outcome variable that you grouped into 10 bins (each corresponding to a row). This also implies that the rows are ordered, with later rows corresponding to larger values.

First of all, if you do have the underlying continuous variable, do not bin it; just use Kruskal-Wallis or ANOVA to compare the groups.

Assuming that the binning is unavoidable, you can still use a Kruskal-Wallis test, but not on the frequencies, as you apparently did. Your current KW result only tells you that some groups contain more observations than others. The actual observations in this case are the row numbers (1 through 10), and the values in the table are just the frequencies with which they occur. Most statistical software has an option for specifying these as "weights" or "frequencies".
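A minimal sketch of this approach in dependency-free Python: each of the 1046 counted observations takes its row number as its value, the cell counts act as frequency weights, observations in the same bin are tied and share a midrank, and the standard tie correction is applied. `kruskal_wallis_binned` and `chi2_sf` are hypothetical helpers written for this example, not a library API. With NumPy and SciPy available, expanding each column with `numpy.repeat` and passing the per-group samples to `scipy.stats.kruskal` should give the same statistic.

```python
import math

# Frequency table from the question: 10 ordered bins (rows) x 8 groups (columns).
TABLE = [
    [35, 0, 4, 3, 7, 6, 5, 4],
    [39, 1, 9, 6, 7, 7, 6, 8],
    [36, 0, 7, 10, 11, 11, 10, 16],
    [41, 0, 9, 8, 8, 7, 6, 7],
    [41, 0, 8, 9, 10, 9, 12, 11],
    [55, 2, 12, 9, 11, 12, 11, 13],
    [55, 1, 10, 10, 11, 10, 12, 11],
    [47, 1, 14, 8, 12, 15, 12, 12],
    [45, 1, 10, 11, 10, 10, 9, 18],
    [56, 0, 13, 16, 12, 12, 12, 11],
]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the regularized lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 0
        while term > total * 1e-15:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def kruskal_wallis_binned(table):
    """Tie-corrected Kruskal-Wallis H where row position is the ordered value
    and each cell counts how many of a group's observations fall in that bin."""
    row_totals = [sum(row) for row in table]
    group_sizes = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Every observation in a bin shares the midrank: ranks below + (t + 1) / 2.
    midranks, below = [], 0
    for t in row_totals:
        midranks.append(below + (t + 1) / 2.0)
        below += t
    # Rank sum per group: count in each bin times that bin's midrank.
    rank_sums = [sum(c * r for c, r in zip(col, midranks)) for col in zip(*table)]
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / m for r, m in zip(rank_sums, group_sizes)
    ) - 3.0 * (n + 1)
    # Tie correction: each bin is one tie group of size t.
    h /= 1.0 - sum(t ** 3 - t for t in row_totals) / (n ** 3 - n)
    df = len(group_sizes) - 1
    return h, df, chi2_sf(h, df)


h, df, p = kruskal_wallis_binned(TABLE)
print(f"H = {h:.3f}, df = {df}, p = {p:.3g}")
```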

The chi-square test can be applied to the frequencies; however, if the rows are ordered, it may have much lower power than the Kruskal-Wallis test to detect differences, since it completely ignores the ordering of the rows. Thus, even though its results are valid, I would not recommend it because of the loss of power.
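The claim that the chi-square test ignores row order can be checked directly: the Pearson statistic is unchanged by any reordering of the rows, whereas the Kruskal-Wallis statistic uses the row order as the ordering of values. A small plain-Python sketch follows; the helper names and the particular row permutation are hypothetical and purely illustrative.

```python
# Frequency table from the question: 10 ordered bins (rows) x 8 groups (columns).
TABLE = [
    [35, 0, 4, 3, 7, 6, 5, 4],
    [39, 1, 9, 6, 7, 7, 6, 8],
    [36, 0, 7, 10, 11, 11, 10, 16],
    [41, 0, 9, 8, 8, 7, 6, 7],
    [41, 0, 8, 9, 10, 9, 12, 11],
    [55, 2, 12, 9, 11, 12, 11, 13],
    [55, 1, 10, 10, 11, 10, 12, 11],
    [47, 1, 14, 8, 12, 15, 12, 12],
    [45, 1, 10, 11, 10, 10, 9, 18],
    [56, 0, 13, 16, 12, 12, 12, 11],
]


def chi_square_stat(table):
    """Pearson chi-square statistic for independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for row, rt in zip(table, row_totals)
        for obs, ct in zip(row, col_totals)
    )


def kruskal_wallis_h(table):
    """Tie-corrected Kruskal-Wallis H; each row's position is the ordered value."""
    row_totals = [sum(row) for row in table]
    group_sizes = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    midranks, below = [], 0
    for t in row_totals:
        midranks.append(below + (t + 1) / 2.0)
        below += t
    rank_sums = [sum(c * r for c, r in zip(col, midranks)) for col in zip(*table)]
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / m for r, m in zip(rank_sums, group_sizes)
    ) - 3.0 * (n + 1)
    return h / (1.0 - sum(t ** 3 - t for t in row_totals) / (n ** 3 - n))


# Arbitrary reordering of the bins, used only for illustration.
order = [4, 9, 0, 7, 2, 5, 1, 8, 3, 6]
shuffled = [TABLE[i] for i in order]

print(chi_square_stat(TABLE), chi_square_stat(shuffled))    # equal up to rounding
print(kruskal_wallis_h(TABLE), kruskal_wallis_h(shuffled))  # depends on row order
```

Because the chi-square statistic would be the same for any arrangement of the bins, it cannot exploit a shift of one group toward higher bins, which is exactly the kind of difference the rank-based test is built to detect.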