I have a simple question that I don't think has been covered in other questions.
I am running a Kruskal-Wallis test in R, using the function kruskal.test.
The output gives me the p-value and a chi-squared value (see the example below):
Kruskal-Wallis rank sum test
data: mat[, 2] by mat[, 3]
Kruskal-Wallis chi-squared = 0.052043, df = 1, p-value = 0.8195
I know that I have to look at the p-value to know if there is a significant difference among the groups I'm testing, but I am interested in the chi-squared value. I noticed that the chi-squared value increases when the p-value decreases, but its statistical meaning is not yet clear to me.
There is no explanation of its meaning in ?kruskal.test.
Can someone help me understand this?
Best Answer
The Kruskal Wallis chi-square statistic
The "Kruskal-Wallis chi-squared" value reported by the R function is the statistic $H$ that is computed in the test. If there are no ties, then
$$H = \frac {N-1}{N}\sum_{i=1}^C \frac {n_i\left(\bar {R_i}-\bar {R}\right)^2}{(N^2-1)/12}$$
where $C$ is the number of samples, $n_i$ is the number of observations in the $i$-th sample, $N=\sum_{i=1}^C n_i$ is the total number of observations, $\bar{R_i}$ is the mean of the ranks in the $i$-th sample, and $\bar{R}=\frac{1}{2}(N+1)$ is the mean of all ranks.
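It is named like this because, when the null hypothesis is true, the statistic approximately follows a chi-squared distribution with $C-1$ degrees of freedom. Under the hood you can see it as the deviations $\bar {R_i}-\bar{R}$ being approximated as normally distributed: a single rank has variance $\frac{1}{12}(N^2-1)$, so the mean of the $n_i$ ranks in the $i$-th sample has a variance of roughly $\frac{1}{12}(N^2-1)/n_i$, and $H$ adds up the squared standardized deviations.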
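To make the formula concrete, here is a minimal sketch that computes $H$ by hand and compares it with what kruskal.test reports. The data and the variable names (x, g, H_manual, and so on) are made up for illustration and contain no ties, so the no-ties formula applies.

```r
# Hypothetical example data: two groups, all values distinct (no ties).
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 6.3, 7.7, 5.1, 8.4)
g <- factor(c("a", "a", "a", "a", "b", "b", "b", "b", "b"))

N      <- length(x)              # total number of observations
r      <- rank(x)                # ranks over the pooled sample
n_i    <- tapply(r, g, length)   # size of each group
Rbar_i <- tapply(r, g, mean)     # mean rank within each group
Rbar   <- (N + 1) / 2            # mean of all ranks

# H from the no-ties formula
H_manual <- (N - 1) / N * sum(n_i * (Rbar_i - Rbar)^2 / ((N^2 - 1) / 12))

# The value R reports as "Kruskal-Wallis chi-squared"
H_r <- unname(kruskal.test(x, g)$statistic)

print(c(manual = H_manual, kruskal.test = H_r))
print(all.equal(H_manual, H_r))
```

With tied observations the two numbers would differ, because kruskal.test applies a tie correction that the formula above leaves out.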
See:
Kruskal, William H., and W. Allen Wallis. "Use of ranks in one-criterion variance analysis." Journal of the American Statistical Association 47.260 (1952): 583-621. https://doi.org/10.1080/01621459.1952.10483441
The p-value
For the Kruskal-Wallis test the p-value is $P(H_{\text{if $H_0$ true}} \geq H_{\text{observed}})$. It indicates how extreme a particular measurement $H_{\text{observed}}$ is by stating the probability that the value from an experiment in which the null hypothesis is true, $H_{\text{if $H_0$ true}}$, would be equal or higher.
If the null hypothesis is false, you are more likely to get such high (extreme) values. So when you observe an extreme value of $H$ that would be unlikely under the null hypothesis (i.e. a low p-value), this indicates that the null (no-effect) hypothesis may be false, or at least is not supported by the data. Because the p-value is an upper-tail probability, a larger $H$ always gives a smaller p-value, which is the pattern noticed in the question.
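As a quick check of this relationship, the p-value in the question's output can be recovered from the reported statistic and degrees of freedom with the upper tail of the chi-squared distribution. This is a sketch using the chi-squared approximation; the variable names H, df, p, H_grid and p_grid are chosen here for illustration.

```r
# Values copied from the output quoted in the question.
H  <- 0.052043   # "Kruskal-Wallis chi-squared"
df <- 1          # number of groups minus one

# Upper-tail probability P(chi-squared with df degrees of freedom >= H)
p <- pchisq(H, df = df, lower.tail = FALSE)
print(round(p, 4))

# Larger statistics give smaller p-values (illustrative grid of H values).
H_grid <- c(0.05, 1, 4, 10)
p_grid <- pchisq(H_grid, df = df, lower.tail = FALSE)
print(data.frame(H = H_grid, p.value = p_grid))
```

The first result should agree with the p-value of 0.8195 shown in the question, up to rounding.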
Exploring R functions
Whenever you have problems with R, it can be helpful to look into the source code. This is fairly easy for most functions: you just type the function name into the console and the source code is printed. Some functions are hidden (for example, methods that are not exported from their package), and then you have to ask for them explicitly, for instance with getAnywhere().
In the source code that this prints, note the STATISTIC value at the end.
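The lookup described above can be sketched as follows. It assumes that the computation lives in the default method of the kruskal.test generic (kruskal.test.default in the stats package); the variable name f is only for illustration.

```r
# kruskal.test is an S3 generic, so printing it shows only the dispatcher.
print(kruskal.test)

# List the methods registered for the generic.
print(methods(kruskal.test))

# Retrieve the default method, which is not exported from the stats namespace.
f <- getS3method("kruskal.test", "default")
print(f)

# Other ways to reach the same hidden function:
#   getAnywhere("kruskal.test.default")
#   stats:::kruskal.test.default
```

Printing f shows where STATISTIC is computed and where the p-value is derived from it.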