Solved – Kruskal Wallis for binary data

kruskal-wallis test”

I have collected data from three groups (with different sizes of n). This was done with a test instrument consisting of 9 items, each binary coded (1 = right, 0 = wrong). The results are not equally distributed.

I now want to show that there is a signifikant difference between the three groups for the total score of the test (min = 0, max = 9), but also for the single items.

I did a Kruskal-Wallis for the total score and the results were promising. Now I am unsure if I can use the same test for the individual items because they only have binary values (0 and 1).

So, is it useful to do a Kruskal-Wallis for binary data?
Is there any other test I could use?

Best Answer

How about something like a chi-squared test for your different groups? You'd have your marginals determined on one axis, but it should be a decent fit for the situation you're describing.

Related Solutions

Kruskal-Wallis Test – Data Considerations for Kruskal-Wallis Test

With small, and possibly unequal group sizes, I'd go with chl's and onestop's suggestion and do a Monte-Carlo permutation test. For the permutation test to be valid, you need exchangeability under $H_{0}$. If all distributions have the same shape (and are therefore identical under $H_{0}$), this is true.

Here's a first try at looking at the case of 3 groups and no ties. First, let's compare the asymptotic $\chi^{2}$ distribution function against a MC-permutation one for given group sizes (this implementation will break for larger group sizes).

P  <- 3                     # number of groups
Nj <- c(4, 8, 6)            # group sizes
N  <- sum(Nj)               # total number of subjects
IV <- factor(rep(1:P, Nj))  # grouping factor
alpha <- 0.05               # alpha-level

# there are N! permutations of ranks within the total sample, but we only want 5000
nPerms <- min(factorial(N), 5000)

# random sample of all N! permutations
# sample(1:factorial(N), nPerms) doesn't work for N! >= .Machine$integer.max
permIdx <- unique(round(runif(nPerms) * (factorial(N)-1)))
nPerms  <- length(permIdx)
H       <- numeric(nPerms)  # vector to later contain the test statistics

# function to calculate test statistic from a given rank permutation
getH <- function(ranks) {
    Rj <- tapply(ranks, IV, sum)
    (12 / (N*(N+1))) * sum((1/Nj) * (Rj-(Nj*(N+1) / 2))^2)
}

# all test statistics for the random sample of rank permutations (breaks for larger N)
# numperm() internally orders all N! permutations and returns the one with a desired index
library(sna)                # for numperm()
for(i in seq(along=permIdx)) { H[i] <- getH(numperm(N, permIdx[i]-1)) }

# cumulative relative frequencies of test statistic from random permutations
pKWH   <- cumsum(table(round(H, 4)) / nPerms)
qPerm  <- quantile(H, probs=1-alpha)  # critical value for level alpha from permutations
qAsymp <- qchisq(1-alpha, P-1)        # critical value for level alpha from chi^2

# illustration of cumRelFreq vs. chi^2 distribution function and resp. critical values
plot(names(pKWH), pKWH, main="Kruskal-Wallis: permutation vs. asymptotic",
     type="n", xlab="h", ylab="P(H <= h)", cex.lab=1.4)
points(names(pKWH), pKWH, pch=16, col="red")
curve(pchisq(x, P-1), lwd=2, n=200, add=TRUE)
abline(h=0.95, col="blue")                         # level alpha
abline(v=c(qPerm, qAsymp), col=c("red", "black"))  # critical values
legend(x="bottomright", legend=c("permutation", "asymptotic"),
       pch=c(16, NA), col=c("red", "black"), lty=c(NA, 1), lwd=c(NA, 2))

enter image description here

Now for an actual MC-permutation test. This compares the asymptotic $\chi^{2}$-derived p-value with the result from coin's oneway_test() and the cumulative relative frequency distribution from the MC-permutation sample above.

> DV1 <- round(rnorm(Nj[1], 100, 15), 2)  # data group 1
> DV2 <- round(rnorm(Nj[2], 110, 15), 2)  # data group 2
> DV3 <- round(rnorm(Nj[3], 120, 15), 2)  # data group 3
> DV  <- c(DV1, DV2, DV3)                 # all data
> kruskal.test(DV ~ IV)                   # asymptotic p-value
Kruskal-Wallis rank sum test
data:  DV by IV 
Kruskal-Wallis chi-squared = 7.6506, df = 2, p-value = 0.02181

> library(coin)                           # for oneway_test()
> oneway_test(DV ~ IV, distribution=approximate(B=9999))
Approximative K-Sample Permutation Test
data:  DV by IV (1, 2, 3) 
maxT = 2.5463, p-value = 0.0191

> Hobs <- getH(rank(DV))                  # observed test statistic

# proportion of test statistics at least as extreme as observed one (+1)
> (pPerm <- (sum(H >= Hobs) + 1) / (length(H) + 1))
[1] 0.0139972

Solved – Kruskal–Wallis non-parametric alternatives for groups with different shaped distributions

I don't think the statement in the quote is accurate.

The Kruskal-Wallis is effectively a test for at least one variable being stochastically larger than at least one other, which doesn't require identity of shape. Indeed, even if it was being used as a test of identical distribution under the null, it would only be necessary for the shape to be identical under the null; if the null is false, there's still no requirement for the shape to be the same then.

If, however, one was looking specifically at say a location shift alternative, in order to use it specifically as say a test of location difference (a test of medians, or of means, or of tenth percentiles or ... against a shift in the same) then the shapes would then be assumed the same under both null and alternative in order that rejection of the null implied that location shift.

Best Answer

Related Solutions

Kruskal-Wallis Test – Data Considerations for Kruskal-Wallis Test

Solved – Kruskal–Wallis non-parametric alternatives for groups with different shaped distributions

Related Question