Solved – Testing approach Chi Square vs Mann Whitney


I am looking to understand the best approach to determine if students in one school performed better on a quiz than another. In this scenario the quiz has five questions and only whole points are awarded, giving 0, 1, 2, 3, 4, & 5 as the only possible results.

The results:

| score | school_a | school_b |
|-------|----------|----------|
| 0     | 150      | 175      |
| 1     | 50       | 40       |
| 2     | 30       | 30       |
| 3     | 20       | 15       |
| 4     | 5        | 10       |
| 5     | 80       | 90       |

Given the non-normality of the distribution, the Mann-Whitney test seems appropriate, but because the scores are integers I am concerned that ties may cause issues. Should I instead treat the scores as categorical and perform a chi-square test?

Is there another approach I should be considering instead?

Best Answer

Mann-Whitney (also called the two-sample Wilcoxon rank sum test): The data are

    a = rep(0:5, c(150, 50, 30, 20, 5, 80))
    b = rep(0:5, c(175, 40, 30, 15, 10, 90))
    summary(a)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0.000   0.000   1.000   1.761   4.000   5.000
    summary(b)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0.000   0.000   1.000   1.764   4.250   5.000

The means and medians are very nearly the same for the two schools. The formal test gives a P-value of $0.6723 > 0.05,$ so the difference is not significant at the 5% level. For samples as large as these, the implementation of the Wilcoxon rank sum test in R reports no difficulty handling ties.

    wilcox.test(a,b)

            Wilcoxon rank sum test with continuity correction

    data:  a and b
    W = 61350, p-value = 0.6723
    alternative hypothesis: true location shift is not equal to 0
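
To see how the ties enter the statistic, here is a minimal sketch that rebuilds $W$ directly from the score counts. It relies on R's convention that $W$ counts the (a, b) pairs with a > b and counts each tied pair as one half; the names `scores`, `pairs`, `a.higher`, `tied`, and `prob.sup` are mine and purely illustrative. Dividing $W$ by the total number of pairs estimates $P(a > b) + 0.5\,P(a = b),$ which would be 0.5 if neither school tended to score higher.

```r
# Score counts for the two schools (copied from the question)
f.a <- c(150, 50, 30, 20, 5, 80)
f.b <- c(175, 40, 30, 15, 10, 90)
scores <- 0:5

# pairs[i, j] = number of (a, b) pairs with a = scores[i] and b = scores[j]
pairs <- outer(f.a, f.b)

# Logical masks over the same 6 x 6 grid of score combinations
a.higher <- outer(scores, scores, ">")
tied     <- outer(scores, scores, "==")

# W = (number of pairs with a > b) + half of (number of tied pairs)
W <- sum(pairs[a.higher]) + 0.5 * sum(pairs[tied])

# Estimated P(a > b) + 0.5 * P(a = b)
prob.sup <- W / (sum(f.a) * sum(f.b))

W                    # 61350, the same W that wilcox.test reports
round(prob.sup, 4)   # 0.5087
```

An estimate this close to 0.5 is consistent with the large P-value: a randomly chosen student from school a is about as likely to outscore a randomly chosen student from school b as the reverse.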

Chi-squared test of homogeneity of probabilities: The counts are

    f.a = c(150, 50, 30, 20, 5, 80)
    f.b = c(175, 40, 30, 15, 10, 90)
    MAT = rbind(f.a, f.b);  MAT
        [,1] [,2] [,3] [,4] [,5] [,6]
    f.a  150   50   30   20    5   80
    f.b  175   40   30   15   10   90

I agree with @Glen_b that the Wilcoxon test is best here, because the issue seems to be which school had the higher scores overall. The chi-squared test, by contrast, tests whether the probabilities of getting each individual score 0 through 5 are the same at the two schools, without using the ordering of the scores. It finds no significant difference between the two distributions.

    chisq.test(MAT)

            Pearson's Chi-squared test

    data:  MAT
    X-squared = 5.1107, df = 5, p-value = 0.4025
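
For readers who want to see where the statistic comes from, the following is a rough sketch of the same computation done by hand. It assumes the usual Pearson form, with expected counts equal to (row total × column total) / grand total; the names `expected`, `X2`, `df`, and `p.value` are illustrative only.

```r
# Observed counts: one row per school, one column per score 0..5
f.a <- c(150, 50, 30, 20, 5, 80)
f.b <- c(175, 40, 30, 15, 10, 90)
MAT <- rbind(f.a, f.b)

# Expected counts if both schools shared the same score distribution
expected <- outer(rowSums(MAT), colSums(MAT)) / sum(MAT)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
X2 <- sum((MAT - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(MAT) - 1) * (ncol(MAT) - 1)

# Upper-tail probability under the chi-squared reference distribution
p.value <- pchisq(X2, df, lower.tail = FALSE)

round(X2, 4)            # 5.1107
df                      # 5
round(min(expected), 2) # 7.23, the smallest expected count
```

The smallest expected count is above 5, so the common rule of thumb for trusting the chi-squared approximation is met for this table.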

Kolmogorov-Smirnov test of differences in CDFs:

This is a test to see if there is a substantial difference between the empirical CDFs (ECDFs) of the two samples. To make an ECDF, sort the data and plot a stairstep function that increases by $1/n,$ where $n$ is the sample size, at each data value. If there are $k$ tied observations at a point, then jump up by $k/n.$ Here are the ECDFs for the two schools, blue for a and orange for b.

    plot(ecdf(a), col="blue", lwd=2)
    lines(ecdf(b), col="orange", lty="dotted", pch="o")


The test statistic $D$ of the K-S test is the maximum vertical distance between the two ECDFs. Again here, no significant difference is found. For your data, this test may be considered an alternative to the chi-squared test.

    ks.test(a,b)

            Two-sample Kolmogorov-Smirnov test

    data:  a and b
    D = 0.03835, p-value = 0.9605
    alternative hypothesis: two-sided

    Warning message:
    In ks.test(a, b) :
      p-value will be approximate in the presence of ties
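
Because the scores take only six values, the two ECDFs can only jump at 0 through 5, so $D$ can be read off the cumulative proportions. Here is a minimal sketch under that observation; the names `scores`, `ecdf.a`, `ecdf.b`, and `gaps` are mine, chosen for illustration.

```r
# Score counts for the two schools (copied from the question)
f.a <- c(150, 50, 30, 20, 5, 80)
f.b <- c(175, 40, 30, 15, 10, 90)
scores <- 0:5

# ECDF heights at each score: cumulative count divided by sample size
ecdf.a <- cumsum(f.a) / sum(f.a)
ecdf.b <- cumsum(f.b) / sum(f.b)

# Vertical distance between the two ECDFs at each score
gaps <- abs(ecdf.a - ecdf.b)

# K-S statistic: the largest vertical distance
D <- max(gaps)

round(D, 5)              # 0.03835, matching the ks.test output
scores[which.max(gaps)]  # 0: the largest gap is at a score of 0
```

The largest gap is at a score of 0, where about 48.6% of school b and 44.8% of school a scored zero; at every other score the two ECDFs are even closer.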