I have student grade data (A, B, C, D, F) before and after a new course was introduced. I'd like to investigate if there are differences in the fractions of students scoring each of these grades. In R, I've set up tables like so:
> summary_data
   grade  exp got count
1      F  pre yes    96
2      F  pre  no   219
3      F post yes    19
4      F post  no    93
5      D  pre yes    75
6      D  pre  no   240
7      D post yes    27
8      D post  no    85
9      C  pre yes    64
10     C  pre  no   251
11     C post yes     6
12     C post  no   106
13     B  pre yes    31
14     B  pre  no   284
15     B post yes     8
16     B post  no   104
17     A  pre yes    49
18     A  pre  no   266
19     A post yes    52
20     A post  no    60
Here, exp indicates whether the data come from before or after the introduction of the new course, and got indicates whether or not that grade was received. For example, in the years before this new course was introduced, 96 students received an F and 219 did not. In the years after, 19 received an F and 93 did not, and so on. To sum up, I've got five 2×2 contingency tables, one for each grade.
I suppose I could just run multiple chi-square tests on this, but I'm afraid of increasing the likelihood of a type 1 error. I've instead run a Cochran-Mantel-Haenszel test (which is significant) followed up by multiple Fisher exact tests that suggest the differences lie among students who got Fs, Cs, and As.
> summary_table = xtabs(count ~ exp + got + grade, data= summary_data)
> ftable(summary_table)
> mantelhaen.test(summary_table)
Mantel-Haenszel X-squared = 3.1293e-30, df = 1, p-value = 1 ......
> library(rcompanion)
> groupwiseCMH(summary_table, group = 3, fisher = TRUE, method = "fdr", correct = "none")
  Group   Test  p.value    adj.p
1     F Fisher 6.17e-03 1.03e-02
2     D Fisher 1.00e+00 1.00e+00
3     C Fisher 9.60e-05 2.40e-04
4     B Fisher 4.51e-01 5.64e-01
5     A Fisher 3.19e-10 1.60e-09
After some more investigation, I ran a Woolf test, which reveals I'm violating homogeneity of odds ratios across each of these contingency tables, which may render the CMH test inappropriate.
How strictly should I interpret a significant Woolf test when performing a CMH test, and are there alternatives if violating this assumption is a red line?
And finally, should I just scrap this strategy altogether and try something else?
Best Answer
The following is a link to an archived page from R-help that will be helpful for you: https://stat.ethz.ch/pipermail/r-help/2007-February/126254.html
First, your instincts are right about not wanting to put too much emphasis on the results of the Woolf test to assess the assumptions of the CMH test. It's like when you are assessing a general linear model: you don't really want to use a hypothesis test to see if the residuals are reasonably normal or homoscedastic.
The R-help link has code to look at the odds ratios of the tables in the individual strata. (I believe the code works for individual tables that are 2 x 2.) I would look at those results. How large a difference among the odds ratios is too large? I don't know. But I think this is a better way to assess the variation among the odds ratios than a hypothesis test.
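This is not the code from the R-help post, but a minimal base-R sketch of the same idea: compute the odds ratio in each 2 x 2 stratum and compare them by eye. The counts are copied from the question; stratum_or() is a hypothetical helper written for this illustration, and the Wald interval on the log scale is just one reasonable choice.

```r
# Counts copied from the question's summary_data
summary_data <- data.frame(
  grade = rep(c("F", "D", "C", "B", "A"), each = 4),
  exp   = rep(c("pre", "pre", "post", "post"), times = 5),
  got   = rep(c("yes", "no"), times = 10),
  count = c(96, 219, 19,  93,
            75, 240, 27,  85,
            64, 251,  6, 106,
            31, 284,  8, 104,
            49, 266, 52,  60)
)
summary_table <- xtabs(count ~ exp + got + grade, data = summary_data)

# stratum_or() is a hypothetical helper, not part of any package.
# It returns the post-vs-pre odds ratio of receiving the grade,
# with a Wald 95% interval computed on the log scale.
stratum_or <- function(tab) {
  or <- (tab["post", "yes"] * tab["pre", "no"]) /
        (tab["post", "no"]  * tab["pre", "yes"])
  se <- sqrt(sum(1 / tab))  # standard error of log(OR)
  c(OR    = or,
    lower = exp(log(or) - qnorm(0.975) * se),
    upper = exp(log(or) + qnorm(0.975) * se))
}

# One row per grade: odds ratio and its interval
or_by_grade <- t(apply(summary_table, 3, stratum_or))
print(round(or_by_grade, 2))
```

With this definition, an odds ratio above 1 means the grade became more common after the course was introduced, and an odds ratio below 1 means it became less common. Odds ratios that fall on opposite sides of 1 in different strata are the kind of heterogeneity that makes a single pooled CMH estimate hard to interpret.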
The R-help link also presents a different approach to conduct the analysis: logistic regression. This would probably be my preferred approach anyway. At least when using R, there are options to include interactions among your factors and there are options for post-hoc analysis, and so on.
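As a rough sketch of what the logistic regression approach could look like with the counts from the question (the model formulas and object names here are my assumptions, not taken from the R-help post): fit one model with and one without the exp-by-grade interaction and compare them. The interaction term asks whether the pre/post odds ratio differs by grade, which is the same question the Woolf test addresses. Like the CMH setup in the question, this treats the five tables as independent strata.

```r
# Counts copied from the question's summary_data
summary_data <- data.frame(
  grade = rep(c("F", "D", "C", "B", "A"), each = 4),
  exp   = rep(c("pre", "pre", "post", "post"), times = 5),
  got   = rep(c("yes", "no"), times = 10),
  count = c(96, 219, 19,  93,
            75, 240, 27,  85,
            64, 251,  6, 106,
            31, 284,  8, 104,
            49, 266, 52,  60)
)

# Make "pre" and grade "F" the reference levels (an arbitrary choice)
summary_data$exp   <- factor(summary_data$exp, levels = c("pre", "post"))
summary_data$grade <- factor(summary_data$grade,
                             levels = c("F", "D", "C", "B", "A"))
# 1 = grade received, 0 = not received; counts enter as weights
summary_data$y <- as.integer(summary_data$got == "yes")

# Common odds ratio across grades (no interaction)
fit_add <- glm(y ~ exp + grade, family = binomial,
               weights = count, data = summary_data)
# Separate odds ratio per grade (with interaction)
fit_int <- glm(y ~ exp * grade, family = binomial,
               weights = count, data = summary_data)

# Likelihood ratio test of the interaction
lrt <- anova(fit_add, fit_int, test = "Chisq")
print(lrt)

# exp(coefficient) for "exppost" is the post-vs-pre odds ratio for grade F;
# the interaction terms are ratios of odds ratios relative to grade F
print(round(exp(coef(fit_int)), 2))
```

From here, post-hoc comparisons of pre versus post within each grade can be done with whatever tool you prefer for contrasts on a fitted glm.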