Violating Cochran-Mantel-Haenszel assumption and alternative

cochran-mantel-haenszel, odds-ratio, r

I have student grade data (A, B, C, D, F) from before and after a new course was introduced. I'd like to investigate whether there are differences in the proportions of students receiving each of these grades. In R, I've set up the tables like so:

> summary_data

   grade  exp got count
1      F  pre yes    96
2      F  pre  no   219
3      F post yes    19
4      F post  no    93
5      D  pre yes    75
6      D  pre  no   240
7      D post yes    27
8      D post  no    85
9      C  pre yes    64
10     C  pre  no   251
11     C post yes     6
12     C post  no   106
13     B  pre yes    31
14     B  pre  no   284
15     B post yes     8
16     B post  no   104
17     A  pre yes    49
18     A  pre  no   266
19     A post yes    52
20     A post  no    60

Here exp indicates whether the data are from before (pre) or after (post) the introduction of the new course, and got indicates whether or not that grade was received. For example, in the years before the new course was introduced, 96 students received an F and 219 did not; in the years after, 19 received an F and 93 did not, and so on. In short, I have five 2×2 contingency tables, one for each grade.
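For reproducibility, the same counts can be entered directly as a data frame, e.g.:

summary_data <- data.frame(
  grade = rep(c("F", "D", "C", "B", "A"), each = 4),
  exp   = rep(c("pre", "pre", "post", "post"), times = 5),
  got   = rep(c("yes", "no"), times = 10),
  count = c(96, 219, 19,  93,
            75, 240, 27,  85,
            64, 251,  6, 106,
            31, 284,  8, 104,
            49, 266, 52,  60)
)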

I suppose I could just run multiple chi-squared tests on this, but I'm afraid of inflating the Type I error rate. Instead I've run a Cochran-Mantel-Haenszel test (which, as the output below shows, is not significant), followed by multiple Fisher exact tests that suggest the differences lie among the students who got Fs, Cs, and As.

> summary_table = xtabs(count ~ exp + got + grade, data= summary_data)
> ftable(summary_table)
> mantelhaen.test(summary_table)

Mantel-Haenszel X-squared = 3.1293e-30, df = 1, p-value = 1 ......

> library(rcompanion)
> groupwiseCMH(summary_table, group = 3, fisher  = TRUE, method  = "fdr", correct = "none")

  Group   Test  p.value    adj.p
1     F Fisher 6.17e-03 1.03e-02
2     D Fisher 1.00e+00 1.00e+00
3     C Fisher 9.60e-05 2.40e-04
4     B Fisher 4.51e-01 5.64e-01
5     A Fisher 3.19e-10 1.60e-09

After some more investigation, I ran a Woolf test, which indicates that the odds ratios are not homogeneous across these contingency tables, a violation that may render the CMH test inappropriate.
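For completeness, one way to run such a test in R is with woolf_test() from the vcd package (shown here as a sketch on the same 2 x 2 x 5 array; my exact call may have differed):

library(vcd)   # provides woolf_test() for homogeneity of odds ratios

# Tests whether the (pre vs. post) odds ratio is constant across the grade strata
woolf_test(summary_table)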

How strictly should I interpret a significant Woolf test when performing a CMH test, and are there alternatives if violating this assumption is a red line?

And finally, should I just scrap this strategy altogether and try something else?

Best Answer

The following is a link to an archived page from R-help that will be helpful for you: https://stat.ethz.ch/pipermail/r-help/2007-February/126254.html

First, your instincts are right about not wanting to put too much emphasis on the results of the Woolf test to assess the assumptions of the CMH test. It's like when you are assessing a general linear model: you don't really want to use a hypothesis test to see if the residuals are reasonably normal or homoscedastic.

The R-help link has code to look at the odds ratios of the tables in the individual strata. (I believe the code works for individual 2 x 2 tables.) I would look at those results. How large a difference among the odds ratios is too large? I don't know. But I think this is a better way to assess the variation among the odds ratios than using a hypothesis test.
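As an illustration (this is not the code from the R-help post), the stratum-specific odds ratios can be pulled directly out of the 2 x 2 x 5 array:

# Cross-product odds ratio within each grade stratum;
# the orientation depends on the factor level order of exp and got
apply(summary_table, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))

# Or the conditional maximum-likelihood estimate from fisher.test():
apply(summary_table, 3, function(m) fisher.test(m)$estimate)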

The R-help link also presents a different approach to the analysis: logistic regression. This would probably be my preferred approach anyway. At least in R, it gives you options to include interactions among your factors, options for post-hoc analysis, and so on.
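For example, a logistic regression version of the analysis might look something like this (a sketch only; I'm assuming the emmeans package for the post-hoc comparisons and the summary_data data frame from the question):

library(emmeans)   # assumed here for the post-hoc comparisons

# Binomial GLM: count weights each row by its cell frequency, and the
# exp:grade interaction lets the pre/post effect differ by grade
model <- glm((got == "yes") ~ exp * grade,
             family  = binomial,
             weights = count,
             data    = summary_data)

# A significant interaction indicates the pre/post effect differs by grade
# (the same heterogeneity the Woolf test is flagging)
anova(model, test = "Chisq")

# Pre vs. post comparisons within each grade, reported as odds ratios
emmeans(model, pairwise ~ exp | grade, type = "response")

With this approach, the question of heterogeneous odds ratios simply becomes the exp:grade interaction, and the follow-up comparisons within each grade play the role of the separate Fisher tests.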
