Hypothesis Testing – Multiple Comparisons on Nested Subsets of Data

Tags: chi-squared-test, hypothesis-testing, multiple-comparisons

Suppose I am doing some experimental procedure on two treatment groups. The procedure has several stages, each of which may fail. Failure at any stage halts the experiment. If all stages are passed then there is some useful result.

Although I'm primarily interested in the final result, the treatments might also entail different failure rates along the way. I'd like to quantify this, and since we're looking at simple counts it seems like a chi-squared or Fisher exact test would be appropriate.

If I want to apply such a test recursively, as it were, to the groups passing each stage, do I need to apply some correction for multiple comparisons?

That is, supposing the groups progressed like this:

             Group_A         Group_B
Start        100             100
Stage_1      90              95
Stage_2      80              85
Stage_3      60              75
Stage_4      55              30
Results      ...             ...

Does it make sense to do a sequence of 2×2 tests of the form:

             Group_A         Group_B
Passed_N     X               Y
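Failed_N     StartedA_N-X    StartedB_N-Y

where X and Y are the numbers of subjects in groups A and B passing stage N, and StartedA_N and StartedB_N are the numbers in each group that entered stage N.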
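A minimal Python sketch of how these per-stage tables could be built from the progression counts above. The helper name stage_tables is hypothetical; each table is conditional on the subjects that entered the stage, so the failure count uses each group's own entry count.

```python
# Hypothetical helper: build one conditional 2x2 table per stage.
# entered[k] is the number of subjects in a group that entered stage k+1;
# the last entry is the number that passed the final stage.

def stage_tables(entered_a, entered_b):
    """Return (passed_A, failed_A, passed_B, failed_B) for each stage."""
    tables = []
    for k in range(len(entered_a) - 1):
        passed_a, passed_b = entered_a[k + 1], entered_b[k + 1]
        # Failures are counted among those who entered this stage,
        # using each group's own entry count.
        failed_a = entered_a[k] - passed_a
        failed_b = entered_b[k] - passed_b
        tables.append((passed_a, failed_a, passed_b, failed_b))
    return tables


# Counts from the question: Start, Stage_1, ..., Stage_4.
group_a = [100, 90, 80, 60, 55]
group_b = [100, 95, 85, 75, 30]

for stage, table in enumerate(stage_tables(group_a, group_b), start=1):
    print(f"Stage_{stage}: passed A={table[0]} failed A={table[1]} "
          f"passed B={table[2]} failed B={table[3]}")
```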

I feel like I should just know the answer, but I can't figure out whether this counts as doing repeated tests or not. The populations are somewhat distinct each time, but heavily overlapping.

Also, would it make a difference if I had physical reasons to suppose that only stage 4 should be at all affected by the treatments? Could I just choose to ignore any differences in passage through the other stages in that case?

(Feel free also to post answers like "ZOMG, don't use that sort of test here, use XXXX, in manner YYYY, for reasons ZZZZ.")

Best Answer

Assuming I understood your question correctly, I think what you are doing is fine. However, it does sound like you should correct your p-values for multiple comparisons (for example, using the Holm method via the p.adjust function in R). The reason for the adjustment is that you are searching for "interesting results" over the entire sequence of stages, and every extra test is another chance of a false positive.
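For illustration, here is a small pure-Python sketch of that workflow: a two-sided Fisher exact test on each stage's conditional 2x2 table, followed by the Holm step-down adjustment. The function names are hypothetical; in R the equivalent would be fisher.test on each table and then p.adjust(p, method = "holm"). The tables are derived from the counts in the question.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def holm(pvals):
    """Holm step-down adjustment (same rule as R's p.adjust(p, "holm"))."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted


# (passed_A, failed_A, passed_B, failed_B) per stage, derived from the
# question's counts, conditional on entering each stage.
tables = [(90, 10, 95, 5), (80, 10, 85, 10), (60, 20, 75, 10), (55, 5, 30, 45)]

raw = [fisher_exact_two_sided(*t) for t in tables]
adjusted = holm(raw)
for stage, (p, q) in enumerate(zip(raw, adjusted), start=1):
    print(f"Stage_{stage}: raw p = {p:.3g}, Holm-adjusted p = {q:.3g}")
```

Holm is used here because it controls the family-wise error rate like Bonferroni but is never less powerful.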

Also, instead of Fisher's test you might want to use Barnard's exact test, which is generally more powerful than Fisher's test for 2×2 tables.
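To show how Barnard's test differs from Fisher's, here is a rough pure-Python sketch. It treats only the group sizes as fixed, uses the pooled z statistic, and takes the largest tail probability over the unknown common pass probability. The maximisation is approximated on a grid (an assumption of this sketch; a grid can slightly understate the true supremum), and the function names are hypothetical. SciPy 1.7+ provides scipy.stats.barnard_exact if you prefer a library implementation.

```python
from math import comb, sqrt


def z_pooled(x1, n1, x2, n2):
    """Pooled-variance z statistic for the difference of two proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    var = p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)
    if var == 0:
        return 0.0  # all passed or all failed: no evidence of a difference
    return (x1 / n1 - x2 / n2) / sqrt(var)


def barnard_two_sided(x1, n1, x2, n2, grid=200):
    """Approximate two-sided Barnard exact test p-value.

    Only the group sizes n1, n2 are fixed (Fisher also fixes the column
    totals). The p-value is the largest probability, over the common pass
    probability pi, of a table at least as extreme as the observed one;
    the supremum is approximated on an evenly spaced grid of pi values.
    """
    t_obs = abs(z_pooled(x1, n1, x2, n2))
    # Tables at least as extreme as observed (small tolerance for float ties).
    extreme = [(i, j) for i in range(n1 + 1) for j in range(n2 + 1)
               if abs(z_pooled(i, n1, j, n2)) >= t_obs - 1e-9]
    best = 0.0
    for g in range(1, grid):
        pi = g / grid
        pmf1 = [comb(n1, k) * pi**k * (1 - pi) ** (n1 - k) for k in range(n1 + 1)]
        pmf2 = [comb(n2, k) * pi**k * (1 - pi) ** (n2 - k) for k in range(n2 + 1)]
        best = max(best, sum(pmf1[i] * pmf2[j] for i, j in extreme))
    return min(1.0, best)


# Stage 4 of the question: 55 of 60 entrants passed in A, 30 of 75 in B.
print(barnard_two_sided(55, 60, 30, 75))
```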

Another way I can think of doing this is to record, for each subject, the last stage it reached, and then compare groups A and B with a Wilcoxon rank-sum test (wilcox.test in R) on those stages.
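A minimal pure-Python sketch of that idea, assuming "stage reached" is coded as the number of stages passed (0 to 4) and reconstructed from the question's table. It uses the normal approximation with tie and continuity corrections, which is what R's wilcox.test falls back to when there are ties. The function names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test.

    Normal approximation with tie correction and continuity correction.
    Returns (U, p_value); U is the statistic R's wilcox.test reports as W.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ties = Counter(x + y)
    # Mid-ranks: tied values share the average of the ranks they span.
    rank, start = {}, 0
    for value in sorted(ties):
        t = ties[value]
        rank[value] = start + (t + 1) / 2
        start += t
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in ties.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # every subject reached the same stage
    d = u - n1 * n2 / 2
    correction = 0.5 if d > 0 else (-0.5 if d < 0 else 0.0)
    z = (d - correction) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))


def expand(counts):
    """Turn {stages_passed: number_of_subjects} into one value per subject."""
    return [stage for stage, k in counts.items() for _ in range(k)]


# Stages passed per subject, derived from the question's table; e.g. group A
# has 100 - 90 = 10 subjects that failed stage 1 (passed 0 stages).
group_a = expand({0: 10, 1: 10, 2: 20, 3: 5, 4: 55})
group_b = expand({0: 5, 1: 10, 2: 10, 3: 45, 4: 30})
print(rank_sum_test(group_a, group_b))
```

Note that a single rank test summarises overall progression, so differences in opposite directions at different stages can partly cancel; it answers a different question from the per-stage tests.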

Hope it helps...