Having a significant chi-square does not necessarily mean that ONE subject is different from the others. However, IF this is the case, one way to approach this is to fit a logistic regression model and follow it up with contrasts that compare each prediction with the average of the others.
Here's an example. First, a fake dataset:
> fake = data.frame(
+ subj = factor(1:5),
+ pos = c(34, 36, 40, 62, 35),
+ neg = c(66, 64, 60, 38, 65))
Fit a logistic regression model and get the deviance:
> fake.glm = glm(cbind(pos, neg) ~ subj, family = binomial(), data = fake)
> anova(fake.glm)
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(pos, neg)
Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev
NULL                     4     22.486
subj  4   22.486         0      0.000
The deviance statistic is a chi-square test (a likelihood-ratio statistic), but not the same as the Pearson chi-square often used. A chi-square of 22.486 with 4 d.f. is significant (p < 0.001).
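As a quick cross-check (not part of the original output), the Pearson version of the same test of equal success probabilities across subjects can be run directly on the table of counts; the statistic will be close to, but not identical to, the deviance:
> # Pearson chi-square test of homogeneity on the same 5 x 2 table of counts
> chisq.test(cbind(fake$pos, fake$neg))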
The lsmeans package provides one way to obtain post hoc contrasts. Other possibilities include multcomp and effects.
> library(lsmeans)
> ( fake.lsm = lsmeans(fake.glm, "subj") )
subj lsmean SE df asymp.LCL asymp.UCL
1 -0.6632942 0.2111002 NA -1.07704294 -0.249545496
2 -0.5753641 0.2083333 NA -0.98368998 -0.167038315
3 -0.4054651 0.2041241 NA -0.80554108 -0.005389135
4 0.4895482 0.2060214 NA 0.08575368 0.893342771
5 -0.6190392 0.2096570 NA -1.02995931 -0.208119103
Confidence level used: 0.95
The above table summarizes the predicted values of $\log\{p/(1-p)\}$, along with their SEs and confidence intervals. You may also obtain a visual display of these results:
> plot(fake.lsm)
The following obtains estimates and associated $z$ statistics comparing each of these with the average of the others:
> contrast(fake.lsm, "del.eff")
contrast estimate SE df z.ratio p.value
1 effect -0.38571416 0.2351174 NA -1.6405175 0.2523
2 effect -0.27580157 0.2327922 NA -1.1847544 0.2952
3 effect -0.06342777 0.2292697 NA -0.2766513 0.7821
4 effect 1.05533890 0.2308552 NA 4.5714324 <.0001
5 effect -0.33039540 0.2339036 NA -1.4125281 0.2630
P value adjustment: fdr method for 5 tests
We find that subject 4's prediction is significantly greater than the average of the others'. The FDR (false discovery rate) is the default adjustment for multiple testing when del.eff
contrasts are specified. It seems appropriate for this kind of application.
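If you'd rather use a different multiplicity adjustment, contrast() accepts an adjust argument; Bonferroni here is only an illustration (output omitted):
> contrast(fake.lsm, "del.eff", adjust = "bonferroni")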
If you prefer, you may instead do this analysis in terms of the predicted values of $p$ rather than the logits.
> ( fake.lsmp = regrid(fake.lsm, transform = TRUE) )
subj prob SE df asymp.LCL asymp.UCL
1 0.34 0.04737088 NA 0.2471548 0.4328452
2 0.36 0.04800000 NA 0.2659217 0.4540783
3 0.40 0.04898979 NA 0.3039818 0.4960182
4 0.62 0.04853864 NA 0.5248660 0.7151340
5 0.35 0.04769696 NA 0.2565157 0.4434843
Confidence level used: 0.95
... and use similar commands on this object to obtain contrasts or a plot.
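For example (output omitted), the analogous calls on the probability scale would be:
> contrast(fake.lsmp, "del.eff")
> plot(fake.lsmp)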
These methods are probably not exactly the same as the reference you linked to, but they get at the same thing.
To my surprise, a couple of searches didn't seem to turn up prior discussion of post hoc tests for goodness of fit. I expect there's probably one here somewhere, but since I can't locate it easily, I think it's reasonable to turn my comments into an answer, so that people can at least find this one using the same search terms I just used.
The pairwise comparisons you seek to do (each conditioned on only the two groups involved) are sensible.
This amounts to taking group pairs and testing whether the proportion in one of the groups differs from 1/2 (a one-sample proportion test). This, as you suggest, can be done as a z-test (though a binomial test or a chi-square goodness-of-fit test would also work).
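As a minimal sketch with made-up counts (say 34 observations in one group and 62 in the other), the test of whether the first group's share differs from 1/2 can be run either way (output omitted):
> prop.test(34, 34 + 62, p = 0.5)   # chi-square form of the z-test, with continuity correction
> binom.test(34, 34 + 62, p = 0.5)  # exact binomial version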
Many of the usual approaches to controlling the overall Type I error rate should work here (including Bonferroni, along with the usual issues that can come with it).
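For instance, given a vector of the unadjusted pairwise p-values (hypothetical numbers here), p.adjust() applies Bonferroni or any of the other standard corrections:
> pvals = c(0.012, 0.034, 0.210)    # hypothetical unadjusted pairwise p-values
> p.adjust(pvals, method = "bonferroni")
> p.adjust(pvals, method = "holm")  # step-down alternative, never more conservative than Bonferroni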
Just a partial answer, because I've never heard of this method. From what I read in the link you provided, it seems to be a single-step procedure (much like Bonferroni, except that we rework the test statistics instead of the p-values), which is likely to be too conservative.
In R, there is a function, pairwise.prop.test(), which allows any correction for multiple comparisons (single-step or step-down FWER methods, or FDR-based), but it is essentially what you already suggested (although Bonferroni is by far too conservative, it is still widely used in practice).
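As an illustration with hypothetical counts (successes x out of n per group), a single call runs all pairwise comparisons with the chosen adjustment (output omitted):
> x = c(34, 36, 40, 62, 35)         # hypothetical success counts per group
> n = rep(100, 5)                   # group sizes
> pairwise.prop.test(x, n, p.adjust.method = "fdr")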
A resampling approach, using permutation, might be interesting too. The coin R package provides a well-established testing framework in this respect; see §5 of Implementing a Class of Permutation Tests: The coin Package. I have never had to deal with permutation tests on categorical data in a post hoc way, though.
About the analysis of subdivided contingency tables, I generally consider specific associations as a guide for developing additional hypotheses (as for any unplanned comparisons), but this is another question. I generally just use visualization tools, like mosaicplot displays in the spirit of Michael Friendly and Pearson residuals, and if I seek to explain specific patterns of association, I use log-linear models.