Having a significant chi-square does not necessarily mean that ONE subject is different from the others. However, IF this is the case, one way to approach this is to fit a logistic regression model and follow it up with contrasts that compare each prediction with the average of the others.
Here's an example. First, a fake dataset:
> fake = data.frame(
+ subj = factor(1:5),
+ pos = c(34, 36, 40, 62, 35),
+ neg = c(66, 64, 60, 38, 65))
Fit a logistic regression model and get the deviance:
> fake.glm = glm(cbind(pos, neg) ~ subj, family = binomial(), data = fake)
> anova(fake.glm)
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(pos, neg)
Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev
NULL                     4     22.486
subj  4   22.486         0      0.000
The deviance statistic is a chi-square test (a likelihood-ratio statistic), but not the same as the Pearson chi-square often used. A chi-square of 22.486 with 4 d.f. is significant (p < 0.001).
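As a quick cross-check (not part of the original output), the Pearson version of the same test of equal success probabilities across subjects can be run directly on the table of counts; the statistic will be close to, but not identical to, the deviance:
> # Pearson chi-square test of homogeneity on the same 5 x 2 table of counts
> chisq.test(cbind(fake$pos, fake$neg))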
The lsmeans package provides one way to obtain post hoc contrasts. Other possibilities include multcomp and effects.
> library(lsmeans)
> ( fake.lsm = lsmeans(fake.glm, "subj") )
subj lsmean SE df asymp.LCL asymp.UCL
1 -0.6632942 0.2111002 NA -1.07704294 -0.249545496
2 -0.5753641 0.2083333 NA -0.98368998 -0.167038315
3 -0.4054651 0.2041241 NA -0.80554108 -0.005389135
4 0.4895482 0.2060214 NA 0.08575368 0.893342771
5 -0.6190392 0.2096570 NA -1.02995931 -0.208119103
Confidence level used: 0.95
The above table summarizes the predicted values of $\log\{p/(1-p)\}$, along with their SEs and confidence intervals. You may also obtain a visual display of these results:
> plot(fake.lsm)
The following obtains estimates and associated $z$ statistics comparing each of these with the average of the others:
> contrast(fake.lsm, "del.eff")
contrast estimate SE df z.ratio p.value
1 effect -0.38571416 0.2351174 NA -1.6405175 0.2523
2 effect -0.27580157 0.2327922 NA -1.1847544 0.2952
3 effect -0.06342777 0.2292697 NA -0.2766513 0.7821
4 effect 1.05533890 0.2308552 NA 4.5714324 <.0001
5 effect -0.33039540 0.2339036 NA -1.4125281 0.2630
P value adjustment: fdr method for 5 tests
We find that subject 4's prediction is significantly greater than the average of the others'. The FDR (false discovery rate) is the default adjustment for multiple testing when del.eff
contrasts are specified. It seems appropriate for this kind of application.
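If you'd rather use a different multiplicity adjustment, contrast() accepts an adjust argument; Bonferroni here is only an illustration (output omitted):
> contrast(fake.lsm, "del.eff", adjust = "bonferroni")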
If you prefer, you may instead do this analysis in terms of the predicted values of $p$ rather than the logits.
> ( fake.lsmp = regrid(fake.lsm, transform = TRUE) )
subj prob SE df asymp.LCL asymp.UCL
1 0.34 0.04737088 NA 0.2471548 0.4328452
2 0.36 0.04800000 NA 0.2659217 0.4540783
3 0.40 0.04898979 NA 0.3039818 0.4960182
4 0.62 0.04853864 NA 0.5248660 0.7151340
5 0.35 0.04769696 NA 0.2565157 0.4434843
Confidence level used: 0.95
... and use similar commands on this object to obtain contrasts or a plot.
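For example (output omitted), the analogous calls on the probability scale would be:
> contrast(fake.lsmp, "del.eff")
> plot(fake.lsmp)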
These methods are probably not exactly the same as the reference you linked to, but they get at the same thing.
To my surprise, a couple of searches didn't seem to turn up prior discussion of post hoc tests for goodness of fit. I expect there's probably one here somewhere, but since I can't locate it easily, I think it's reasonable to turn my comments into an answer, so that people can at least find this one using the same search terms I just used.
The pairwise comparisons you seek to do (each conditioned on only the two groups involved) are sensible.
This amounts to taking group pairs and testing whether the proportion in one of the groups differs from 1/2 (a one-sample proportion test). This, as you suggest, can be done as a z-test (though a binomial test or a chi-square goodness-of-fit test would also work).
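As a minimal sketch with made-up counts (say 34 observations in one group and 62 in the other), the test of whether the first group's share differs from 1/2 can be run either way (output omitted):
> prop.test(34, 34 + 62, p = 0.5)   # chi-square form of the z-test, with continuity correction
> binom.test(34, 34 + 62, p = 0.5)  # exact binomial version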
Many of the usual approaches to controlling the overall Type I error rate should work here (including Bonferroni, along with the usual issues that can come with it).
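For instance, given a vector of the unadjusted pairwise p-values (hypothetical numbers here), p.adjust() applies Bonferroni or any of the other standard corrections:
> pvals = c(0.012, 0.034, 0.210)    # hypothetical unadjusted pairwise p-values
> p.adjust(pvals, method = "bonferroni")
> p.adjust(pvals, method = "holm")  # step-down alternative, never more conservative than Bonferroni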
Just a partial answer, because I've never heard of this method. From what I read in the link you provided, it seems to be a single-step procedure (much like Bonferroni, except that we rework the test statistics instead of the p-values), which is likely to be too conservative.
In R, there is a function, pairwise.prop.test(), which allows any correction for multiple comparisons (single-step or step-down FWER methods, or FDR-based), but it is essentially what you already suggested (although Bonferroni is by far too conservative, it is still widely used in practice).
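As an illustration with hypothetical counts (successes x out of n per group), a single call runs all pairwise comparisons with the chosen adjustment (output omitted):
> x = c(34, 36, 40, 62, 35)         # hypothetical success counts per group
> n = rep(100, 5)                   # group sizes
> pairwise.prop.test(x, n, p.adjust.method = "fdr")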
A resampling approach, using permutation, might be interesting too. The coin R package provides a well-established testing framework in this respect; see §5 of Implementing a Class of Permutation Tests: The coin Package. I have never had to deal with permutation tests on categorical data in a post hoc way, though.
About the analysis of subdivided contingency tables, I generally consider specific associations as a guide for developing additional hypotheses (as for any unplanned comparisons), but this is another question. I generally just use visualization tools, like mosaicplot displays in the spirit of Michael Friendly and Pearson residuals, and if I seek to explain specific patterns of association, I use log-linear models.