In a sense this is analogous to a situation where you test for differences in group means with ANOVA and then perform a post hoc test, such as Tukey's HSD, to tell which groups are the ones that actually differ. But, there is no equivalent post hoc test for Fisher's test.
The only "post hoc" thing that comes to mind is to run all pairwise comparisons for the table, and correct the p-values accordingly with, e.g., the Bonferroni method.
For a Chi square test, you could check the residuals or simply the expected and observed counts. In addition, going through the percentages of observations in each group would probably answer your question at least partly, and this could be used with either Fisher's or the Chi square test.
In R these can be done as follows:
# Example table (hypothetical counts): rows = gender, columns = groups A-D
tab <- matrix(c(10, 12, 9, 4,
                11, 9, 10, 16),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Male", "Female"), LETTERS[1:4]))
# Percentages for rows and columns
# There is a higher proportion of females than males in group D
prop.table(tab, 1) # rows
prop.table(tab, 2) # columns
# Chi square residuals
# The largest residuals are in the group D
chisq.test(tab)$residuals
# Chi square expected-observed
chisq.test(tab)$expected - chisq.test(tab)$observed
# Chi square "post hoc" test
# For Fisher you need to do this by hand
library(NCstats) # from rforge.net
chisqPostHoc(chisq.test(t(tab))) # for A-D
chisqPostHoc(chisq.test(tab)) # for gender
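For Fisher's test, the "by hand" step above amounts to running every pairwise 2x2 comparison and correcting the p-values, e.g. with Bonferroni. A minimal sketch in Python with scipy (the gender-by-group counts are hypothetical, mirroring the table above):

```python
# Pairwise Fisher exact tests between groups, Bonferroni-corrected.
# Counts per group are hypothetical: (males, females).
from itertools import combinations
from scipy.stats import fisher_exact

tab = {"A": (10, 11), "B": (12, 9), "C": (9, 10), "D": (4, 16)}

pairs = list(combinations(tab, 2))
results = {}
for g1, g2 in pairs:
    # 2x2 subtable: rows are the two groups, columns are the genders
    table = [list(tab[g1]), list(tab[g2])]
    _, p = fisher_exact(table)
    # Bonferroni: multiply by the number of comparisons, cap at 1
    results[(g1, g2)] = min(p * len(pairs), 1.0)

for pair, p in sorted(results.items(), key=lambda kv: kv[1]):
    print(pair, round(p, 4))
```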
There are many ways to combine the information from the individual tests. Some examples follow; where it makes sense, I'd lean toward the options near the top of the list rather than the two at the end:
(a) If in the three situations both test and control are believed to be independent draws from the same populations (a 'test' population with constant proportion and a 'control' population with its own constant proportion, just different-sized samples being drawn in each case), then you can simply combine the data tables and test the combined table. Point and interval estimates based on the combined data reflect the common population values.
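Under the assumptions in (a), combining the tables is just elementwise addition followed by a single test. A sketch in Python with scipy, using three hypothetical 2x2 tables (rows: test/control, columns: success/failure):

```python
# Pool three 2x2 tables by elementwise addition, then run one test.
# The counts are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

tables = [
    np.array([[12, 38], [7, 43]]),
    np.array([[15, 35], [9, 41]]),
    np.array([[11, 39], [6, 44]]),
]

pooled = sum(tables)  # elementwise sum of the three tables
chi2_stat, p, dof, expected = chi2_contingency(pooled)
print(chi2_stat, p, dof)
```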
(b) Even when you don't assume a constant control proportion and a constant test proportion (as in (a) above), under the null the difference in proportions should still be zero in each experiment. You can estimate the difference in proportions for each case, add the estimated differences, and add the variances of the estimates to construct a single statistic. If the difference in proportions were constant, you could get point and interval estimates for it, but the test still works as a test even when you don't assume a constant difference in proportions; it will be sensitive to a tendency of the differences to be in the same direction. It would usually be reasonable to use a normal approximation for this test statistic, but you might also look at simulating its distribution under the null.
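A sketch of that combined difference-in-proportions statistic in Python (the counts are hypothetical, and the unpooled variance estimate is assumed for each difference):

```python
# Sum the per-experiment differences in proportions, sum their
# variances, and form a single z statistic (normal approximation).
# Each tuple is hypothetical: (test successes, n test, control successes, n control).
import math
from scipy.stats import norm

experiments = [(12, 50, 7, 50), (15, 50, 9, 50), (11, 50, 6, 50)]

diff_sum = 0.0
var_sum = 0.0
for x_t, n_t, x_c, n_c in experiments:
    p_t, p_c = x_t / n_t, x_c / n_c
    diff_sum += p_t - p_c
    # variance of the difference of two independent proportions
    var_sum += p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c

z = diff_sum / math.sqrt(var_sum)
p_value = 2 * norm.sf(abs(z))  # two-sided
print(z, p_value)
```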
(c) Again in the case where the test and control proportions are not assumed constant across the three experiments under the alternative, you could still construct a statistic that combines information from the tables in other ways. One example would be to assume that it's not the difference in proportions that's constant under the alternative but the log odds ratio; you could then combine estimates of the log odds ratio (such as by forming a weighted average of them) and use that as an overall test statistic.
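One common choice of weights is inverse-variance weighting of the per-table log odds ratios. A sketch in Python (the tables are hypothetical, and Woolf's variance estimate 1/a + 1/b + 1/c + 1/d is assumed):

```python
# Inverse-variance weighted average of log odds ratios across tables.
# Each hypothetical table is [[test success, test failure],
#                             [control success, control failure]].
import math
from scipy.stats import norm

tables = [
    [[12, 38], [7, 43]],
    [[15, 35], [9, 41]],
    [[11, 39], [6, 44]],
]

num = 0.0  # weighted sum of log odds ratios
den = 0.0  # sum of weights
for (a, b), (c, d) in tables:
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf's variance estimate
    w = 1 / var
    num += w * log_or
    den += w

pooled_log_or = num / den
se = math.sqrt(1 / den)
z = pooled_log_or / se
print(pooled_log_or, 2 * norm.sf(abs(z)))
```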
(d) You could combine (by addition) the chi-square statistics for the individual tables; the chi-square approximation should be better in the combined case, though again it should be possible to construct simulated null distributions.
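A sketch of (d) in Python with scipy: add the per-table statistics and degrees of freedom, then use the chi-square upper tail (the tables are hypothetical):

```python
# Sum chi-square statistics across independent tables and compare
# against a chi-square with the summed degrees of freedom.
from scipy.stats import chi2, chi2_contingency

tables = [
    [[12, 38], [7, 43]],
    [[15, 35], [9, 41]],
    [[11, 39], [6, 44]],
]

stat = 0.0
dof = 0
for t in tables:
    s, _, d, _ = chi2_contingency(t, correction=False)
    stat += s
    dof += d

p_combined = chi2.sf(stat, dof)
print(stat, dof, p_combined)
```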
(e) If the tests are independent, you can use Fisher's method for combining p-values, which is effectively to multiply the p-values, as one would tend to with independent probabilities (though by working on the log scale, it's easier to compute the distribution).
If the nulls are true, the p-values have a uniform distribution. The $-2 \ln p_i$ should be exponentially distributed with mean $2$ (i.e. $\chi^2_2$) and adding those will give something that under the null should be $\chi^2_6$. If the combined result is unusually large for a $\chi^2_6$, you'd reject the null that the p-values were drawn from a uniform distribution in favor of the alternative that they tended to be smaller. In this particular case we have the slight problem that - even under the null - the p-values are discrete, so if the numbers are very small you might want to consider simulation under the null here as well.
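A sketch of Fisher's method in Python (the three p-values are hypothetical):

```python
# Fisher's method: -2 * sum(log p_i) is chi-square with 2k degrees of
# freedom under the null, for k independent p-values.
import math
from scipy.stats import chi2

p_values = [0.08, 0.12, 0.20]  # hypothetical

stat = -2 * sum(math.log(p) for p in p_values)
p_combined = chi2.sf(stat, df=2 * len(p_values))
print(stat, p_combined)
```

scipy also ships this directly as `scipy.stats.combine_pvalues(p_values, method='fisher')`, which returns the same statistic and combined p-value.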
(f) you could even add p-values. If the common nulls are true, the p-values (again) should be uniform; the sum of the p-values should have the distribution of a sum of uniforms; again this sum can be tested (in this case you test whether the sum of the p-values is too small to have come from a sum-of-uniforms), though again the discreteness may be an issue in some cases.
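A sketch of the sum-of-p-values test in Python, using the Irwin-Hall distribution for a sum of independent uniforms (the p-values are hypothetical):

```python
# The sum of k independent Uniform(0,1) p-values follows the
# Irwin-Hall distribution; an unusually small sum is evidence
# against the combined null.
import math

def irwin_hall_cdf(s, k):
    """P(U1 + ... + Uk <= s) for independent Uniform(0, 1) variables."""
    total = 0.0
    for j in range(int(math.floor(s)) + 1):
        total += (-1) ** j * math.comb(k, j) * (s - j) ** k
    return total / math.factorial(k)

p_values = [0.08, 0.12, 0.20]  # hypothetical
s = sum(p_values)
p_combined = irwin_hall_cdf(s, len(p_values))  # lower tail: small sums are extreme
print(s, p_combined)
```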
Where it's reasonable (from your prior knowledge of the situation) to make some assumptions (such as constant proportions, constant differences in proportions, constant log odds ratios, or whatever), you should probably do so; this is usually more meaningful than, say, falling back on case (e), even though that is still a perfectly valid thing to do.
You can use a Fisher exact test in your first example, though with such a large sample a Chi square test will give a similar result and will be easier to calculate without specialist software. Just looking at the numbers, it seems obvious that you will reject your null hypothesis:
E1 happens quite frequently in your observations of P2 and P4, but not at all with P1 and P3.

In your second example you have no information at all about P1 and P3, so all you are testing is whether there is a difference between P2 and P4. There is a difference in your observations, but it is obviously not as large as in your first example. The statistic is telling you that the difference is not significant, and so you should not reject your null hypothesis. And this is what you need to be told with this data.