How should you deal with a cell value in a contingency table that is equal to zero in statistical calculations? (Note that such a value can be structural, i.e., it must be zero by definition, or random, i.e., it could have been some other value, but zero was observed.)
Solved – How should you handle cell values equal to zero in a contingency table
contingency tables
Related Solutions
Pearson's $\chi^2$ test is useful for a sample of $n$ observations cross-classified by two variables, say $A$ and $B$. It tests the null hypothesis that $A$ and $B$ are independent. For example, if you crossed two strains of D. melanogaster (fruit flies) with different mutations and observed the $F_2$ generation frequencies in $n$ progeny, the $\chi^2$ test would test for linkage of the two traits (i.e., are they on different chromosomes [the null] or on the same chromosome [linked, the alternative]).
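To make the statistic concrete, here is a minimal Python sketch of the usual Pearson computation: expected counts come from the margins under independence, and the statistic sums (observed - expected)^2 / expected over the cells. The function name and the counts are made up for illustration, and the sketch assumes every row and column total is positive.

```python
def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table.

    Assumes every row and column total is positive; otherwise an expected
    count is zero and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# Hypothetical counts, invented for illustration only.
table = [[30, 10], [10, 30]]
stat, dof, expected = pearson_chi2(table)
print(stat, dof)
```

The statistic is then referred to a $\chi^2$ distribution with `dof` degrees of freedom, which is the asymptotic approximation the rest of this answer refers to.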
McNemar's test is used for paired data; that is, each observation represents a pair of values. For example, consider a set of $n$ lung cancer patients, each with a spouse. You record the smoking habits of the patients and their spouses and cross-classify them. A Pearson analysis would appear to have $2\,n$ observations, but in this case you have only $n$ pairs. McNemar's test makes this correction. The hypotheses tested are similar: "Is cancer status related to smoking status?"
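As a rough illustration of how the paired structure enters, the sketch below computes the standard McNemar statistic, which uses only the discordant pairs (the off-diagonal cells), together with an exact binomial p-value. The function name and the counts are hypothetical, and doubling the smaller tail is just one common way to get a two-sided p-value.

```python
from math import comb


def mcnemar(table):
    """table = [[a, b], [c, d]] holds counts of PAIRS, not individuals.

    b and c are the discordant pairs (e.g. patient smokes / spouse does not,
    and the reverse). Returns (chi-square statistic, exact two-sided p-value).
    """
    b, c = table[0][1], table[1][0]
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = (b - c) ** 2 / n  # asymptotically chi-square with 1 df
    # Exact version: under the null, b ~ Binomial(n, 1/2).
    # Two-sided p-value by doubling the smaller tail, capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return stat, min(1.0, 2 * tail)


# Hypothetical pair counts, invented for illustration only.
stat, p = mcnemar([[20, 5], [15, 60]])
print(stat, p)
```

Note that the concordant cells (20 and 60 here) do not affect the result at all, which is how the test differs from running Pearson's $\chi^2$ on the same table.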
I suppose that one could think of this as a "between subjects" vs "within subjects" difference, and there is no doubt that things are similar. I don't see them that way, but I'll confess to not having thought about it much.
With regard to your Question 2, the restriction is on expected cell counts, not observed cell counts. Observed counts are reality, while expected cell counts represent a model. You can think of the restrictions as helping to ensure a decent approximation under the null hypothesis. Reality can (and should) diverge from the model when necessary, but if the model is approximately correct, it would be bad to have a situation where discrepancies get inflated in small cells.
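A small Python sketch of that distinction: it computes the expected counts from the margins and lists the cells whose expected count is below a threshold. The threshold of 5 is the conventional rule of thumb and is an assumption here, as are the function names and the example tables.

```python
def expected_counts(table):
    """Expected counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5.0):
    """Return (row, column, expected) for cells whose EXPECTED count is below
    the threshold. The default of 5 is a rule of thumb, not a hard limit."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(table))
        for j, e in enumerate(row)
        if e < threshold
    ]


# An observed zero, but every expected count is comfortably large.
print(small_expected_cells([[0, 25], [30, 20]]))

# No observed zeros, yet three cells have small expected counts.
print(small_expected_cells([[2, 3], [3, 40]]))
```

The first table contains an observed zero and raises no concern; the second has no zeros but is the one where the $\chi^2$ approximation is questionable.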
Finally, an exact test is precisely what it says it is: the distribution of the test statistic under the null hypothesis is known exactly. Pearson's $\chi^2$, McNemar's test, and the log-likelihood $\chi^2$ are all based on asymptotic approximations to the distribution of the test statistic under the null hypothesis. Fisher's test, by comparison, notes that conditionally on the marginal totals, the counts in the two cells of any row (or column) of a $2 \times 2$ table follow a hypergeometric distribution. This insight permits computation of an exact observed significance level ($p$-value) for any given number of observations in the $(1, 1)$ cell.
Fisher's exact test tests the same null hypothesis as Pearson's $\chi^2$. It can be used whenever Pearson's test is appropriate, and also in situations where Pearson's approximation is believed to be unreliable. Pearson's test also makes use of the information in the marginal totals, and so is also conditional on those totals. Knowing the margins a priori (or even one margin) is unnecessary.
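The hypergeometric argument can be sketched in a few lines of Python for the $2 \times 2$ case. This is an illustrative implementation, not any particular library's: the function name is hypothetical, and the two-sided p-value uses one common convention (sum the probabilities of all tables that are no more likely than the observed one).

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for table = [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the (1, 1) cell equals x,
        # given the row and column totals.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the (1, 1) cell
    p_obs = prob(a)
    # Sum over tables that are as likely as, or less likely than, the observed
    # one; the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Observed zeros cause no difficulty for the exact calculation.
print(fisher_exact_2x2([[0, 5], [5, 0]]))
print(fisher_exact_2x2([[3, 1], [1, 3]]))
```

Nothing in the calculation relies on large counts, which is why the exact test remains usable when the asymptotic approximation is in doubt.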
R is objecting to the expected values, not the observed values; it's possible to have a mix of high and low observed counts without triggering that warning.
Note that one possibility is to simulate the distribution of the chi-square statistic (i.e. fix the margins and randomly generate tables from the set of tables with the same margin).
(R will do that automatically with the argument simulate.p.value=TRUE, though you'll very likely also want to increase the value of B, the number of simulations, from the default value as well, since the lowest p-value estimate possible is 1/B.)
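The simulation idea can be sketched in Python as follows. This is not R's implementation: shuffling the column labels of the individual observations is a simple stand-in that keeps both margins fixed, and the function names, the seed argument, and the (1 + hits) / (B + 1) estimator are choices made for this sketch. It assumes the table has no all-zero row or column.

```python
import random


def chi2_stat(table):
    """Pearson chi-square statistic; assumes all margins are positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


def simulate_p_value(table, B=10000, seed=0):
    """Monte Carlo p-value for the chi-square statistic with fixed margins."""
    rng = random.Random(seed)
    # One (row label, column label) pair per observation, in matching order.
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    observed = chi2_stat(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(B):
        rng.shuffle(cols)  # breaks any association, preserves both margins
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        if chi2_stat(sim) >= observed - 1e-9:
            hits += 1
    # Counting the observed table as one of the draws keeps the estimate
    # away from zero; its smallest possible value is 1 / (B + 1).
    return (1 + hits) / (B + 1)


print(simulate_p_value([[0, 5], [5, 0]], B=2000, seed=1))
```

Because the estimate cannot go below roughly 1/B, a small B puts a floor on how small a p-value you can report, which is the reason for increasing it.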
In addition, it appears you have the possibility of some columns being all-zero. Your best bet would be to drop the offending column from the calculation when that happens.
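Dropping an all-zero column before testing might look like the following Python sketch. The function name is hypothetical, and removing all-zero rows as well is an extension added here, since they cause the same problem (an expected count of zero).

```python
def drop_zero_margins(table):
    """Remove rows and columns whose total is zero.

    Such rows or columns give expected counts of zero, so the chi-square
    statistic would involve a division by zero.
    """
    keep_rows = [row for row in table if sum(row) > 0]
    keep_cols = [j for j, col in enumerate(zip(*keep_rows)) if sum(col) > 0]
    return [[row[j] for j in keep_cols] for row in keep_rows]


# Hypothetical counts: the middle column is empty and gets dropped.
print(drop_zero_margins([[3, 0, 4], [5, 0, 6]]))
```

The degrees of freedom should then be computed from the reduced table, i.e. from the number of rows and columns that remain.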
Best Answer
A very nice discussion of structural zeros in contingency tables is provided by West, L. and Hankin, R. (2008), “Exact Tests for Two-Way Contingency Tables with Structural Zeros,” Journal of Statistical Software, 28(11), 1–19. URL http://www.jstatsoft.org/v28/i11
As the title implies, they implement Fisher’s exact test for two-way contingency tables in the case where some of the table entries are constrained to be zero.