Solved – Conditional or unconditional exact test in R

chi-squared-test, fishers-exact-test, r

I have a 2×2 contingency table and I want to test whether the two groups in it differ significantly.
I made a matrix named raw_matrix, as follows:

          CNS random
Not_H3K4  343  28825
H3K4      11   2014

The matrix can be created like this:

raw_matrix = structure(c(343, 11, 28825, 2014), 
    .Dim = c(2L, 2L), .Dimnames = list(
    c("Not_H3K4", "H3K4"), c("CNS", "random")))

From what I read, unconditional exact tests such as Barnard's and Boschloo's are the most powerful tests for this purpose. I installed the 'Exact' package and tried to run the test with this command:

exact.test(raw_matrix)

It ran for more than half an hour on a computer with 64 GB of RAM and a 3.5 GHz CPU, and finally gave the following error:

    Error: cannot allocate vector of size 42.0 Gb
In addition: Warning messages:
1: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
2: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
3: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)
4: In matrix(A[xTbls + 1, ] * B[yTbls + 1, ], ncol = length(int)) :
  Reached total allocation of 61417Mb: see help(memory.size)

Then I installed the 'exact2x2' package and ran the test with this command:

exact2x2(raw_matrix)

which gave me the following results:

    Two-sided Fisher's Exact Test (usual method using minimum likelihood)

data:  raw_matrix
p-value = 0.006433
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.2028 4.2424
sample estimates:
odds ratio 
  2.178631 

But as I read in the 'Exact' package tutorial, Fisher's exact test, which is a conditional exact test, is not very powerful. Finally, I ran the ordinary chi-squared test with the command chisq.test(raw_matrix), which gave the following results, different from those of Fisher's test:

    Pearson's Chi-squared test with Yates' continuity correction

data:  test_1
X-squared = 6.2045, df = 1, p-value = 0.01274

I'm a geneticist and not an expert in statistics. I would appreciate it if anybody could tell me the best strategy for this test.

Best Answer

What is the nature of your underlying data? It could be that the approximation provided by the chi-squared test is reasonable. The basic idea is that if you have enough data, and it is reasonably evenly distributed across the cells of your table, the chi-squared approximation works well (as long as the other assumptions, such as random sampling, are met). The usual rule of thumb is that at least 80% of the cells should have an expected count of 5 or greater, and no cell should have a count of 0. This is a heuristic, so if your data are very unbalanced you might want to do a bit more research, but if the conditions are satisfied you may just want to proceed with a chi-squared test.
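As a rough illustration of the rule of thumb above, the expected counts under independence can be computed directly in base R. This is only a sketch: the matrix is re-entered from the question (with the labels of the printed table), and the names `expected`, `share_at_least_5` and `rule_ok` are made up for this example.

```r
# The 2x2 table from the question (labels follow the printed table).
raw_matrix <- matrix(
  c(343, 11, 28825, 2014),
  nrow = 2,
  dimnames = list(c("Not_H3K4", "H3K4"), c("CNS", "random"))
)

# Expected count of each cell under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(raw_matrix), colSums(raw_matrix)) / sum(raw_matrix)

# Heuristic check: at least 80% of expected counts are >= 5
# and no observed cell is empty.
share_at_least_5 <- mean(expected >= 5)
rule_ok <- share_at_least_5 >= 0.8 && all(raw_matrix > 0)

print(round(expected, 2))
print(rule_ok)
```

The same expected counts are also available as `chisq.test(raw_matrix)$expected`, which may be more convenient when the test is being run anyway.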

If these criteria are not met, Fisher showed that p-values for 2x2 tables can be obtained exactly, because the cell counts, conditional on the row and column totals, follow a hypergeometric distribution. This can be generalized to larger tables, and I believe at least some of the R packages estimate this p-value using a Monte Carlo method. An additional issue to consider is that these p-values may actually be conservative. The "mid-p" value can be used to correct for this, but I am not certain about the theoretical underpinnings of this approach.
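To make the hypergeometric argument concrete, here is a minimal sketch that rebuilds the two-sided Fisher p-value by hand with `dhyper` from base R. It assumes the "sum the probabilities of all tables no more likely than the observed one" definition of two-sided, and the small relative tolerance for ties is an assumption of this example. The mid-p line shows just one simple variant (counting the observed table with half weight); other definitions exist.

```r
# The 2x2 table from the question (labels follow the printed table).
raw_matrix <- matrix(
  c(343, 11, 28825, 2014),
  nrow = 2,
  dimnames = list(c("Not_H3K4", "H3K4"), c("CNS", "random"))
)

x <- raw_matrix[1, 1]       # observed count in the top-left cell
m <- sum(raw_matrix[1, ])   # first row total
n <- sum(raw_matrix[2, ])   # second row total
k <- sum(raw_matrix[, 1])   # first column total

# With all margins fixed, the top-left cell is hypergeometric on this support.
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)
p_obs   <- dhyper(x, m, n, k)

# Two-sided exact p-value: total probability of tables that are
# no more likely than the observed one (tolerance guards against ties).
tol     <- 1e-7
p_exact <- sum(probs[probs <= p_obs * (1 + tol)])

# One simple mid-p variant: count the observed table with half weight.
p_mid <- p_exact - 0.5 * p_obs

print(c(exact = p_exact, mid_p = p_mid))
```

The hand-computed `p_exact` should agree with `fisher.test(raw_matrix)$p.value` up to floating-point error, and `p_mid` is by construction slightly smaller, which is the sense in which mid-p is less conservative.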

Finally, I am not familiar with the exact2x2 package, but if you believe the p-value it produced is reasonable, power does not appear to be an issue. Saying a test is not powerful means we are concerned that it will fail to reject the null hypothesis when that hypothesis is false. Given that the test run by exact2x2 rejected the null hypothesis at common significance levels, I would think that lack of power is less of a concern.
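If it helps to see what "power" means in practice, it can be estimated by simulation: generate many tables under an assumed true difference and count how often the test rejects. The sketch below is only illustrative. The column totals are taken from the question, but the two "true" proportions, the number of simulations and the 0.05 threshold are hypothetical choices, not estimates to be trusted.

```r
set.seed(1)

n_cns    <- 354     # CNS column total from the question
n_random <- 30839   # random column total from the question
p_cns    <- 0.03    # hypothetical true H3K4 proportion in CNS
p_random <- 0.065   # hypothetical true H3K4 proportion in random
n_sim    <- 500     # number of simulated tables (kept small for speed)
alpha    <- 0.05

# For each simulated data set, draw the H3K4 counts in both groups,
# build the 2x2 table and record whether Fisher's test rejects.
reject <- replicate(n_sim, {
  h_cns    <- rbinom(1, n_cns, p_cns)
  h_random <- rbinom(1, n_random, p_random)
  tab <- matrix(c(n_cns - h_cns, h_cns, n_random - h_random, h_random), nrow = 2)
  fisher.test(tab)$p.value < alpha
})

# Estimated power: share of simulated data sets in which the null is rejected.
power_est <- mean(reject)
print(power_est)
```

Swapping `fisher.test` for `chisq.test` inside the loop gives a like-for-like comparison of the two tests under the same assumed effect.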
