I read [this] and I usually agree with the view that two-sided p-values are double the one-sided "greater" ones (the "less" p-value being 1 minus the "greater").
But someone told me that "[the] probability distribution of the tables with fixed marginal sums used in Fisher's exact test is not symmetric."
Is that right?
Solved – Two-sided vs One-sided p-value: is there a special case with Fisher’s Exact test
contingency-tables, distributions, fishers-exact-test, p-value
Related Solutions
The massive 58 amid much lower frequencies signals that any test is just quantifying a major failure of independence. I did this in Stata. The command ret li (short for return list) obliges Stata to show results as exactly as it knows them; both tests yield P-values that are 0.000 to 3 d.p. It is right to be a little cautious about low expected values (for row 1 here in particular), but the test results are overwhelming.
. tabi 0 2 \ 5 58 \ 4 3 \ 4 3
| col
row | 1 2 | Total
-----------+----------------------+----------
1 | 0 2 | 2
2 | 5 58 | 63
3 | 4 3 | 7
4 | 4 3 | 7
-----------+----------------------+----------
Total | 13 66 | 79
Pearson chi2(3) = 20.5779 Pr = 0.000
. ret li
scalars:
r(p) = .0001288081813192
r(chi2) = 20.57794057794058
r(c) = 2
r(r) = 4
r(N) = 79
. tabi 0 2 \ 5 58 \ 4 3 \ 4 3 , exact
Enumerating sample-space combinations:
stage 4: enumerations = 1
stage 3: enumerations = 3
stage 2: enumerations = 17
stage 1: enumerations = 0
| col
row | 1 2 | Total
-----------+----------------------+----------
1 | 0 2 | 2
2 | 5 58 | 63
3 | 4 3 | 7
4 | 4 3 | 7
-----------+----------------------+----------
Total | 13 66 | 79
Fisher's exact = 0.000
. ret li
scalars:
r(p_exact) = .0003124258226793
r(c) = 2
r(r) = 4
r(N) = 79
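As a cross-check outside Stata (purely illustrative; this is my addition, not part of the original session), the Pearson chi-squared test on the same table can be reproduced in Python with SciPy. Note that SciPy's fisher_exact handles only 2x2 tables, so reproducing the exact test for this 4x2 table would need other software (e.g., R's fisher.test).

```python
import numpy as np
from scipy.stats import chi2_contingency

# The 4x2 table from the Stata session above
table = np.array([[0, 2],
                  [5, 58],
                  [4, 3],
                  [4, 3]])

chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 4), dof)   # 20.5779 3, matching Stata's r(chi2)
print(p)                     # ~0.000129, matching Stata's r(p)
```

The returned expected-count matrix also confirms the caution about row 1: with a row total of 2, its expected counts are well below the usual rules of thumb.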
What you are asking for here is a post-hoc power analysis. (More specifically, "the probability of correctly rejecting the null hypothesis" is the power, and 1-power is beta, "the probability of a type-II error". You ask for both, but we only need one to know the other.) We take your existing dataset as the alternative hypothesis / model of the true data generating process. I don't know of a specialized, pre-existing function (e.g., in the pwr package) to do this, but, yes, this can be done in R. You will just have to simulate it. For (considerably) more information on power analyses, and simulating them in R, you should read my answer here: Simulation of logistic regression power analysis - designed experiments. In this case, I will just give a quick, adapted version for dealing with Fisher's exact test. (I usually write code as close to pseudocode as possible so that it may be more widely understood, but because this has the potential to take so long to run, I try to move as much as possible out of the for loop and use some of R's unique capacities.)
table = matrix(c(18,20,15,15,10,55,65,70,30), 3, 3)
table
# [,1] [,2] [,3]
# [1,] 18 15 65
# [2,] 20 10 70
# [3,] 15 55 30
N = sum(table) # this is the total number of observations
N
# [1] 298
probs = prop.table(table)
# these are the probabilities of an observation
probs # being in any given cell
# [,1] [,2] [,3]
# [1,] 0.06040268 0.05033557 0.2181208
# [2,] 0.06711409 0.03355705 0.2348993
# [3,] 0.05033557 0.18456376 0.1006711
probs.v = as.vector(probs)
# notice that the probabilities read column-wise
probs.v
# [1] 0.06040268 0.06711409 0.05033557 0.05033557 0.03355705 0.18456376 0.21812081
# [8] 0.23489933 0.10067114
cuts = c(0, cumsum(probs.v))
# notice that I add a 0 on the front
cuts
# [1] 0.00000000 0.06040268 0.12751678 0.17785235 0.22818792 0.26174497
# [7] 0.44630872 0.66442953 0.89932886 1.00000000
set.seed(4941) # this makes it exactly reproducible
B = 10000 # number of iterations in simulation
vals = runif(N*B) # generate random values / probabilities
cats = cut(vals, breaks=cuts, labels=c("11", "21", "31",
"12", "22", "32", "13", "23", "33"))
cats = matrix(cats, nrow=N, ncol=B, byrow=F)
counts = apply(cats, 2, function(x){ as.vector(table(x)) })
rm(table, N, vals, probs, probs.v, cuts, cats)
p.vals = vector(length=B) # this will store the outputs
ptm = proc.time() # this lets me time the simulation
for(i in 1:B){
mat = matrix(counts[,i], nrow=3, ncol=3, byrow=T)
p.vals[i] = fisher.test(mat, simulate.p.value=T)$p.value
}
proc.time() - ptm # not too bad, really
# user system elapsed
# 28.66 0.32 29.08
#
mean(p.vals>=.05)  # the estimated probability of a type II error
# [1] 0
c(0, 3/B)
# using the rule of 3 to estimate the 95% CI
# [1] 0e+00 3e-04
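For comparison, here is a rough Python analog of the simulation above (my illustration, not the original author's code). SciPy's fisher_exact covers only 2x2 tables, so this sketch substitutes Pearson's chi-squared test; with expected counts this large the two tests agree closely. NumPy's multinomial sampler replaces the runif/cut machinery in one call.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4941)           # reproducible, like set.seed()
table = np.array([[18, 15, 65],
                  [20, 10, 70],
                  [15, 55, 30]])
N = table.sum()                             # 298 observations
probs = (table / N).ravel()                 # cell probabilities under the alternative
B = 2000                                    # iterations (fewer than the R run, for speed)

draws = rng.multinomial(N, probs, size=B)   # each row is a flattened simulated 3x3 table
p_vals = np.array([chi2_contingency(d.reshape(3, 3))[1] for d in draws])

type2_rate = (p_vals >= 0.05).mean()        # estimated probability of a type II error
print(type2_rate)
```

With an effect this large and N = 298, this version likewise turns up essentially no type II errors.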
Given how far your data diverge from the null hypothesis in Fisher's exact test, and the amount of data you have, this simulation does not turn up a single type II error in 10,000 iterations. Because each iteration can be understood as a draw from a binomial distribution with probability $p$ (which we are estimating as the proportion of type II errors observed), this simulation is an estimate with some stochastic variability. We can form a 95% confidence interval bounding the true probability of a type II error. To get around the fact that we didn't actually find any type II errors, we use the rule of 3 ($3/B$, where $B$ is the number of iterations) to estimate the upper limit of the CI. Thus, the 95% CI for the true type II error rate is $[0,\ 0.0003]$.
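As a quick sanity check on the rule of 3 (my addition): with 0 events in $n$ trials, the exact one-sided 95% upper bound for a binomial proportion solves $(1-p)^n = 0.05$, giving $p = 1 - 0.05^{1/n} \approx -\ln(0.05)/n \approx 3/n$.

```python
n = 10_000
exact_upper = 1 - 0.05 ** (1 / n)   # exact upper bound with 0 observed events
rule_of_three = 3 / n               # the rule-of-3 approximation
print(exact_upper, rule_of_three)   # ~0.0002996 vs 0.0003
```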
On a different note, @rvl points out in the comments that "[p]ost hoc power is a silly exercise". That is largely true. I have seen people make the argument, in effect, 'my results are not significant, but I don't have any power, so there's no reason to believe my theory isn't right', which is fairly bizarre on any number of levels. On the other hand, since your results are significant, it isn't clear what difference knowing the post-hoc power of your study makes either. I find that understanding post-hoc power can be useful pedagogically to help people begin to understand the topic. And we can also take this as a starting point for a priori power analyses for planning future studies.
Best Answer
In general it's not symmetric; that's correct. I assume the 2x2 case is specifically intended here (beyond 2x2, the notion of tails is problematic).
The distribution of the count in (say) the top-left cell is discrete and, in general, asymmetric. Specifically, it has a hypergeometric distribution. For any particular two-tailed significance level, different fractions will be "in" each tail.
In some cases it will be symmetric, though. For example the distribution of the number of correct guesses in the experiment of the lady tasting tea is symmetric.
(In relation to the exact test, Fisher wasn't one for alternative hypotheses so such considerations as which tail had what wouldn't have been part of what he'd normally concern himself over, except in so far as it might help him arrive at the calculation of the p-value.)
Here's an example that's asymmetric, and where all the values in the critical region for a 5% test lie in one tail. This is for a table with row totals 15 and 60 and column totals 12 and 63, where we consider the count in the top-left cell.
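To see that asymmetry concretely, here is a sketch (my addition, using SciPy's hypergeom; in SciPy's parameterization M is the population size, n the number of "successes", and N the number of draws) of the null distribution of the top-left cell count under those margins:

```python
from scipy.stats import hypergeom

# Top-left cell count X for a 2x2 table with row totals (15, 60)
# and column totals (12, 63): X ~ Hypergeometric(M=75, n=12, N=15)
X = hypergeom(M=75, n=12, N=15)

pmf = [X.pmf(k) for k in range(13)]   # support is 0..12
print(pmf[0])    # just over 0.05 at the left endpoint
print(pmf[12])   # vanishingly small at the right endpoint
```

The distribution is clearly skewed: the left endpoint already carries probability a little above 0.05, while the right tail decays to essentially zero.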
If we're in Neyman-Pearson mode (and being conservative with our significance level), you would work out your critical region by adding up probabilities of the null distribution from smallest to largest until you got as close as possible to $\alpha$ without exceeding it. The left tail here can never enter, because the lowest probability in the left tail already exceeds our $\alpha$ of $0.05$ (indeed, even if it were a fair bit smaller than 0.05 it still wouldn't enter, because it would push the cumulative sum over 0.05). Clearly, then, this is a one-tailed case.
If we're in Fisher mode, we take our particular table's probability and add in the probabilities of all tables with equal or smaller probability. If our table is in the right tail and its probability is below 0.05, all the smaller ones will be as well.
So that would also be a p-value where all the two-tailed contributions come from one tail.
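Tying this back to the original question, a quick check with SciPy's fisher_exact (my illustration; the two 2x2 tables are chosen to match the tea-tasting experiment and the margins in the example above): when the null distribution is symmetric, doubling the one-sided p-value reproduces the two-sided one, but for the asymmetric table it does not, because the entire two-sided p-value comes from one tail.

```python
from scipy.stats import fisher_exact

# Symmetric case: the lady tasting tea (4 cups each way, 3 correct)
tea = [[3, 1], [1, 3]]
tea_two = fisher_exact(tea, alternative="two-sided")[1]
tea_grt = fisher_exact(tea, alternative="greater")[1]
print(tea_two, 2 * tea_grt)   # equal: doubling works here

# Asymmetric case: margins (15, 60) x (12, 63), as in the example above
tab = [[5, 10], [7, 53]]
tab_two = fisher_exact(tab, alternative="two-sided")[1]
tab_grt = fisher_exact(tab, alternative="greater")[1]
print(tab_two, tab_grt)       # equal: the two-sided p-value is entirely one-tailed
print(2 * tab_grt)            # doubling would roughly double-count it
```

In the asymmetric case the left-tail tables all have probability larger than the observed table's, so none of them contribute to the two-sided p-value, and doubling the one-sided value would overstate it.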