Fisher's Exact Test and Hypergeometric Distribution

fishers-exact-test, hypergeometric-distribution

I wanted to understand Fisher's exact test better, so I devised the following toy example, where f and m correspond to female and male, and n and y correspond to whether the person drinks soda (no/yes):

> soda_gender

    f m
  n 0 5
  y 5 0

Obviously, this is a drastic simplification, but I didn't want the context to get in the way. Here I simply assumed that males don't drink soda and females do, and wanted to see whether the statistical procedures reach the same conclusion.

When I run Fisher's exact test in R, I get the following results:

> fisher.test(soda_gender)
Fisher's Exact Test for Count Data

data:  soda_gender
p-value = 0.007937
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0000000 0.4353226
sample estimates:
odds ratio 
         0 

Here, since the p-value is 0.007937, we would conclude that gender and soda consumption are associated.

I know that Fisher's exact test is related to the hypergeometric distribution, so I wanted to get a similar result using that. In other words, you can view this problem as follows: there are 10 balls, of which 5 are labeled "male" and 5 are labeled "female"; you draw 5 balls at random without replacement and see 0 male balls. What is the chance of this observation? To answer this question, I used the following command:

> phyper(q=0,m=5,n=5,k=5,lower.tail=TRUE)
[1] 0.003968254

My questions are:
1) How come the two results are different?
2) Is there anything incorrect or not rigorous in my reasoning above?

Best Answer

Fisher's exact test works by conditioning on the table margins (here, 5 males and 5 females, 5 soda drinkers and 5 non-drinkers). Under the null hypothesis of no association, each of the four combinations (male soda drinker, male non-drinker, female soda drinker, female non-drinker) is equally likely, with probability 0.25, because the margin totals are balanced. With the margins fixed, the count in any one cell follows a hypergeometric distribution, which is why the two calculations are related.
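
The conditioning on the margins can be made concrete with a short sketch in R that lists every table compatible with margins of 5 and 5. The names x and probs are mine, purely for illustration; the sketch assumes the count in one cell (say, male soda drinkers) is the hypergeometric variable.

```r
# Possible counts for a single cell when all four margins are fixed at 5
x <- 0:5

# Probability of each resulting table under the null hypothesis
probs <- dhyper(x, m = 5, n = 5, k = 5)
names(probs) <- x

round(probs, 6)  # the two extreme tables (x = 0 and x = 5) are the least likely
sum(probs)       # the six possible tables together have probability 1
```

Only six tables are possible here, so the whole null distribution can be inspected by eye.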

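For the table you tested, the only other table with the same margins that is at least as unlikely under the null hypothesis is its mirror image: 5 female non-drinkers and 5 male soda drinkers. Each of these two tables has probability 1/choose(10, 5) = 1/252 ≈ 0.003968, which is the value phyper returned (at q = 0 the cumulative probability equals the point probability). The two-sided FET p-value sums both, so doubling your hypergeometric probability gives 2/252 ≈ 0.007937, the value fisher.test reported.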
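
As a rough check of this argument, the two-sided p-value can be rebuilt by hand from the hypergeometric probabilities. This is a minimal sketch, not a description of how fisher.test is implemented; the names probs, p_obs and p_two_sided are mine, and the small tolerance factor is an assumption added to guard against floating-point ties.

```r
# Rebuild the table from the question (matrix() fills column by column)
soda_gender <- matrix(c(0, 5, 5, 0), nrow = 2,
                      dimnames = list(c("n", "y"), c("f", "m")))

# Null probabilities of all tables with these margins
probs <- dhyper(0:5, m = 5, n = 5, k = 5)

# Probability of the observed table: 1/252, about 0.003968254
p_obs <- dhyper(0, m = 5, n = 5, k = 5)

# Two-sided p-value: add up every table no more likely than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two_sided                       # 2/252, about 0.007936508

# Compare with the built-in test
fisher.test(soda_gender)$p.value
```

For a one-sided comparison, fisher.test(soda_gender, alternative = "less") should reproduce the single-tail phyper value from the question instead.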