Hypothesis Testing – Fisher’s Exact Test vs Binomial Test Explained

binomial distribution, confidence interval, fishers-exact-test, p-value, python

My first post in this community. My knowledge of statistics is limited, and so I am seeking advice on the following problem. (Sorry for the long post, and thank you in advance for your help).

I have a group of 15 children with taste disorders (a not so common condition); 8 are girls, and 7 are boys.

I do the following experiment. Each child in my group tastes a slice of a specially prepared cake, and then I ask the child "does the cake taste good?", to which the child can answer either "yes" or "no". Each child tastes the cake separately, and answers my question before meeting any other child in the group.

I collect the following data:

| | cake tastes good | cake does not taste good | total |
|---|---|---|---|
| girls | 7 | 1 | 8 |
| boys | 6 | 1 | 7 |
| total | 13 | 2 | 15 |

  1. Considering the population of girls with taste disorders, I do a binomial test with number of successes k = 7, number of trials n = 8, and probability of success p = 0.5, to test my null hypothesis H0 = "my cake tastes good for no more than 50% of the population of girls with taste disorders". In Python I can run binomtest(7, 8, 0.5, alternative="greater"), which gives the result BinomTestResult(k=7, n=8, alternative='greater', proportion_estimate=0.875, pvalue=0.03515625) and ConfidenceInterval(low=0.5293205913988617, high=1.0). I find that p-value <= 0.05, and therefore I can reject H0 and say that "my cake tastes good for more than 50% of the population of girls with taste disorders".

  2. Similarly, considering the population of boys with taste disorders, I can do a binomial test to test my null hypothesis "my cake tastes good for no more than 50% of the population of boys with taste disorders". In Python I can run binomtest(6, 7, 0.5, alternative="greater"), which gives the result BinomTestResult(k=6, n=7, alternative='greater', proportion_estimate=0.8571428571428571, pvalue=0.0625). I find that p-value > 0.05, and therefore I cannot reject H0, and I say that "my cake tastes good for no more than 50% of the population of boys with taste disorders".

  3. Now I run a Fisher's exact test on my contingency table. My null hypothesis is H0 = "there is no significant difference between the proportion of girls with taste disorders who find that my cake tastes good, and the proportion of boys with taste disorders who find that my cake tastes good". In Python I can run fisher_exact([[7, 1], [6, 1]], alternative="two-sided"), which gives the result (1.1666666666666667, 1.0), where the first value (1.17) is the odds ratio, and the second value (1) is the p-value. I find that p-value > 0.05, and therefore I cannot reject the null hypothesis, and I say that "there is no significant difference between the proportion of girls with taste disorders who find that my cake tastes good, and the proportion of boys with taste disorders who find that my cake tastes good".
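For reference, the binomial numbers in (1) and (2) can be reproduced by hand with the standard library alone. This is only a sketch of what the test computes, not SciPy's implementation: the helper names `tail_prob` and `lower_bound` are made up for illustration, and the lower confidence limit is assumed to be the one-sided Clopper-Pearson bound (which does match the 0.5293 reported for the girls).

```python
from math import comb


def tail_prob(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided ("greater") p-value when p is the null value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lower_bound(k, n, alpha=0.05, iters=100):
    """One-sided lower confidence bound (assumed Clopper-Pearson):
    the value of p at which P(X >= k) equals alpha, found by bisection."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # tail_prob grows with p, so a tail below alpha means the bound lies higher
        if tail_prob(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for label, k, n in [("girls", 7, 8), ("boys", 6, 7)]:
    print(label, "p-value:", tail_prob(k, n, 0.5), "lower bound:", lower_bound(k, n))
```

The p-value is just the probability, under p = 0.5, of seeing at least as many "yes" answers as observed; the lower bound is the smallest success probability that the data do not rule out at the 5% level.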

The result obtained with Fisher's exact test ("no significant difference between the proportions of girls and boys who find that the cake tastes good") seems to contradict the results in (1) and (2), which say that "more than 50% of the population of girls find that the cake tastes good" (1), and "no more than 50% of boys find that the cake tastes good" (2). How do I interpret these results?

[problem re-phrased (hopefully in a better way) according to whuber's suggestion]

Best Answer

I believe you are misinterpreting the results. Statistical tests in general don't give you a yes/no answer, but only likely/not so likely (given the data).

The non-significant result in your experiment (2) (boys tasting cake), $p = 0.0625$, does not mean:

We know for sure that no more than 50% of the boys in the population would find that the cake tastes good.

Instead, you'd better interpret it as:

Based on the available data, we cannot conclude, with the desired certainty, that the cake would taste good to more than 50% of the boys in the population.

It still might (and likely does), but you lack the evidence. If you had more data, you could come to that conclusion, even if the ratio remained the same. For example, imagine having a sample twice as big: 14 boys, of whom 12 find the cake tasty. The ratio, $12 / 14 = 6 / 7$, is the same, but the one-sided binomial test would give you $p \approx 0.0065$, i.e. a significant result.
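The arithmetic behind both p-values is short enough to write out. Under $p = 0.5$ every sequence of $n$ answers has probability $2^{-n}$, so the one-sided p-value is a count of favourable outcomes divided by $2^n$:

$$P(X \ge 6 \mid n = 7) = \frac{\binom{7}{6} + \binom{7}{7}}{2^{7}} = \frac{7 + 1}{128} = 0.0625$$

$$P(X \ge 12 \mid n = 14) = \frac{\binom{14}{12} + \binom{14}{13} + \binom{14}{14}}{2^{14}} = \frac{91 + 14 + 1}{16384} \approx 0.0065$$

The observed proportion is identical in both cases; only the number of trials changes how surprising it is under $H_0$.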

The $H_0$ you work with in the binomial test is that $P($tasty$) = 0.5$. In Fisher's exact test, you have a different hypothesis. You assume that the proportion of "good" answers is $13/15$, regardless of sex, and ask whether the observed proportion for the boys, $6/7$, differs significantly from that. It doesn't, but, again, if you had more data, it might:

from scipy.stats import fisher_exact
fisher_exact([[6000, 1000], [7000, 1000]])

results in $p \approx 0.0014$.
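To make the hypothesis behind Fisher's test concrete, here is a rough standard-library sketch of the two-sided p-value. It is not SciPy's code: the function name `fisher_two_sided`, the log-gamma approach, and the tolerance `tol` are choices made for this illustration. With all row and column totals held fixed, the top-left cell follows a hypergeometric distribution, and the p-value adds up the probabilities of every table that is no more likely than the observed one.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(table, tol=1e-7):
    """Two-sided Fisher exact p-value for a 2x2 table (illustrative sketch)."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def log_prob(x):
        # Hypergeometric log-probability of x in the top-left cell, margins fixed
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lo = max(0, col1 - row2)  # smallest feasible top-left count
    hi = min(row1, col1)      # largest feasible top-left count
    observed = log_prob(a)
    # Sum every table at most as probable as the observed one
    # (tol, an assumed value, absorbs floating-point noise in ties)
    total = sum(
        exp(lp) for lp in (log_prob(x) for x in range(lo, hi + 1)) if lp <= observed + tol
    )
    return min(1.0, total)


p_small = fisher_two_sided([[7, 1], [6, 1]])
p_large = fisher_two_sided([[6000, 1000], [7000, 1000]])
print(p_small, p_large)
```

For the original table the observed split is the single most probable one given the margins, so every possible table counts towards the sum and the p-value is 1. With thousands of children the same kind of small difference in proportions becomes detectable.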
