My first post in this community. My knowledge of statistics is limited, and so I am seeking advice on the following problem. (Sorry for the long post, and thank you in advance for your help).
I have a group of 15 children with taste disorders (a not so common condition); 8 are girls, and 7 are boys.
I run the following experiment. Each child in my group tastes a slice of a specially prepared cake, and then I ask the child "does the cake taste good?", to which the child can answer either "yes" or "no". Each child tastes the cake separately, and answers my question before meeting any other child in the group.
I collect the following data:
| | cake tastes good | cake does not taste good | total |
|---|---|---|---|
| girls | 7 | 1 | 8 |
| boys | 6 | 1 | 7 |
| total | 13 | 2 | 15 |
1. Considering the population of girls with taste disorders, I do a binomial test with number of successes k = 7, number of trials n = 8, and probability of success p = 0.5, to test my null hypothesis H0 = "my cake tastes good for no more than 50% of the population of girls with taste disorders". In Python I can run `binomtest(7, 8, 0.5, alternative="greater")`, which gives the result `BinomTestResult(k=7, n=8, alternative='greater', proportion_estimate=0.875, pvalue=0.03515625)` and the confidence interval `ConfidenceInterval(low=0.5293205913988617, high=1.0)`. I find that p-value <= 0.05, and therefore I can reject H0 and say that "my cake tastes good for more than 50% of the population of girls with taste disorders".
2. Similarly, considering the population of boys with taste disorders, I can do a binomial test to test my null hypothesis "my cake tastes good for no more than 50% of the population of boys with taste disorders". In Python I can run `binomtest(6, 7, 0.5, alternative="greater")`, which gives the result `BinomTestResult(k=6, n=7, alternative='greater', proportion_estimate=0.8571428571428571, pvalue=0.0625)`. I find that p-value > 0.05, and therefore I cannot reject H0, and I say that "my cake tastes good for no more than 50% of the population of boys with taste disorders".
3. Now I run a Fisher's exact test on my contingency table. My null hypothesis is H0 = "there is no significant difference between the proportion of girls with taste disorders who find that my cake tastes good, and the proportion of boys with taste disorders who find that my cake tastes good". In Python I can run `fisher_exact([[7, 1], [6, 1]], alternative="two-sided")`, which gives the result `(1.1666666666666667, 1.0)`, where the first value (1.17) is the odds ratio, and the second value (1) is the p-value. I find that p-value >= 0.05, and therefore I cannot reject the null hypothesis, and I say that "there is no significant difference between the proportion of girls with taste disorders who find that my cake tastes good, and the proportion of boys with taste disorders who find that my cake tastes good".
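
For reference, the three tests can be run together as one self-contained script. This is a minimal sketch, assuming a SciPy version that provides `scipy.stats.binomtest` (1.7 or later); the variable names are only illustrative.

```python
from scipy.stats import binomtest, fisher_exact

# Counts from the contingency table: (answered "yes", group size)
girls_yes, girls_n = 7, 8
boys_yes, boys_n = 6, 7

# Tests (1) and (2): one-sided exact binomial tests of H0: p <= 0.5
girls_test = binomtest(girls_yes, girls_n, p=0.5, alternative="greater")
boys_test = binomtest(boys_yes, boys_n, p=0.5, alternative="greater")

# Confidence interval for the girls' proportion (default exact method)
girls_ci = girls_test.proportion_ci(confidence_level=0.95)

# Test (3): Fisher's exact test on the 2x2 table of yes/no counts
table = [
    [girls_yes, girls_n - girls_yes],
    [boys_yes, boys_n - boys_yes],
]
odds_ratio, fisher_p = fisher_exact(table, alternative="two-sided")

print(f"girls:  p-value = {girls_test.pvalue}")
print(f"girls:  CI = ({girls_ci.low:.3f}, {girls_ci.high:.3f})")
print(f"boys:   p-value = {boys_test.pvalue}")
print(f"Fisher: odds ratio = {odds_ratio:.2f}, p-value = {fisher_p}")
```
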
The result obtained with Fisher's exact test ("no significant difference between the proportion of girls and boys who find that the cake tastes good") seems to contradict the results in (1) and (2), which say that "more than 50% of the population of girls find that the cake tastes good" (1), and "no more than 50% of boys find that the cake tastes good" (2). How do I interpret these results?
[problem re-phrased (hopefully in a better way) according to whuber's suggestion]
Best Answer
I believe you are misinterpreting the results. Statistical tests in general don't give you a yes/no answer, but only likely/not so likely (given the data).
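The non-significant result in your experiment (2) (boys tasting cake), $p = 0.0625$, does not mean:

> The cake tastes good for no more than 50% of boys with taste disorders.

Instead, you'd better interpret it as:

> There is not enough evidence to conclude that the cake tastes good for more than 50% of boys with taste disorders.
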
It still might (and likely does), but you lack the evidence. If you had more data, you could come to that conclusion, even if the ratio remained the same. For example, imagine having a twice as big sample, 14 boys, of which 12 find the cake tasty. The ratio, $12 / 14 = 6 / 7$, is the same, but the binomial test would give you $p \approx 0.0065$, i.e. significant.
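
The effect of doubling the sample can be checked directly. The sketch below assumes `scipy.stats.binomtest` (SciPy 1.7 or later); the 12-out-of-14 sample is the hypothetical one from the paragraph above, not observed data.

```python
from scipy.stats import binomtest

# Observed sample: 6 of 7 boys answered "yes".
observed = binomtest(6, 7, p=0.5, alternative="greater")

# Hypothetical sample, twice as large, with the same proportion: 12 of 14.
doubled = binomtest(12, 14, p=0.5, alternative="greater")

print(f"6/7:   estimate = {observed.proportion_estimate:.3f}, "
      f"p-value = {observed.pvalue:.4f}")
print(f"12/14: estimate = {doubled.proportion_estimate:.3f}, "
      f"p-value = {doubled.pvalue:.4f}")
```

By hand, the second p-value is $\left[\binom{14}{12} + \binom{14}{13} + \binom{14}{14}\right] / 2^{14} = 106 / 16384 \approx 0.0065$, while the first is $\left[\binom{7}{6} + \binom{7}{7}\right] / 2^{7} = 8 / 128 = 0.0625$.
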
The $H_0$ you work with in the binomial test is that $P(\text{tasty}) = 0.5$. In Fisher's exact test, you have a different hypothesis. You assume that the proportion of "good" answers is $13/15$, regardless of sex, and ask whether the observed proportion for the boys, $6/7$, differs significantly from that. It doesn't, but, again, if you had more data, it might:
results in $p \approx 0.0014$.
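
As a separate illustration of how sample size drives Fisher's exact test, here is a minimal sketch with invented counts. Both tables below are hypothetical (they are not from the experiment) and were chosen only so that the two groups have clearly different proportions; the second table has the same proportions as the first, with four times as many children.

```python
from scipy.stats import fisher_exact

# Hypothetical counts, NOT from the experiment.
# Rows are groups; columns are ("tastes good", "does not taste good").
small = [[7, 1], [3, 4]]       # 7/8 vs 3/7
large = [[28, 4], [12, 16]]    # same proportions, four times the sample

_, p_small = fisher_exact(small, alternative="two-sided")
_, p_large = fisher_exact(large, alternative="two-sided")

print(f"small table: p-value = {p_small:.4f}")
print(f"large table: p-value = {p_large:.6f}")
```

With these made-up numbers the small table is not significant at the 0.05 level, while the large one is, even though the proportions in each group are identical.
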