Solved – Fisher’s tea tasting, binomial exact test

fishers-exact-test, hypothesis testing

Please see Fisher's famous experiment on the biologist B. Muriel Bristol-Roach's ability to discern by taste how her tea was prepared (see Lady Tasting Tea).

In this experiment Fisher gave Bristol-Roach 8 cups of tea, 4 of which were made by adding tea to the cup first and the other 4 by adding milk first. Remarkably, Bristol-Roach correctly identified all 4 cups prepared by the same method. Fisher then quantified the probability of her doing so by chance and concluded that it was too small to attribute her result to chance alone.

I am wondering whether a different method could be used here, namely the exact binomial test with $H_0$: the success rate = 0.5.

If the exact binomial test rejects $H_0$, would that be enough to conclude that Bristol-Roach does have the ability to distinguish the teas?

Best Answer

This is a good idea, but the lady knows that there are 4 cups of tea of each type. This is valuable information for her, and it makes a binomial model of the process wrong: the variables you want to consider (success at each trial) are not independent and identically distributed.

I suspect you had in mind at least one of these ways of modelling the process:

Case 1: You study the number of successes among the 4 selected cups.
Under this representation the statistic is 4 successes in 4 trials. Under the null, each selected cup has a probability of 0.5 of being milk-first. This is mathematically correct, but the trials are not independent.
Illustration: If cups A, B and C are mistakes, there is a good chance (4/5) that the last one is correct, because the 5 remaining cups comprise 4 milk-first cups and only one milk-after cup.

Case 2: You study the number of successes among the 8 presented cups.
Under this representation the statistic is 8 successes in 8 trials. The same problem of non-independence arises.
Illustration: If she judged the first 7 cups correctly, the probability that she also judges the last cup correctly is 1. Given the experimental setting, by elimination, she cannot be right about 7 cups and wrong about one.

In more mathematical terms, in both cases the events $\newcommand{\success}{\rm success}\success_i$ and $\success_j$ are not independent for $i \neq j$.
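The dependence in Case 1 can be checked by brute force. The sketch below assumes that, under the null, the lady picks her 4 "milk-first" cups one after another uniformly at random; the cup labels and the helper `prob` are illustrative and not part of the original experiment.

```python
from fractions import Fraction
from itertools import permutations

# Cups 0-3 are truly milk-first, cups 4-7 are milk-after (labels are arbitrary).
MILK_FIRST = {0, 1, 2, 3}

# Under the null, every ordered pick of 4 distinct cups out of 8 is equally likely.
picks = list(permutations(range(8), 4))

def prob(event, given=lambda pick: True):
    """P(event | given) over the equally likely ordered picks."""
    conditioned = [p for p in picks if given(p)]
    return Fraction(sum(1 for p in conditioned if event(p)), len(conditioned))

# Marginally, the fourth pick is right with probability 1/2 ...
p_fourth = prob(lambda p: p[3] in MILK_FIRST)

# ... but after three wrong picks it is right with probability 4/5.
p_fourth_given_3_wrong = prob(
    lambda p: p[3] in MILK_FIRST,
    given=lambda p: all(c not in MILK_FIRST for c in p[:3]),
)

print(p_fourth, p_fourth_given_3_wrong)  # 1/2 4/5
```

The marginal probability is 0.5 as the binomial model requires, but the conditional probability is not, which is exactly the failure of independence described above.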

Fisher avoided this problem by considering the selection process as a whole: he divided the number of successful selections (only one) by the number of possible selections (4 among 8, i.e. 70). There is also a simple direct formula that takes the non-independence into account, though it is less elegant than Fisher's solution:

\begin{align} P(\success) &= P(X_1=1)\times P(X_2=1|X_1=1)\times \\ &\quad\ \ P(X_3=1|X_1=1 \cap X_2=1)\times \\ &\quad\ \ P(X_4=1|X_1=1 \cap X_2=1 \cap X_3=1) \\ &= 4/8\times 3/7\times 2/6\times 1/5 \\ &= 1/70 \end{align}
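As a quick numerical check, the following sketch reproduces the 1/70 both by Fisher's counting argument and by the chain-rule product, and compares it with what the two binomial models (which wrongly assume independence) would give. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Fisher's counting argument: one correct selection out of C(8, 4) possible ones.
p_counting = Fraction(1, comb(8, 4))

# Chain rule: 4/8 * 3/7 * 2/6 * 1/5.
p_chain = Fraction(1)
for k in range(4):
    p_chain *= Fraction(4 - k, 8 - k)

# Binomial models that treat the trials as independent with p = 1/2.
p_binom_case1 = Fraction(1, 2) ** 4  # Case 1: 4 successes in 4 trials
p_binom_case2 = Fraction(1, 2) ** 8  # Case 2: 8 successes in 8 trials

print(p_counting, p_chain, p_binom_case1, p_binom_case2)  # 1/70 1/70 1/16 1/256
```

Neither binomial figure matches 1/70: Case 1 overstates the probability of a perfect result by chance, and Case 2 understates it.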

A binomial test would be the correct approach in a different kind of setting, such as this one, which I just made up.

The judge tosses a fair coin: if tails, he prepares a milk-first tea; if heads, a milk-after tea. Naturally, the lady does not know the result of the coin toss.

The lady knows the process and will have to judge which kind of cup of tea was served.

With this setting, a binomial test as you described it, with $H_0$: success rate = 0.5, would undeniably be a good approach.
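For such a coin-toss design, the one-sided exact binomial p-value is just an upper tail sum. A minimal sketch using only the standard library follows; the function name `binom_upper_tail` and the choice of 8 cups are assumptions made for illustration.

```python
from math import comb

def binom_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= successes) for X ~ Bin(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical outcome: 8 independently prepared cups, all judged correctly.
print(binom_upper_tail(8, 8))  # 0.00390625, i.e. 1/256

# With one mistake the evidence is much weaker.
print(binom_upper_tail(7, 8))  # 0.03515625, i.e. 9/256
```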

Related Solutions

Solved – Fisher’s exact test

The massive 58 amid much lower frequencies signals that any test is just quantifying a major failure of independence. I did this in Stata. The command ret li (short for return list) obliges Stata to show results as exactly as it knows them, but both tests yield P-values that are 0.000 to 3 d.p. It is right to be a little cautious about low expected values (for row 1 here in particular) but the test results are overwhelming.

. tabi 0  2 \ 5 58 \ 4 3 \ 4 3 

            |          col
        row |         1          2 |     Total
 -----------+----------------------+----------
          1 |         0          2 |         2 
          2 |         5         58 |        63 
          3 |         4          3 |         7 
          4 |         4          3 |         7 
 -----------+----------------------+----------
      Total |        13         66 |        79 

      Pearson chi2(3) =  20.5779   Pr = 0.000

. ret li 

scalars:
              r(p) =  .0001288081813192
           r(chi2) =  20.57794057794058
              r(c) =  2
              r(r) =  4
              r(N) =  79

. tabi 0  2 \ 5 58 \ 4 3 \ 4 3 , exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 3
stage 2:  enumerations = 17
stage 1:  enumerations = 0

             |          col
         row |         1          2 |     Total
  -----------+----------------------+----------
           1 |         0          2 |         2 
           2 |         5         58 |        63 
           3 |         4          3 |         7 
           4 |         4          3 |         7 
  -----------+----------------------+----------
       Total |        13         66 |        79 

       Fisher's exact =                 0.000

. ret li 

scalars:
        r(p_exact) =  .0003124258226793
              r(c) =  2
              r(r) =  4
              r(N) =  79
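For readers without Stata, the Pearson statistic and its p-value can be recomputed from the table with a short Python sketch. It uses only the standard library and the closed-form upper tail of the chi-square distribution for 3 degrees of freedom; the helper name `chi2_sf_3df` is an assumption of this example, and the exact test is not reproduced here.

```python
from math import erfc, exp, pi, sqrt

table = [[0, 2], [5, 58], [4, 3], [4, 3]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

def chi2_sf_3df(x: float) -> float:
    """Upper tail of the chi-square distribution with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_sf_3df(chi2)

# Smallest expected count, the reason for caution mentioned above (row 1).
min_expected = min(r * c / n for r in row_totals for c in col_totals)

print(round(chi2, 4), p_value, round(min_expected, 3))
```

The statistic and p-value should agree with `r(chi2)` and `r(p)` in the Stata output, and the smallest expected count (about 0.33) sits in row 1.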

Hypothesis Testing – Fisher’s Exact Test: Alternative Tests for Unknown Milk-First Cups

Some would argue that even if the second margin is not fixed by design, it carries little information about the lady's ability to discriminate (i.e. it's approximately ancillary) & should be conditioned on. The exact unconditional test (first proposed by Barnard) is more complicated because you have to calculate the maximal p-value over all possible values of a nuisance parameter, viz the common Bernoulli probability under the null hypothesis. More recently, maximizing the p-value over a confidence interval for the nuisance parameter has been proposed: see Berger (1996), "More Powerful Tests from Confidence Interval p Values", The American Statistician, 50, 4; exact tests having the correct size can be constructed using this idea.
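The idea of maximizing the p-value over the nuisance parameter can be sketched as follows for the tea setting with 4 milk-first and 4 tea-first cups. This is a rough illustration only: it orders tables by the difference in proportions, which is an assumption of this example rather than Barnard's original ordering, and it searches a finite grid instead of maximizing exactly. The function names are hypothetical.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def unconditional_p_value(x1: int, n1: int, x2: int, n2: int, grid: int = 1000) -> float:
    """Maximize the one-sided p-value over the common Bernoulli probability p.

    x1 of the n1 milk-first cups and x2 of the n2 tea-first cups were called
    "milk-first". Tables are ordered by the difference in proportions.
    """
    observed = x1 / n1 - x2 / n2
    # All tables at least as extreme as the observed one.
    extreme = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if a / n1 - b / n2 >= observed - 1e-12
    ]
    best = 0.0
    for i in range(grid + 1):
        p = i / grid  # candidate value of the nuisance parameter
        tail = sum(binom_pmf(a, n1, p) * binom_pmf(b, n2, p) for a, b in extreme)
        best = max(best, tail)
    return best

# Perfect discrimination: all 4 milk-first cups and none of the tea-first cups called "milk-first".
print(unconditional_p_value(4, 4, 0, 4))  # 0.00390625 (= 1/256, attained at p = 0.5)
```

For a perfect result only one table is as extreme as the observed one, so the maximization reduces to maximizing $p^4(1-p)^4$, which peaks at $p = 0.5$.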

Fisher's Exact Test also arises as a randomization test, in Edgington's sense: a random assignment of the experimental treatments allows the distribution of the test statistic over permutations of these assignments to be used to test the null hypothesis. In this approach the lady's determinations are considered as fixed (& the marginal totals of milk-first and tea-first cups are of course preserved by permutation).
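The randomization view can be made concrete by holding the lady's calls fixed and enumerating every assignment the experimenter could have made. In this sketch the cup labels, the choice of test statistic (number of truly milk-first cups among those she called milk-first), and the hypothetical "3 hits" outcome are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# The lady's calls are held fixed: she labelled cups 0-3 as "milk-first".
called_milk_first = {0, 1, 2, 3}

# Every way the experimenter could have assigned milk-first to 4 of the 8 cups.
assignments = list(combinations(range(8), 4))

# Distribution of the test statistic over the equally likely assignments.
null_dist = Counter(len(called_milk_first & set(a)) for a in assignments)

def p_value(observed_hits: int) -> Fraction:
    """P(statistic >= observed_hits) under random assignment."""
    extreme = sum(count for hits, count in null_dist.items() if hits >= observed_hits)
    return Fraction(extreme, len(assignments))

print(sorted(null_dist.items()))  # [(0, 1), (1, 16), (2, 36), (3, 16), (4, 1)]
print(p_value(4), p_value(3))     # 1/70 17/70
```

The counts are the hypergeometric weights behind Fisher's exact test, so the randomization p-value for a perfect result is 1/70.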
