Solved – Comparing proportions between multiple groups – Fisher’s exact test

Tags: fishers-exact-test, hypothesis-testing, proportion, r, statistical-significance

I have a simple dataset on the reproductive success (RS) of a certain plant species. Reproductive success was defined as the proportion of flowers that produced fruit. We took measurements at 10 different sites over several seasons. I would like to test whether there is a significant difference in RS between sites. An example of my dataset:

I used Fisher's exact test – the same approach as in this example here: Fisher's exact test in R – 2×4 table – as follows:

# Rows are sites 1-10; columns count flowers that did / did not set fruit
data <- matrix(c(6, 148,  0, 3,  0, 1,  0, 4,  2, 8,
                 0, 17,  8, 151,  11, 108,  1, 33,  0, 2),
               nrow = 10, byrow = TRUE)
rownames(data) <- as.character(1:10)
colnames(data) <- c("fruit YES", "fruit NO")
data
   fruit YES fruit NO
1          6      148
2          0        3
3          0        1
4          0        4
5          2        8
6          0       17
7          8      151
8         11      108
9          1       33
10         0        2
fisher.test(data)

    Fisher's Exact Test for Count Data

data:  data
p-value = 0.3329
alternative hypothesis: two.sided

The result shows that there is no significant difference between sites, but if you check site no. 5 in the data, the percentage of fruit is much higher than the rest. Did I use the right test? If I did – did I do it right?
Would you suggest any other method?
Additional question: I would also like to check if the number of flowers and pH affect the production of fruits on each site. Which test/method should I use in this case – logistic regression? I'm very new to R, so a more detailed explanation would be very very appreciated.

Best Answer

The result shows that there is no significant difference between sites, but if you check site no. 5 in the data, the percentage of fruit is much higher than the rest.

True. However, site 5 has only 2 "yes" and 8 "no" (10 flowers in total), and that is too few observations for the difference to reach significance.
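One way to see how little ten flowers can tell us is to look at an exact confidence interval for the fruit-set proportion at site 5. The sketch below uses base R's binom.test with the counts from the question. Comparing the interval with the pooled proportion of the other nine sites is only an illustration, not part of the original analysis.

```r
# Counts taken from the table in the question
fruit_yes <- c(6, 0, 0, 0, 2, 0, 8, 11, 1, 0)
fruit_no  <- c(148, 3, 1, 4, 8, 17, 151, 108, 33, 2)

# Exact (Clopper-Pearson) 95% confidence interval for site 5:
# 2 fruits out of 10 flowers
ci_site5 <- binom.test(fruit_yes[5], fruit_yes[5] + fruit_no[5])$conf.int

# Pooled fruit-set proportion over the other nine sites
pooled_rest <- sum(fruit_yes[-5]) / sum(fruit_yes[-5] + fruit_no[-5])

print(ci_site5)
print(pooled_rest)
```

With only 10 flowers the interval is very wide, and it covers the pooled proportion of the remaining sites. That is consistent with the non-significant result.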

Did I use the right test? If I did - did I do it right? Would you suggest any other method?

The Fisher exact test is appropriate for your data and I have no suggestion of alternatives. Since I'm not an expert in R I can't tell if it was correctly applied.

Additional question: I would also like to check if the number of flowers and pH affect the production of fruits on each site. Which test/method should I use in this case - logistic regression? [...]

Yes, logistic regression is appropriate, given that the fruit variable is dichotomous (yes/no). As independent variables, you should treat site as a nominal variable, using the first or the seventh category as the reference, and pH as a continuous variable.
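As a minimal sketch of that suggestion, the grouped counts from the question can be passed to base R's glm with a binomial family. pH is not in the posted data, so it is left out here. The variable names are my own, and choosing site 7 as the reference is just one of the two options mentioned above.

```r
# Counts taken from the table in the question
site      <- factor(1:10)
fruit_yes <- c(6, 0, 0, 0, 2, 0, 8, 11, 1, 0)
fruit_no  <- c(148, 3, 1, 4, 8, 17, 151, 108, 33, 2)

# Use site 7 as the reference category
site <- relevel(site, ref = "7")

# Grouped-data logistic regression: the response is cbind(successes, failures).
# Sites with zero fruits get very large negative estimates, and R may warn
# about fitted probabilities close to 0; that reflects the data, not a bug.
fit_null <- glm(cbind(fruit_yes, fruit_no) ~ 1,    family = binomial)
fit_site <- glm(cbind(fruit_yes, fruit_no) ~ site, family = binomial)

# Likelihood-ratio test for an overall site effect
lrt <- anova(fit_null, fit_site, test = "Chisq")
print(lrt)

# Log-odds of fruit set at site 5 relative to site 7
print(coef(fit_site)["site5"])
```

Because several sites have no fruits at all, the individual coefficients and standard errors for those sites should not be interpreted; the likelihood-ratio comparison of the two models is the more useful output here.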

Related Solutions

Solved – Fisher’s exact test

The massive 58 amid much lower frequencies signals that any test is just quantifying a major failure of independence. I did this in Stata. The command ret li (short for return list) obliges Stata to show results as exactly as it knows them, but both tests yield P-values that are 0.000 to 3 d.p. It is right to be a little cautious about low expected values (for row 1 here in particular) but the test results are overwhelming.

. tabi 0  2 \ 5 58 \ 4 3 \ 4 3 

            |          col
        row |         1          2 |     Total
 -----------+----------------------+----------
          1 |         0          2 |         2 
          2 |         5         58 |        63 
          3 |         4          3 |         7 
          4 |         4          3 |         7 
 -----------+----------------------+----------
      Total |        13         66 |        79 

      Pearson chi2(3) =  20.5779   Pr = 0.000

. ret li 

scalars:
              r(p) =  .0001288081813192
           r(chi2) =  20.57794057794058
              r(c) =  2
              r(r) =  4
              r(N) =  79

. tabi 0  2 \ 5 58 \ 4 3 \ 4 3 , exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 3
stage 2:  enumerations = 17
stage 1:  enumerations = 0

             |          col
         row |         1          2 |     Total
  -----------+----------------------+----------
           1 |         0          2 |         2 
           2 |         5         58 |        63 
           3 |         4          3 |         7 
           4 |         4          3 |         7 
  -----------+----------------------+----------
       Total |        13         66 |        79 

       Fisher's exact =                 0.000

. ret li 

scalars:
        r(p_exact) =  .0003124258226793
              r(c) =  2
              r(r) =  4
              r(N) =  79
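
For readers working in R rather than Stata, the same table can be checked with base R's chisq.test and fisher.test. This is a sketch of an equivalent analysis, not part of the original answer. The Pearson statistic and its p-value should agree with r(chi2) and r(p) above.

```r
# Same 4 x 2 table as in the Stata session
tab <- matrix(c(0, 2,
                5, 58,
                4, 3,
                4, 3), nrow = 4, byrow = TRUE)

# Pearson chi-square test; R warns that the approximation may be poor
# because some expected counts are small, so the warning is silenced here
chi <- suppressWarnings(chisq.test(tab))

# Fisher's exact test for an r x c table
fis <- fisher.test(tab)

print(chi$statistic)
print(chi$p.value)
print(fis$p.value)
```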

Solved – Comparing p-values for Fisher’s exact test and test of equal proportions

prop.test uses a Pearson chi-square test, which is an asymptotic test. Its approximation is worst when samples are small or when the p-value lies far out in the tails. Fisher's test will always be "better" in the sense that it is an "exact" test that does not rely on asymptotic arguments to obtain its p-values; rather, it enumerates all the ways the table could have come about given its margins and then sums the probabilities of those that are as extreme or more extreme.

In practice, Fisher's test will look less "powerful" than Pearson's precisely where it matters, because those are the cases in which Pearson's approximation is most wrong.

I do not know why fisher.test should take so long. For sample sizes on the order of $10^7$, it should have dropped to approximate methods unless the events are really rare. Are they? An alternative might be binom.test, which is an exact binomial test for a single proportion rather than Fisher's test, so it only applies if one of the proportions can be treated as a fixed reference value. That might speed things up. A Monte Carlo version might also work.

In your case, with sample sizes this large and non-rare events, Fisher's and Pearson's tests should not disagree to any real extent, but I'd request the continuity correction for Pearson's test: prop.test(..., correct=TRUE). Try your simulation with this option and see if there is a dime's worth of difference then.
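
A quick way to check this agreement is to run both tests on the same 2×2 table. The counts below are hypothetical (they are not from the original question) and are much smaller than $10^7$ so that the exact test runs quickly.

```r
# Hypothetical counts (not from the original question): successes and
# trials in two groups
x <- c(520, 480)
n <- c(10000, 10000)

# Pearson chi-square test of equal proportions with Yates' continuity correction
p_prop <- prop.test(x, n, correct = TRUE)$p.value

# Fisher's exact test on the corresponding 2 x 2 table
tab <- cbind(success = x, failure = n - x)
p_fisher <- fisher.test(tab)$p.value

print(c(prop.test = p_prop, fisher = p_fisher))
```

Rerunning with correct = FALSE shows how much of any remaining gap is due to the continuity correction.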

Another option is Barnard's unconditional test, which can be more powerful but which many people frown upon (even Barnard), though their cited reasons are often esoteric. In any case, it is unlikely to be faster than either Pearson's or Fisher's test.
