R – Applying Fisher’s Exact Test on a 2×4 Table: Step-by-Step Guide

fishers-exact-test r

I am working with 4 different plants and have RNA-Seq data for each of them. At certain positions in their genomes I look for two events, E1 and E2, where E2 is the common event and E1 is a special one. The positions I look at are identical across all 4 plants. Suppose the counts I observe at one particular position are:

     P1   P2   P3   P4
E1    0   20    0   17
E2  100   80  100  120

Here, P1 through P4 refer to the plants and E1 and E2 refer to the events. My objective is to check whether E1 occurs more often in one or more plants than in the others. If it occurs in the same proportion in all of them, the position is not interesting to me.

I have 2 questions (I have already asked question 1 before but didn't get an answer):

  1. Is Fisher's exact test the right test for this problem?

    My null hypothesis is that the proportion of E1 does not differ between the 4 plants, and I want to find out whether the proportions observed here differ significantly. Using fisher.test() in R, I get a p-value of 2.5e-10, so I reject the null hypothesis for this position because there is strong evidence against it.

  2. Sometimes I have an observation like this:

         P1   P2   P3   P4
    E1    0   20    0   17
    E2    0   80    0  120

    Here fisher.test() gives me a p-value of 0.147, so I don't reject the null hypothesis. From a biological point of view, however, I would consider this significant. The Fisher test does answer the question I originally asked; I guess the 0/0 proportions for P1 and P3 are not useful (or not used).
    So my question is: is it possible to modify the test so that it is sensitive even when both E1 and E2 are 0 in one or more plants for a particular observation?

Having thought a bit while framing this post, I guess that in that case I have to ask a different question.

Best Answer

You can use a Fisher exact test in your first example, though with so large a sample a chi-square test will give a similar result and, without specialist software, will be easier to calculate. Just looking at the numbers, it seems obvious that you will reject your null hypothesis: E1 happens quite frequently in your observations of P2 and P4 but not at all in P1 and P3.
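As a minimal sketch of that comparison in R, assuming the counts from the first table are typed in by hand (the object names `counts`, `fisher_res` and `chisq_res` are only illustrative), both tests can be run on the same matrix:

```r
# Counts from the first table; rows are events, columns are plants.
counts <- matrix(
  c(  0,  20,   0,  17,
    100,  80, 100, 120),
  nrow = 2, byrow = TRUE,
  dimnames = list(event = c("E1", "E2"),
                  plant = c("P1", "P2", "P3", "P4"))
)

# Exact test on the full 2x4 table.
fisher_res <- fisher.test(counts)

# Pearson chi-square test of independence on the same table.
chisq_res <- chisq.test(counts)

# Expected counts under the null hypothesis; the chi-square
# approximation is usually considered reasonable when these are not small.
print(chisq_res$expected)

# Both p-values should lead to the same conclusion for this table.
print(fisher_res$p.value)
print(chisq_res$p.value)
```

The question reports p = 2.5e-10 from fisher.test() for this table; the chi-square p-value will not be identical, but it should be of a similarly small order.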

In your second example you have no information at all about P1 and P3, so all you are testing is whether there is a difference between P2 and P4. There is a difference in your observations, but it is clearly not as large as in your first example. The test is telling you that the difference is not significant, so you should not reject your null hypothesis. That is the conclusion these data support.
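One way to make this explicit, sketched here under the assumption that plants with no counts at a position should simply be left out (the names `counts2`, `informative` and `prop_e1` are illustrative), is to drop the empty columns before testing, so that it is clear the comparison is only between P2 and P4:

```r
# Counts from the second table; P1 and P3 have no observations at all.
counts2 <- matrix(
  c(0, 20, 0,  17,
    0, 80, 0, 120),
  nrow = 2, byrow = TRUE,
  dimnames = list(event = c("E1", "E2"),
                  plant = c("P1", "P2", "P3", "P4"))
)

# Keep only the plants that have at least one observation.
informative <- counts2[, colSums(counts2) > 0, drop = FALSE]

# Observed proportion of E1 in each remaining plant.
prop_e1 <- informative["E1", ] / colSums(informative)
print(prop_e1)

# Fisher's exact test on the remaining 2x2 table (P2 vs P4).
res <- fisher.test(informative)
print(res$p.value)
```

If a position with missing plants is biologically interesting in its own right, that is a different question from "do the E1 proportions differ", and it would need a different analysis rather than a modified Fisher test.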