Solved – Fisher’s exact test in RNA-Seq

bioinformatics, biostatistics, fishers-exact-test

In RNA-Seq analysis it is common to use tests analogous to Fisher's exact test to evaluate whether a gene is differentially expressed in two measured conditions.

Fisher's exact test relies on compiling a 2×2 (or larger) table of outcomes × conditions. When it is applied to RNA-Seq, what does the 2×2 table consist of? I would assume that the two different genes are the two columns, but what then are the rows? Are they the actual data and the average read count in each condition, so that the gene is tested against a null hypothesis of random sampling of read counts?

I would be happy for help clarifying this issue.

Best Answer

Fisher's exact test can be used for gene expression data. The 2×2 table would look like this:

[Image: 2×2 contingency table of read counts, with cells n11, n12, n21, n22]

Reference: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881125/

The rows are the gene that you want to test and all the remaining genes. The columns are the two conditions, for example control and treatment (Treatment1 and Treatment2 below).
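The table layout described above can be sketched in a few lines of Python. This is only an illustration: the gene names and read counts are made up, and `contingency_table` is a hypothetical helper, not something taken from the linked paper.

```python
# Hypothetical read counts per gene in each condition (illustrative numbers only).
counts_treatment1 = {"geneA": 120, "geneB": 3000, "geneC": 880}
counts_treatment2 = {"geneA": 45, "geneB": 3100, "geneC": 855}


def contingency_table(gene, counts1, counts2):
    """Return [[n11, n12], [n21, n22]] for one gene versus all remaining genes."""
    n11 = counts1[gene]                   # reads for the gene in Treatment1
    n12 = counts2[gene]                   # reads for the gene in Treatment2
    n21 = sum(counts1.values()) - n11     # reads for all other genes in Treatment1
    n22 = sum(counts2.values()) - n12     # reads for all other genes in Treatment2
    return [[n11, n12], [n21, n22]]


table = contingency_table("geneA", counts_treatment1, counts_treatment2)
print(table)
```

Each column of the table sums to the total read count of that sample, which is why the second row is simply "everything that is not the gene of interest".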

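Now, imagine Treatment2 has no effect. We would then expect the gene to account for about the same proportion of reads in both columns, so n11 would be very close to n12 when the two samples have similar total read counts. Note that if we know n12, we also know n22, because the total read count for Treatment2 is fixed. Thus, we can calculate an odds ratio (check the paper for the definitions of the symbols):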

[Image: odds ratio formula]

This ratio should be close to 1 if Treatment2 is no better than Treatment1. Our null hypothesis would be:

[Image: null hypothesis formula]
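The odds ratio and null hypothesis referred to above can be written in standard textbook notation. This form is an assumption on my part: the linked paper may use different symbols. Here $n_{ij}$ is the count in row $i$ and column $j$ of the table, and $\theta$ is the underlying odds ratio that the sample value $\hat{\theta}$ estimates:

```latex
\[
\hat{\theta}
  = \frac{n_{11} / n_{21}}{n_{12} / n_{22}}
  = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}},
\qquad
H_0 : \theta = 1 .
\]
```

A value of $\hat{\theta}$ far from 1 means the gene takes up a noticeably different share of the reads in one condition than in the other.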

Our aim is to use Fisher's exact test to see whether we can reject the null hypothesis.
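As a rough illustration of what the test computes, here is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. The counts are hypothetical, and `fisher_exact_two_sided` is an illustrative helper written for this answer; it sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common definition of the two-sided p-value.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    row1 = n11 + n12            # total reads for the gene of interest
    col1 = n11 + n21            # total reads in Treatment1
    col2 = n12 + n22            # total reads in Treatment2
    total = col1 + col2

    def log_pmf(x):
        # Hypergeometric log-probability that x of the gene's reads fall in
        # Treatment1, with all row and column totals held fixed.
        return log_comb(col1, x) + log_comb(col2, row1 - x) - log_comb(total, row1)

    log_observed = log_pmf(n11)
    p_value = 0.0
    # Sum the probabilities of every table no more likely than the observed one.
    for x in range(max(0, row1 - col2), min(row1, col1) + 1):
        lp = log_pmf(x)
        if lp <= log_observed + 1e-7:   # small tolerance for floating-point ties
            p_value += exp(lp)
    return min(p_value, 1.0)


# Hypothetical counts: gene of interest vs. remaining genes, Treatment1 vs. Treatment2.
n11, n12, n21, n22 = 120, 45, 3880, 3955
odds_ratio = (n11 * n22) / (n12 * n21)
p_value = fisher_exact_two_sided(n11, n12, n21, n22)
print(f"odds ratio = {odds_ratio:.3f}, p-value = {p_value:.3g}")
```

In practice one would normally call an existing routine such as `scipy.stats.fisher_exact` or R's `fisher.test` rather than hand-rolling the sum; the sketch is only meant to show where the p-value comes from.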

However, this is usually not used in practice, because the test handles only a single replicate per condition, which limits the statistical power. In particular, it cannot estimate technical or biological variation, which requires multiple replicates.
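A small sketch of why this matters, using made-up replicate counts for one gene: if replicates are pooled so that each condition contributes a single count to the 2×2 table, two very different experiments can produce exactly the same table entries. The names `consistent` and `noisy` and the helper functions are hypothetical and exist only for this illustration.

```python
from statistics import variance

# Hypothetical read counts for one gene across three replicates per condition.
consistent = {"treatment1": [200, 195, 205], "treatment2": [100, 98, 102]}
noisy = {"treatment1": [500, 50, 50], "treatment2": [100, 98, 102]}


def pooled_counts(replicates):
    """Sum the replicates, as needed to fill a single 2x2 table."""
    return {condition: sum(values) for condition, values in replicates.items()}


def replicate_variance(replicates):
    """Sample variance across replicates, which pooling throws away."""
    return {condition: variance(values) for condition, values in replicates.items()}


print(pooled_counts(consistent), replicate_variance(consistent))
print(pooled_counts(noisy), replicate_variance(noisy))
```

Both data sets pool to the same counts for the gene, so they would contribute the same first row to the table, even though in the `noisy` case the apparent difference is driven by a single replicate.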