How to achieve a two-sided combined p-value using Fisher’s method

combining-p-values, hypothesis-testing, meta-analysis, multiple-comparisons, p-value

Let's say I want to answer the question of whether smokers and nonsmokers have different levels of Gene A. This seems like an obvious two-sided test. However, if I have multiple studies and want to combine their p-values using Fisher's method, I am confused about how to do this, since Fisher's method is one-sided.

For example, let's say I use a Wilcoxon two-sample rank-sum test and obtain the following two-sided results from four studies:

  1. Study 1: Smokers have higher Gene A, p = 0.02
  2. Study 2: Smokers have higher Gene A, p = 0.04
  3. Study 3: Smokers have lower Gene A, p = 0.02
  4. Study 4: Smokers have lower Gene A, p = 0.04

Because these are two-sided p-values, I cannot simply apply Fisher's method to them (the direction of each effect would be lost), so instead I could calculate new p-values using a one-sided Wilcoxon test.

If I instead test the hypothesis that Gene A is higher in smokers, the data might look like this:

  1. Study 1: Testing if Smokers have higher Gene A, p = 0.01
  2. Study 2: Testing if Smokers have higher Gene A, p = 0.02
  3. Study 3: Testing if Smokers have higher Gene A, p = 0.99
  4. Study 4: Testing if Smokers have higher Gene A, p = 0.99
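For a continuous test statistic with a symmetric null distribution, the one-sided p-values follow directly from the two-sided ones: halve the two-sided p when the observed effect is in the hypothesised direction, otherwise take one minus that half. The Python sketch below applies that rule (the helper name `one_sided_from_two_sided` is hypothetical). For the discrete Wilcoxon rank-sum test it is only an approximation, so it gives 0.98 for Study 4 where the illustrative list above shows 0.99. It also shows that the one-sided p-values for the opposite alternative are simply the complements $1 - p_i$.

```python
def one_sided_from_two_sided(p_two: float, observed_higher: bool) -> float:
    """Hypothetical helper: one-sided p for the alternative 'smokers have higher Gene A'.

    Assumes a continuous, symmetric null distribution for the test statistic,
    which is only approximately true for the discrete Wilcoxon rank-sum test.
    """
    if observed_higher:
        # Effect in the hypothesised direction: half of the two-sided tail.
        return p_two / 2
    # Effect in the opposite direction: complement of that half tail.
    return 1 - p_two / 2


# (two-sided p, were smokers observed to be higher?) for Studies 1-4
studies = [(0.02, True), (0.04, True), (0.02, False), (0.04, False)]

p_higher = [one_sided_from_two_sided(p, up) for p, up in studies]
# One-sided p-values for the opposite alternative ("smokers have lower Gene A")
p_lower = [1 - p for p in p_higher]

print([round(p, 2) for p in p_higher])  # [0.01, 0.02, 0.99, 0.98]
print([round(p, 2) for p in p_lower])   # [0.99, 0.98, 0.01, 0.02]
```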

If I apply Fisher's method I get $\chi^2$ = 17.07459, df = 8, p = 0.029. This is not the result I expected, as I thought the p-values would largely "cancel" out.
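The Fisher result quoted above can be reproduced with a few lines of Python. Fisher's statistic is $X^2 = -2\sum_i \ln p_i$, which under the null hypothesis follows a $\chi^2$ distribution with $2k$ degrees of freedom for $k$ studies. Because the degrees of freedom are always even, the upper tail has a closed form, so this sketch uses only the standard library rather than SciPy. The function names are illustrative and not taken from any particular package.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square variable with even df.

    For df = 2k the closed form is exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


def fisher_combined(pvalues):
    """Fisher's method: X^2 = -2 * sum(ln p_i), chi-square with 2k df under H0."""
    stat = -2 * sum(math.log(p) for p in pvalues)
    df = 2 * len(pvalues)
    return stat, df, chi2_sf_even_df(stat, df)


# One-sided p-values for "smokers have higher Gene A", as listed in the question
p_higher = [0.01, 0.02, 0.99, 0.99]
stat, df, p = fisher_combined(p_higher)
print(f"X2 = {stat:.5f}, df = {df}, p = {p:.3f}")  # X2 = 17.07459, df = 8, p = 0.029
```

The sketch also shows why nothing cancels: each $p_i = 0.99$ adds only about 0.02 to $X^2$, while $p_i = 0.01$ and $p_i = 0.02$ add about 9.21 and 7.82.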

Beyond that, this approach requires me to choose the direction of my Wilcoxon test in advance, when in reality I want a "two-sided" approach: I do not know whether Gene A will be higher or lower.

Is there a way to generate a two-sided combined p-value (i.e., one that I could construct from a set of four one-sided p-values in one direction and the four corresponding one-sided p-values in the other direction)?

Best Answer

As you have found out, Fisher's method does not cancel values in opposite directions. The same is true of Tippett's method (which uses the minimum $p$). The good news, however, is that Stouffer's method (which $z$-transforms the $p$-values) and Edgington's method (which sums the $p$-values) do cancel, so you could use one of them instead.
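A minimal Python sketch of the two cancelling methods, applied to the one-sided p-values from the question (0.01, 0.02, 0.99, 0.99). Stouffer's statistic is $Z = \sum_i \Phi^{-1}(1 - p_i)/\sqrt{k}$; since $\Phi^{-1}(1 - p) = -\Phi^{-1}(p)$, a study with $p_i = 0.99$ contributes $z \approx -2.33$, which offsets one with $p_i = 0.01$. Edgington's statistic is $S = \sum_i p_i$, referred to the Irwin-Hall distribution (the sum of $k$ independent uniforms). This is a hand-written illustration assuming Python 3.8+ (`statistics.NormalDist`, `math.comb`), not the metap implementation. The two-sided Stouffer line is an extra step, justified because replacing every $p_i$ by $1 - p_i$ simply flips the sign of $Z$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def stouffer(pvalues):
    """Stouffer: Z = sum(Phi^-1(1 - p_i)) / sqrt(k); one-sided p = 1 - Phi(Z)."""
    z = sum(STD_NORMAL.inv_cdf(1 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return z, 1 - STD_NORMAL.cdf(z)


def edgington(pvalues):
    """Edgington: S = sum(p_i); p = P(sum of k uniforms <= S), the Irwin-Hall CDF."""
    s, k = sum(pvalues), len(pvalues)
    cdf = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    ) / math.factorial(k)
    return s, cdf


p_higher = [0.01, 0.02, 0.99, 0.99]  # one-sided, "smokers have higher Gene A"

z, p_stouffer = stouffer(p_higher)
s, p_edgington = edgington(p_higher)

# Reversing every p_i flips the sign of Z, so doubling the smaller tail
# is one common way to report a two-sided Stouffer p-value.
p_stouffer_two_sided = 2 * (1 - STD_NORMAL.cdf(abs(z)))

print(f"Stouffer:  Z = {z:.3f}, one-sided p = {p_stouffer:.3f}, "
      f"two-sided p = {p_stouffer_two_sided:.3f}")
print(f"Edgington: S = {s:.2f}, p = {p_edgington:.3f}")
```

With these data both methods give large p-values (roughly 0.55 for Stouffer and 0.51 for Edgington), in contrast to the 0.029 from Fisher's method.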

If you go to the page for the metap R package and look at its vignette, you will find some worked examples. Disclaimer: I am the author of that package.

Edit to add comments on directionality

The null hypothesis $H_0$ is well defined: all $p_i$ have a uniform distribution on the unit interval. There are two classes of alternative hypothesis:

  • $H_A$: all $p_i$ have the same (unknown) non-uniform, non-increasing density,
  • $H_B$: at least one $p_i$ has an (unknown) non-uniform, non-increasing density.

So these are essentially omnibus tests with no obvious directionality built in.

The lack of a natural alternative hypothesis may account for the number of methods available and their differing behaviour.

Given that there is no obvious directionality built in, what people do in a substantive application is look at the data: if the $p$-values pile up near 0, they assume the effect is in that direction, and if they pile up near 1, in the contrary direction. I suppose there is nothing to stop people from observing $p$-values piling up at both ends and interpreting that accordingly. Another course of action would be to use a method that does not cancel, such as Fisher's, and then also apply it to the complements of the $p$-values ($1 - p_i$), where it would be sensitive to piling up at the other end. I am not aware of any settings in which these approaches have been applied, but that may just be me.
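The last suggestion, running a non-cancelling method in both directions, can be sketched in Python as follows (standard library only; `fisher_p` is an illustrative name). The complements $1 - p_i$ are the one-sided p-values for the opposite alternative, so Fisher's method applied to them looks for $p$-values piling up near 1.

```python
import math


def fisher_p(pvalues):
    """Fisher's method p-value: -2*sum(ln p_i) ~ chi-square(2k) under H0.

    Uses the closed-form chi-square upper tail for even df = 2k.
    """
    half = -sum(math.log(p) for p in pvalues)  # this is X^2 / 2
    k = len(pvalues)  # df = 2k
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))


p_higher = [0.01, 0.02, 0.99, 0.99]   # one-sided, "smokers have higher Gene A"
p_lower = [1 - p for p in p_higher]   # complements: sensitive to piling up near 1

print(f"Fisher, 'higher' direction: p = {fisher_p(p_higher):.3f}")
print(f"Fisher, 'lower' direction:  p = {fisher_p(p_lower):.3f}")
```

With the question's data both runs come out small (about 0.029 and 0.018), which is the piling-up-at-both-ends pattern: two studies point one way and two point the other. Since two tests have been run, any single summary drawn from them would need some allowance for that.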