Solved – Problems with Fisher’s Method for combining p-values

combining-p-values, r

I am using Fisher's Method to combine p-values, and have noticed some strange behavior for large p-values and large $n.$

In my case I have a large number of statistically non-significant results (e.g. p-values from .1 to .5), and I am using Fisher's Method to combine them.
However, I noticed that Fisher's Method seems to display unstable behavior for these large p-values: changing the p-values from .367 to .368 resulted in a drastic change in the combined p-value. Why is this?

p_value=fisherIntegration(rep(.367,10000000))
#p_value=1.965095e-14
p_value=fisherIntegration(rep(.368,10000000))
#p_value=0.8499356

In contrast, for low p-values and small $n,$ this behaved very nicely. For example:

p_value=fisherIntegration(rep(.05,10))
#p_value=7.341634e-06

Here is the function I use for Fisher integration:

fisherIntegration <- function(p_values) {
    # Fisher's Method: combine independent p-values
    deg_free <- 2 * length(p_values)       # chi-squared degrees of freedom (2n)
    y <- -2 * sum(log(p_values))           # Fisher's combined test statistic
    p.val <- 1 - pchisq(y, df = deg_free)  # upper-tail probability = combined p-value
    return(p.val)
}
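
As a side note, the 1 - pchisq(...) step loses precision for very small combined p-values and returns exactly 0 once the upper-tail probability drops below machine precision; this is unrelated to the instability asked about here. A minimal sketch of an equivalent but numerically safer variant (the name fisherIntegrationStable is only for illustration) asks pchisq for the upper tail directly:

fisherIntegrationStable <- function(p_values) {
    # Same statistic as above, but the upper tail is computed directly
    # rather than as 1 minus the lower tail, avoiding cancellation.
    y <- -2 * sum(log(p_values))
    pchisq(y, df = 2 * length(p_values), lower.tail = FALSE)
}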

EDIT
This post is somewhat related but does not address why .367 is a magic number in this context: Why does Fisher's method yield $p\gg 0.5$ when combining several p-values all equal to $0.5$?

Best Answer

As explained at https://stats.stackexchange.com/a/314739/919, Fisher's Method combines p-values $p_1, p_2, \ldots, p_n$ under the assumption they arise independently under null hypotheses with continuous test statistics. This means each is independently distributed uniformly between $0$ and $1.$ A simple calculation establishes that $-2\log(p_i)$ has a $\chi^2(2)$ distribution, whence

$$P = \sum_{i=1}^n -2\log(p_i)$$

has a $\chi^2(2n)$ distribution. For large $n,$ this distribution is approximately Normal (as guaranteed by the Central Limit Theorem). It has mean $2n$ and variance $4n,$ as we may readily calculate.
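
As a quick numerical check of this approximation (a sketch, using the question's $n = 10^7$), compare the exact $\chi^2(2n)$ upper tail with the Normal$(2n, 4n)$ upper tail at a point two standard deviations above the mean:

n <- 1e7
df <- 2 * n                            # chi-squared degrees of freedom
q <- 2 * n + 2 * (2 * sqrt(n))         # mean plus two standard deviations
pchisq(q, df = df, lower.tail = FALSE)                        # exact chi-squared upper tail
pnorm(q, mean = 2 * n, sd = 2 * sqrt(n), lower.tail = FALSE)  # Normal approximation
# Both values come out near 0.023, so the approximation is excellent at this scale.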

Suppose, now, that $P$ is "much" different from this mean. "Much" means, as usual, in comparison to the standard deviation: that is, suppose $P$ differs from $2n$ by more than a few multiples of $\sqrt{4n}=2\sqrt{n}.$ From basic facts about Normal distributions, this implies $P$ is either unusually small or unusually large. Consequently, as $P$ ranges from $2n-2K\sqrt{n}$ to $2n+2K\sqrt{n}$ for $K \approx 3,$ the combined p-value assigned by Fisher's method (the upper-tail probability of $P$) sweeps from nearly $1$ down to nearly $0.$

In other words, all of the "interesting" probability for $P$ occurs within the interval $(2n-2K\sqrt{n}, 2n+2K\sqrt{n})$ for small $K$. As $n$ grows, this interval narrows relative to its center (at $2n$).
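
To see this numerically, here is a sketch (reusing $n = 10^7$ from the question and $K = 3$) that evaluates the combined p-value at integer multiples of the standard deviation across this interval:

n <- 1e7
K <- 3
P <- 2 * n + 2 * sqrt(n) * seq(-K, K)                 # points spanning (2n - 2K*sqrt(n), 2n + 2K*sqrt(n))
round(pchisq(P, df = 2 * n, lower.tail = FALSE), 4)   # combined p-value at each point
# The values sweep from roughly 0.999 at the left endpoint down to roughly 0.001 at the right.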

One conclusion we could draw from this result is that once $\sqrt{n}$ is large enough to dominate $2K$ (that is, once $n$ is much larger than $(2\times3)^2\approx 40$ or so), Fisher's Method may be reaching the limits of its usefulness.
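
The following sketch (with a few illustrative values of $n$) makes the narrowing concrete by listing the window of geometric-mean p-values, $e^{-(1 \pm K/\sqrt{n})},$ over which the combined p-value is neither essentially $0$ nor essentially $1$:

K <- 3
n <- c(10, 40, 1000, 1e7)
lower <- exp(-(1 + K / sqrt(n)))   # geometric mean giving a combined p-value near 0
upper <- exp(-(1 - K / sqrt(n)))   # geometric mean giving a combined p-value near 1
data.frame(n = n, lower = lower, upper = upper, width = upper - lower)
# The window already spans only about (0.23, 0.59) at n = 40 and shrinks to
# roughly (0.3675, 0.3682) at n = 10^7, matching the values derived below.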


In the circumstances of the question, $n=10^7.$ The interesting interval for the average log p-value, $-P/(2n),$ therefore is roughly

$$\left(-\frac{2n+2K\sqrt{n}}{2n},\ -\frac{2n-2K\sqrt{n}}{2n}\right) \approx (-1.00095,\ -0.999051)$$

when $K=3.$

The corresponding geometric mean p-values are

$$e^{-0.999051} = 0.368229\text { and } e^{-1.00095} = 0.367531.$$
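
These endpoints can be reproduced in a couple of lines (a sketch with $n = 10^7$ and $K = 3$):

n <- 1e7
K <- 3
avg_log <- c(-1 - K / sqrt(n), -1 + K / sqrt(n))   # bounds on the average log p-value
avg_log        # approximately -1.000949 and -0.999051
exp(avg_log)   # corresponding geometric-mean p-values: approximately 0.367531 and 0.368229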

The lower value of $0.367$ used in the question falls below this interval, so $P$ lies above its "interesting" range and the combined p-value (an upper-tail probability) is essentially zero, while the upper value of $0.368$ lies within the interval, giving a combined p-value that is still appreciably less than $1.$ This is an extreme example of our previous conclusion, which could be restated like this:

When the average natural logarithm of the p-values differs much from $-1,$ Fisher's Method will produce a combined p-value extremely near $0$ or near $1$. "Much" is proportional to $1/\sqrt{2n}.$
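
As a final check of this statement against the question's two inputs (a short sketch; because all the p-values are equal, the average log p-value is simply $\log(p)$):

n <- 1e7
K <- 3
log(c(0.367, 0.368))           # about -1.00239 and -0.99967
-1 + c(-1, 1) * K / sqrt(n)    # the "interesting" band: about -1.00095 to -0.99905
# log(0.367) lies well below the band, so the combined p-value is essentially 0;
# log(0.368) lies inside it, giving the moderate combined p-value of about 0.85 seen in the question.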
