Solved – Distribution of p-values from multiple experiments

combining-p-values, p-value

My experimental setup is the following:

1. compute an optimal solution $S$ to problem $A$ on instance $I$;
2. evaluate the test statistic $f(S)$;
3. compute $f(S_i)$ for $S_i$ $(1 \leq i \leq n)$ drawn from a uniform distribution over the feasible solutions to $A$; and
4. compute a p-value for $S$.

The null hypothesis is that $f$ is not correlated with the optimality of $A$. The alternative is that $f$ is.
The p-value for $S$ is then $l/n$, where $l$ is the number of samples $S_i$ such that $f(S_i) > f(S)$.
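A minimal R sketch of this p-value, under stated assumptions: the vector of $f$ values is made up, and `empirical_p` / `empirical_p_plus1` are hypothetical helper names. The second function shows the commonly used $(l+1)/(n+1)$ variant, which counts $S$ itself as one of the draws and therefore can never be exactly 0; it is not part of the setup described above.

```r
# Empirical p-value as defined above: l / n, where l counts the sampled
# solutions whose statistic exceeds f(S). Can be exactly 0 when l = 0.
empirical_p <- function(f_S, f_samples) {
  l <- sum(f_samples > f_S)
  l / length(f_samples)
}

# Variant that counts S itself among the draws: (l + 1) / (n + 1), never 0.
empirical_p_plus1 <- function(f_S, f_samples) {
  l <- sum(f_samples > f_S)
  (l + 1) / (length(f_samples) + 1)
}

f_samples <- c(0.2, 1.5, 3.1, 0.7, 2.4)   # hypothetical f(S_i), n = 5
empirical_p(2.0, f_samples)         # 2 / 5 = 0.4
empirical_p_plus1(2.0, f_samples)   # 3 / 6 = 0.5
empirical_p(4.0, f_samples)         # 0: no sample exceeds f(S)
empirical_p_plus1(4.0, f_samples)   # 1 / 6
```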

If we repeat this test for many instances $I_j$, we get a distribution of p-values.
I would like to determine if we can reject the null hypothesis over all tests.

I've looked at Fisher's method, but some of my p-values are so small that the combined result is 0.
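For reference, Fisher's method computes $X = -2\sum_{j=1}^{k} \ln p_j$ and compares it with a $\chi^2$ distribution on $2k$ degrees of freedom. The sketch below (the name `fisher_combine` and the p-values are hypothetical) suggests why the result comes out as 0: with tiny but nonzero p-values the combined p-value underflows in double precision, which `pchisq(..., log.p = TRUE)` sidesteps by returning its logarithm, whereas a p-value of exactly 0 gives $X = \infty$, which no change of scale can rescue.

```r
# Fisher's method: X = -2 * sum(log(p)) ~ chi-squared with 2k df under H0.
fisher_combine <- function(p, log.p = FALSE) {
  X <- -2 * sum(log(p))
  pchisq(X, df = 2 * length(p), lower.tail = FALSE, log.p = log.p)
}

p_small <- c(1e-200, 1e-250, 0.3, 0.8)   # hypothetical: tiny but nonzero
fisher_combine(p_small)                  # underflows to 0 in double precision
fisher_combine(p_small, log.p = TRUE)    # log of the combined p-value is finite

p_zero <- c(0, 0.3, 0.8)                 # hypothetical: one exact zero
fisher_combine(p_zero)                   # log(0) = -Inf, so X = Inf and the result is 0
```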

If I make a histogram of the p-values, I get the following:

[Figure: histogram of the p-values]

There is a clear bias towards very small p-values, but what worries me is the uniform-looking distribution of the remaining p-values.
Does our data reject the null hypothesis?

EDIT:
What if the left-most bar contained fewer tests with such small p-values? How would we evaluate the following distribution of p-values?

[Figure: histogram of the second set of p-values]

Best Answer

If the null hypothesis is true, then the p-values of a test (with a continuously distributed test statistic) should follow a continuous standard uniform distribution.

Think of it this way: say we decided a priori on a significance level of .05 (5%). That would mean that if we were to repeat our experiment many times and our null hypothesis were true, we would want to (incorrectly) reject our true null hypothesis in 5% of the replications. So we want 5% of the p-values to be less than .05. Similarly, if we chose a significance level of .10 a priori, we would want a p-value less than .10 in 10% of the replications. In principle, there is nothing special about the 5% or 10% significance levels other than convention, so we would want this relationship to hold for any significance level $\alpha$. That way, we reject a true null hypothesis $100\alpha\%$ of the time, for every possible value of $\alpha$. This means that we want the following to be true if the null hypothesis is true:

$\mathrm{Pr}(p < \alpha) = \alpha$

This is just the CDF of the continuous standard uniform distribution. A clear deviation from the uniform distribution is therefore evidence against your null hypothesis.
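A quick simulation illustrates $\mathrm{Pr}(p < \alpha) = \alpha$. This is a sketch under an assumed setup: a one-sample t-test on 30 standard normal draws, so the null (mean 0) is true by construction; the seed and the number of replications are arbitrary.

```r
# Under a true H0, p-values are Uniform(0, 1): the share below alpha is ~alpha.
set.seed(42)
n_rep <- 10000
p_null <- replicate(n_rep, t.test(rnorm(30))$p.value)   # H0: mean = 0 holds

alphas <- c(0.01, 0.05, 0.10, 0.50)
empirical_rate <- sapply(alphas, function(a) mean(p_null < a))
round(rbind(alpha = alphas, empirical = empirical_rate), 3)

# Formal check of uniformity: Kolmogorov-Smirnov test against Uniform(0, 1)
ks.test(p_null, "punif")
```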

Your graph clearly shows that the p-values are not uniformly distributed, which would lead me to reject the null hypothesis. The fact that some replications have very large p-values fits within the logic of statistical testing: a small p-value does not give you absolute certainty that the null hypothesis is false; it just tells you that it is unlikely, but not impossible, that the data you have seen would occur if the null hypothesis were true. Conversely, a large p-value does not show that the null hypothesis is true, so some large p-values are expected even when it is false.
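To see why a spike near 0 plus a flat remainder is what one would expect, here is a hypothetical simulation (the 70/30 split and the effect size of 3 are made up for illustration) in which the null is true for some instances and false for others. Rejecting uniformity of the pooled p-values is evidence against the global null that H0 holds for every instance; it does not say which instances are affected.

```r
set.seed(7)
k <- 2000
is_null <- runif(k) < 0.7                     # H0 true for about 70% of instances
z <- rnorm(k, mean = ifelse(is_null, 0, 3))   # z-statistics; effect of 3 when H0 is false
p_mix <- pnorm(z, lower.tail = FALSE)         # one-sided p-values

# Counts in bins of width 0.1: a spike in the first bin, roughly flat elsewhere
counts <- table(cut(p_mix, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE))
counts
ks.test(p_mix, "punif")$p.value               # essentially 0: not uniform overall
```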

As an aside, you can use this property to check whether a test performs as it should. For example, many tests are based on an asymptotic argument, and you may wonder whether your data have enough observations for that argument to kick in. In that case you can simulate data for which you know the null hypothesis is true but which are otherwise similar to your data (same number of observations, independent variables with the same joint distribution, etc.), and then look at the distribution of p-values. If you see a deviation from the uniform distribution, then that is an indication that the test is not quite working for your data. Here is an example: https://stats.stackexchange.com/questions/59091#59091
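A sketch of such a calibration check, assuming for illustration small samples ($n = 5$) from a skewed Exponential(1) distribution and a one-sample t-test of $H_0$: mean $= 1$, which is true by construction. The printed numbers depend on the seed; the point is to compare them with the nominal level and with a flat histogram.

```r
set.seed(123)
n_sim <- 5000
# H0 (mean = 1) is true, but the data are skewed and the sample is tiny
p_sim <- replicate(n_sim, t.test(rexp(5, rate = 1), mu = 1)$p.value)

mean(p_sim < 0.05)                                          # compare with 0.05
hist(p_sim, breaks = seq(0, 1, by = 0.05), plot = FALSE)$counts   # flat if calibrated
ks.test(p_sim, "punif")
```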

Related Solutions

Solved – Fisher’s method of combining p-values when one of the p-values is zero

Irrespective of the discussion in the comments about how these $p$-values of $0$ arose, there are methods for combining $p$-values that can be calculated when $p=0$.

As the OP indicated, neither Fisher's method nor Stouffer's method works in that case.

The method of Edgington based on the sum of the $p$-values, the closely related mean $p$ method, the method using the logit of $p$, Tippett's method based on the minimum $p$, and variants of Wilkinson's method (of which Tippett's is a special case) can all be calculated. Whether that is a sensible thing to do depends, of course, on the scientific question.
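To show that such statistics remain computable when some $p = 0$, here are hand-written sketches of Edgington's and Wilkinson's methods. These are not metap's functions: `edgington`, `wilkinson` and `tippett` are hypothetical names, and the p-values are made up. Under $H_0$ the sum of $k$ independent uniform p-values follows the Irwin-Hall distribution, and the $r$-th smallest follows a $\mathrm{Beta}(r, k-r+1)$ distribution.

```r
# Edgington: combined p = Irwin-Hall CDF at s = sum(p).
# The alternating sum loses precision for large k; fine for a handful of p-values.
edgington <- function(p) {
  k <- length(p)
  s <- sum(p)
  j <- 0:floor(s)
  sum((-1)^j * choose(k, j) * (s - j)^k) / factorial(k)
}

# Wilkinson: the r-th smallest of k uniform p-values is Beta(r, k - r + 1).
wilkinson <- function(p, r = 1) {
  k <- length(p)
  pbeta(sort(p)[r], r, k - r + 1)
}
tippett <- function(p) wilkinson(p, r = 1)   # special case: minimum p

p <- c(0, 0.3, 0.8)   # hypothetical p-values including an exact zero
edgington(p)          # (1.1^3 - 3 * 0.1^3) / 6, about 0.221
wilkinson(p, r = 2)   # pbeta(0.3, 2, 2) = 0.216
tippett(p)            # 0, since the minimum p-value is 0
```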

All the methods mentioned are available in the R package metap which, disclaimer, I wrote and maintain.

Solved – Bonferroni bound and FDR: compute p-values

Your formula for the p-values appears correct, assuming p1 is the t-value.

I don't see any reason why you would use both the Benjamini-Hochberg correction to control false discovery rate (FDR) and the Bonferroni correction to control the familywise error rate (FWER). You would choose one approach or the other.

Corrections for multiple p-values can be handled in R with the p.adjust function.

When using this function, the decision rule remains p < alpha [not p < alpha / n]. That is, R adjusts the p-values for you so that you don't need to adjust the decision rule.
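A small check of that statement for Bonferroni, with made-up p-values: comparing the output of `p.adjust(..., method = "bonferroni")` with alpha gives the same decisions as comparing the raw p-values with alpha / n, because this adjustment is min(1, p * n).

```r
p <- c(0.001, 0.012, 0.03, 0.2)   # hypothetical raw p-values
alpha <- 0.05
n <- length(p)

adjusted <- p.adjust(p, method = "bonferroni")   # min(1, p * n)
reject_adjusted <- adjusted < alpha              # rule used with p.adjust
reject_raw <- p < alpha / n                      # equivalent rule on raw p-values
identical(reject_adjusted, reject_raw)           # TRUE
```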

The following code in R calculates the p-values for 7 genes, then applies either the BH or the Bonferroni correction. The S columns in the data frame indicate whether the adjusted p-value is < 0.05.

You'll note that Bonferroni is more conservative than BH. I think that Bonferroni is too conservative for most situations. It is helpful to read up on the various FDR and FWER control methods.

```r
Gene = 1:7
t.values = c(-0.66, 1.02, 3.2, 2.7, 1.1, 2.5, 0.33)
p.values = 2 * pt(abs(t.values), df = 100, lower.tail = FALSE)

p.BH = p.adjust(p.values, method="BH")
p.B = p.adjust(p.values, method="bonferroni")

### Make things pretty ###

p.values = round(p.values, 3)
p.BH = round(p.BH, 3)
p.B = round(p.B, 3)

S.BH = p.BH < 0.05
S.B = p.B < 0.05

Data = data.frame(Gene, t.values, p.values, p.BH, S.BH, p.B, S.B)

Data

###  Gene t.values p.values  p.BH  S.BH   p.B   S.B
###     1    -0.66    0.511 0.596 FALSE 1.000 FALSE
###     2     1.02    0.310 0.434 FALSE 1.000 FALSE
###     3     3.20    0.002 0.013  TRUE 0.013  TRUE
###     4     2.70    0.008 0.029  TRUE 0.057 FALSE
###     5     1.10    0.274 0.434 FALSE 1.000 FALSE
###     6     2.50    0.014 0.033  TRUE 0.098 FALSE
###     7     0.33    0.742 0.742 FALSE 1.000 FALSE
```
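For readers curious what `p.adjust` does for `"BH"`, here is a hand-written version of the Benjamini-Hochberg step-up adjustment (`bh_adjust` is a hypothetical name). As far as I know it follows the same approach as R's own implementation, but treat it as an illustration and use `p.adjust` in practice; the last line checks that the two agree on the p-values from the example above.

```r
# BH step-up: scale the i-th smallest p-value by n / i, then take cumulative
# minima starting from the largest so the adjusted values stay monotone; cap at 1.
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  ranks <- n:1                              # their ranks in ascending order
  adj <- pmin(1, cummin(p[o] * n / ranks))
  adj[order(o)]                             # restore the original order
}

t.values <- c(-0.66, 1.02, 3.2, 2.7, 1.1, 2.5, 0.33)
p.values <- 2 * pt(abs(t.values), df = 100, lower.tail = FALSE)
all.equal(bh_adjust(p.values), p.adjust(p.values, method = "BH"))   # TRUE
```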
