My experimental setup is the following:
- compute an optimal solution $S$ to problem $A$ on instance $I$,
- evaluate test statistic $f(S)$,
- compute $f(S_i)$ for solutions $S_i$ $(1 \leq i \leq n)$ drawn uniformly at random from the feasible solutions to $A$ on $I$, and
- compute a p-value for $S$.
The null hypothesis is that $f$ is not correlated with the optimality of solutions to $A$; the alternative is that it is.
The p-value for $S$ is then $l/n$, where $l$ is the number of samples $S_i$ with $f(S_i) > f(S)$.
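In code, the per-instance test is roughly the following (a minimal sketch; `empirical_p_value` and its argument names are placeholders of my own):

```python
import numpy as np

def empirical_p_value(f_S, f_samples):
    """One-sided Monte Carlo p-value: the fraction of random feasible
    solutions whose statistic exceeds that of the optimal solution."""
    f_samples = np.asarray(f_samples)
    l = np.count_nonzero(f_samples > f_S)  # l = #{ S_i : f(S_i) > f(S) }
    return l / len(f_samples)              # p = l / n, as defined above

# Demo with synthetic statistics (not my real data):
rng = np.random.default_rng(0)
p = empirical_p_value(4.2, rng.normal(3.0, 1.0, size=1000))
```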
If we repeat this test for many instances $I_j$, we get a distribution of p-values.
I would like to determine if we can reject the null hypothesis over all tests.
I've looked at Fisher's method, but some of my p-values are so small (with $p = l/n$, they can be exactly $0$ when $l = 0$) that the combined result comes out as 0.
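Here is a sketch of what goes wrong there, using SciPy and made-up p-values (the `0.0` entry mimics an instance where $l = 0$):

```python
import numpy as np
from scipy import stats

p_values = np.array([0.0, 0.003, 0.41, 0.72])  # illustrative only

# Fisher's method: X = -2 * sum(ln p_j) ~ chi^2 with 2m d.o.f. under the null.
X = -2.0 * np.sum(np.log(p_values))                # ln(0) = -inf, so X = inf
combined = stats.chi2.sf(X, df=2 * len(p_values))  # sf(inf) = 0.0
print(combined)

# SciPy's built-in equivalent: stats.combine_pvalues(p_values, method="fisher")
```

(One standard remedy for the zeros themselves is the Monte Carlo estimator $p = (l+1)/(n+1)$, which is never exactly 0, but my question about the overall distribution remains.)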
If I make a histogram of the p-values, I get the following:
There is a clear bias towards very small p-values, but what worries me is the uniform-looking distribution of the remaining p-values.
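For reference, the histogram is produced roughly like this (a sketch; `p_values.npy` is a hypothetical stand-in for my per-instance p-values):

```python
import numpy as np
import matplotlib.pyplot as plt

p_values = np.load("p_values.npy")  # hypothetical file of per-instance p-values

# Under the null the p-values should be (roughly) uniform on [0, 1]; the
# spike in the first bin comes from instances where f(S) beats almost
# every random solution.
plt.hist(p_values, bins=20, range=(0.0, 1.0), edgecolor="black")
plt.xlabel("p-value")
plt.ylabel("number of instances")
plt.show()
```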
Does our data reject the null hypothesis?
What if the left-most bar indicated fewer tests with such small p-values? How would we evaluate the following distribution of p-values?