Solved – Non-uniform distribution of p-values when simulating binomial tests under the null hypothesis

Tags: binomial distribution, MATLAB, p-value, simulation, uniform distribution

I have heard that under the null hypothesis p-values should be uniformly distributed. However, simulating a binomial test in MATLAB gives a distribution that is far from uniform, with a mean larger than 0.5 (0.518 in this case):

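coin = [0 1];                     % outcomes of a fair coin (1 = success)
success_vec = nan(20000,1);       % number of successes in each simulated experiment

for i = 1:20000                   % 20000 simulated experiments
    success = 0;
    for j = 1:200                 % 200 fair coin flips per experiment
        success = success + coin(randperm(2,1));
    end
    success_vec(i) = success;
end

% Lower-tail p-value P(X <= observed count) under H0: p = 0.5
p_vec = binocdf(success_vec,200,0.5);
hist(p_vec);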

Trying to change the way in which I generate random numbers didn't help.
I would really appreciate any explanation here.

Best Answer

The result that $p$-values have a uniform distribution under $H_0$ holds for continuously distributed test statistics, at least for point nulls such as the one you have here.

As James Stanley mentions in the comments, the distribution of the test statistic here is discrete, so that result doesn't apply. Your code may contain no errors at all (though I wouldn't display a discrete distribution with a histogram; I'd lean toward displaying the cdf or the pmf, or better, both).

While the distribution of the p-value is not actually uniform, each jump in its cdf takes it up to the line $F(x)=x$ (I don't know a name for this, but it ought to have one, perhaps something like 'quasi-uniform'):

[Figure: step cdf of the simulated p-values, touching the line $F(x)=x$ at each jump]

It's quite possible to compute this distribution exactly rather than simulate it, but I've followed your lead and done a simulation (though a larger one than yours).
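For what it's worth, here is a minimal sketch of the exact calculation, assuming the same setup as in the question (200 flips, $p = 0.5$, lower-tail p-value as returned by `binocdf`). The names `pmf`, `p_vals` and `F` are mine, and the pmf is built from `gammaln` only so the snippet doesn't depend on a toolbox; `binopdf(k, n, 0.5)` should give the same vector.

```matlab
% Exact null distribution of the lower-tail p-value for X ~ Binomial(n, 0.5).
n = 200;
k = (0:n)';                                   % every possible success count

% Binomial pmf P(X = k), computed on the log scale for numerical stability.
pmf = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + n*log(0.5));

% The p-value attained when X = k is P(X <= k); it occurs with probability pmf(k).
p_vals = cumsum(pmf);

% Exact cdf of the p-value: F(x) = P(p-value <= x).
F = @(x) sum(pmf(p_vals <= x));

% At every attainable p-value the cdf sits on the line F(x) = x ...
gap_at_jumps = max(abs(arrayfun(F, p_vals) - p_vals));

% ... while between attainable values it lies below the line.
x_between = 0.05;                             % arbitrary point, not a target level
fprintf('max |F(p) - p| at the jumps: %g\n', gap_at_jumps);
fprintf('F(%.2f) = %.4f\n', x_between, F(x_between));
```

If you want the picture rather than the numbers, something like `stairs(p_vals, p_vals)` should draw the step cdf, since at each attainable p-value the cdf equals that p-value.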

Such a distribution needn't have mean 0.5, though as the $n$ in the binomial increases the step cdf will approach the line more closely, and the mean will approach 0.5.
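To illustrate, the mean of the lower-tail p-value can be computed exactly as $\sum_k P(X=k)\,P(X\le k)$. The sketch below does that for a few values of $n$; the particular values in `ns` are arbitrary choices of mine, and the pmf is again built from `gammaln` rather than `binopdf` purely to keep it toolbox-free.

```matlab
% Exact mean of the lower-tail p-value under H0: p = 0.5, for several n.
ns = [10 50 200 1000 5000];                   % illustrative sample sizes (my choice)
mean_p = zeros(size(ns));

for i = 1:numel(ns)
    n = ns(i);
    k = (0:n)';
    % Binomial pmf P(X = k) on the log scale, then exponentiated.
    pmf = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + n*log(0.5));
    % E[p-value] = sum over k of P(X = k) * P(X <= k).
    mean_p(i) = sum(pmf .* cumsum(pmf));
end

disp([ns' mean_p']);                          % columns: n, exact mean p-value
```

For $n = 200$ this comes out at roughly 0.52, in line with the 0.518 seen in the question's simulation, and the excess over 0.5 shrinks as $n$ grows.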

One implication of the discreteness of the p-values is that only certain significance levels are achievable: those corresponding to the step heights in the actual population cdf of p-values under the null. So, for example, you can have an $\alpha$ near 0.056 or one near 0.04, but not anything closer to 0.05.
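As a rough sketch of how you might list the attainable levels yourself, the snippet below finds the two achievable significance levels that bracket a nominal 0.05. It assumes the question's setup (200 flips, lower-tail test), and `levels`, `below` and `above` are names I've made up. The exact values depend on $n$ and on which tail(s) you use, so they needn't match the illustrative figures quoted above.

```matlab
% Attainable significance levels for a lower-tail binomial test, X ~ Binomial(n, 0.5).
n = 200;
alpha = 0.05;                                 % nominal level we would like to hit
k = (0:n)';

% Binomial pmf P(X = k); binopdf(k, n, 0.5) should give the same vector.
pmf = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) + n*log(0.5));

% Rejecting when X <= k has exact size P(X <= k), so these are the only levels available.
levels = cumsum(pmf);

below = max(levels(levels <= alpha));         % largest attainable level not exceeding alpha
above = min(levels(levels >  alpha));         % smallest attainable level above alpha

fprintf('nearest attainable levels around %.2f: %.4f and %.4f\n', alpha, below, above);
```

Any nominal $\alpha$ strictly between those two values gives a test whose actual size is the lower one.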
