Hypothesis Testing – How to Calculate P-Value for Binomial Hypothesis Testing in Excel

binomial distributionexcelhypothesis testingp-value

I am confused with calculation of the p-value for binomial hypothesis testing when not using the normal approximation and its Z-statistic.

My hypothesis testing is for the situation when H₀: π = π₀ and H₁: π ≠ π₀. Suppose we have a sample with 101 successes out of 790 trials (π = 12.8%). We also have a reference value π₀ = 11.7% for the supposed population percentage. I want to calculate the exact binomial p-value for two-sided test, and I got stuck in the middle.

First, I consider the binomial distribution B(790, 0.117) and calculate the probability of observing 101 or more successes (right tail). I use Excel formula =1 - BINOM.DIST(100, 790, 0.117, TRUE) which evaluates to 0.185…, but it is not a question of Excel coding, but the question of the theory of calculating.

My problem is that I don’t fully understand what to do next to arrive at p-value from this 0.185. Some sources suggest that I should multiply this number by 2, but I’m in doubt: since my binomial distribution is not symmetric, I cannot simply double one-tail probability to get the proper two-sided p-value.

I feel like I should calculate some other probability for the same binomial distribution and add it to 0.185 go get the p-value, but probability of what exactly should it be? Please help me to find out.

Best Answer

Seems I have found the answer.

Suppose we have a binomial situation with n trials and probability π₀ of success. We also have k observed successes, so the observed proportion is π = k/n. We want to know whether the difference between π and π₀ is statistically significant; to be more precise, we want to know the exact p-value measuring the level of this significance.

This is two-sided binomial test, so the hypotheses are:

  • H₀: π = π
  • H₁: ππ

The resulting p-value will be the sum of all individual binomial probabilities for all x values whose probabilities are no greater than the probability of k, both from the left and right tails of the distribution. I.e. we need to calculate the sum of all probabilities marked in red in this figure:

p-value illustration

In Excel, pₖ (the probability for k) is calculated by the formula = BINOM.DIST(k, n, π₀, FALSE). The condition that, for a given x, its probability is no greater than pₖ is BINOM.DIST(x, n, π₀, FALSE) <= pₖ. We can generate an array of all these x values with the formula = FILTER(SEQUENCE(n + 1, 1, 0), BINOM.DIST(SEQUENCE(n + 1, 1, 0), n, π₀, FALSE) <= pₖ). Finally, the sum of their probabilities would be:

= SUM(BINOM.DIST(FILTER(SEQUENCE(n + 1, 1, 0), BINOM.DIST(SEQUENCE(n + 1, 1, 0), n, π₀, FALSE) <= pₖ), n, π₀, FALSE))

In my particular case with n = 790, k = 101, and π₀ = 11.7% the formula is

= SUM(BINOM.DIST(FILTER(SEQUENCE(791, 1, 0), BINOM.DIST(SEQUENCE(791, 1, 0), 790, 0.117, FALSE) <= BINOM.DIST(101, 790, 0.117, FALSE)), 790, 0.117, FALSE))

which evaluates to p-value = 0.346…. Not too significant, unfortunately, but the method seems to work.

Related Question