I have been struggling to reconcile my (very basic) understanding of P-values with the approach of one of my colleagues and it appears to come down to interpretation of p-value terminology. This is having a knock-on effect on the correct implementation of a Holm-Bonferroni correction – any advice would be welcome!
The problem is this:
We have run a statistical test to look at whether the observed number of observations is higher or lower than expected, so it is a two-tailed test (<2.5% or >97.5%). The number of observations was converted to a z-score (<-1.96 or >1.96) and then into a probability (P). For our reporting we had two options:
- Report the P-value as it is and reject if P<2.5% or P>97.5%
- Convert the P-value into a one-tailed value and reject if P<5%
The issue arises because my colleague believes the following:
Option 2 is the more widely used definition. When we see "two-tailed p-value" referred to, this is usually what is meant. And when we apply Holm-Bonferroni, it is expecting rejection to correspond to p-value below 5%, so we need this form of the p-value.
and
What we report as a p-value under option 1 is what would universally be reported as the p-value for a one-tailed test, so can be referred to as a "one-tailed" p-value. The p-value under option 2 is what would generally be referred to as a "two-tailed" p-value, i.e. a p-value which has been converted to reflect the fact that the test is being run as a two-tailed test.
This is something I am struggling with. Is this actually true? It seems completely counter-intuitive to me, and I can find nothing on the internet to suggest that this is the case.
This interpretation is critical because we go on to perform a Holm-Bonferroni correction (there are > 50 tests in total), which requires (what I understand to be) one-sided style P-values (P<5%). I therefore believe that our 'two-sided' p-value from option 1 should be divided by 2 (0.5*P or 0.5*(1-P)) to convert it appropriately before running the test. However, my colleague believes it should be multiplied by 2 (2*P or 2*(1-P)) because Option 1 is actually known as a 'one-sided' value…
Can anyone offer any guidance on this? What I thought was a relatively simple concept is now confusing me greatly!
Best Answer
Most of what you've said is correct, but I think you might be confusing yourself unnecessarily by having different definitions of one- and two-tailed tests. Just to briefly review:
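A one-tailed test is the probability that the area of your distribution ($t$, $Z$, $f$, etc.) is either below or above your observed test statistic, depending on the direction of the effect you expect.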
Thus if you are expecting a negative effect (and using the $t$ distribution as an example), your $P$-value would be equal to
$$ P(t \leq t_{observed}) $$
or for a positive effect,
$$ P(t \geq t_{observed}) $$
both with the appropriate degrees of freedom.
A two-tailed test is equal to the probability that the area of your distribution is at least as extreme as your observed test statistic in EITHER direction, above or below, equal to
$$ P(t \geq | \, t_{observed} \, |) + P(t \leq -| \, t_{observed} \, |) = P( | \, t \, | \geq | \, t_{observed} \, |) $$
Because the $t$ distribution is symmetrical around 0, the two tail probabilities are equal. Therefore, the two-tailed $P$-value is twice the one-tailed $P$-value taken in the direction of the observed effect.
To convert your two-tailed $P$-value to a one-tailed $P$-value, you divide the former by 2, and the result depends on the direction of the effect:
$$ P_{\text{one-tailed}} = \begin{cases} \frac{1}{2} P_{\text{two-tailed}} & \text{if the observed effect is in the hypothesized direction} \\ 1-\frac{1}{2} P_{\text{two-tailed}} & \text{if it is in the opposite direction} \end{cases} $$
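To make the relationship concrete, here is a minimal Python sketch, assuming your z-scores follow a standard normal distribution under the null. The function name `tail_p_values` is just something I made up for illustration. In the question's terms, the probability you get straight from the z-score is the lower-tail value, and doubling the smaller of the two tails gives the two-tailed value.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def tail_p_values(z):
    """Return (lower, upper, two_tailed) p-values for a z-score.

    Assumes a standard normal null distribution (an assumption of this sketch).
    """
    lower = NormalDist().cdf(z)           # P(Z <= z): small when the count is lower than expected
    upper = 1.0 - lower                   # P(Z >= z): small when the count is higher than expected
    two_tailed = 2.0 * min(lower, upper)  # P(|Z| >= |z|), using symmetry around 0
    return lower, upper, two_tailed


for z in (-1.96, 0.5, 2.5):
    lower, upper, two = tail_p_values(z)
    print(f"z={z:+.2f}  lower={lower:.4f}  upper={upper:.4f}  two-tailed={two:.4f}")
```

With this convention, rejecting when the lower-tail probability is below 2.5% or above 97.5% is the same decision as rejecting when the two-tailed value is below 5%.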
Just to clarify, Holm-Bonferroni only requires that all $P$-values test the same form of null hypothesis. That is, either they are all two-sided or all one-sided. It doesn't make much sense to compare things with different nulls.
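For reference, the Holm-Bonferroni step-down procedure itself can be sketched in a few lines. This is a rough illustration rather than a vetted implementation: the function name `holm_bonferroni` and the example p-values are made up, and it assumes you feed it p-values that are all of the same kind (for example, all two-tailed).

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null is rejected.

    Sketch of Holm's step-down procedure: sort the p-values ascending,
    compare the k-th smallest (k = 1..m) with alpha / (m - k + 1),
    and stop at the first one that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):                   # rank is 0-based, so k = rank + 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                                        # all larger p-values are retained too
    return reject


# Made-up p-values, purely for illustration
example = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example))
```

The thresholds start at alpha/m for the smallest p-value and relax to alpha for the largest, so every p-value is compared against a level of at most 5%, which is why the inputs need to be on the "reject if P<5%" scale.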