P-Value – Is the Exact Value of a P-Value Meaningless in Statistical Significance?

Tags: bonferroni, p-value, statistical-significance

I had a discussion with a statistician back in 2009 in which he stated that the exact value of a p-value is irrelevant: the only thing that matters is whether it is significant or not. That is, one result cannot be more significant than another; your samples, for example, either come from the same population or they don't.

I have some qualms with this, but I can perhaps understand the ideology:

  1. The 5% threshold is arbitrary: the fact that p = 0.051 is deemed not significant while p = 0.049 is shouldn't really change the conclusion of your observation or experiment, despite one result being formally significant and the other not.

    The reason I bring this up now is that I'm studying for an MSc in Bioinformatics, and after talking to people in the field there seems to be a determined drive to get an exact p-value for every set of statistics they run. For instance, if they 'achieve' a p-value of p < 1.9×10⁻¹², they want to demonstrate HOW significant their result is, and that this result is SUPER informative. The issue is exemplified by questions such as "Why can't I get a p-value smaller than 2.2e-16?", where people want to record a value indicating that by chance alone the result would occur far less often than 1 in a trillion. But I see little difference between demonstrating that a result would occur by chance less than 1 in a trillion times and demonstrating that it would occur less than 1 in a billion.

  2. I can appreciate that p < 0.01 shows there is less than a 1% chance of such a result occurring under the null hypothesis, whereas p < 0.001 indicates that such a result is even more unlikely, but should the conclusions you draw be completely different? After all, both are significant p-values. The only case in which I can see wanting to record the exact p-value is when applying a Bonferroni correction, where the threshold changes with the number of comparisons made in order to control the type I error rate. But even so, why would you want to report a p-value that is 12 orders of magnitude smaller than your significance threshold?

  3. And isn't applying the Bonferroni correction itself slightly arbitrary too? The correction is widely regarded as very conservative, so there are other corrections one can choose to assess the significance level for their multiple comparisons (a few such procedures are compared in the sketch after this list). But because of this, isn't the point at which something becomes significant essentially variable, depending on which statistics the researcher wants to use? Should statistics be so open to interpretation?
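For concreteness on points 2 and 3, here is a minimal Python sketch (my own illustration, not from the original discussion; it assumes numpy and statsmodels are installed, and the p-values in it are hypothetical) of how the Bonferroni threshold scales with the number of comparisons and how alternative procedures can flag different numbers of the same results as significant:

```python
# Minimal sketch: how different multiple-comparison corrections move the
# significance cut-off for the same (hypothetical) set of p-values.
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from 10 independent tests.
pvals = np.array([0.001, 0.008, 0.012, 0.021, 0.034,
                  0.049, 0.051, 0.12, 0.44, 0.85])
alpha = 0.05

# Bonferroni: compare each p-value to alpha / number_of_tests.
print(f"Bonferroni per-test threshold: {alpha / len(pvals):.4f}")  # 0.0050

# Compare Bonferroni with Holm and Benjamini-Hochberg (FDR control).
for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adjusted, _, _ = multipletests(pvals, alpha=alpha, method=method)
    print(f"{method:>10}: {reject.sum()} of {len(pvals)} tests flagged as significant")
```

With these hypothetical values, Bonferroni and Holm compare the smallest p-value against 0.05/10 = 0.005, while the less conservative Benjamini-Hochberg procedure typically flags more of the same results, which is exactly the researcher-dependent flexibility the question worries about.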

In conclusion, shouldn't statistics be less subjective (although I guess some subjectivity is an inevitable consequence of dealing with a multivariate system)? Ultimately I want some clarification: can something be more significant than something else? And will reporting p < 0.001 suffice in place of recording the exact p-value?

Best Answer

  1. The type 1 / false rejection error rate $\alpha=.05$ isn't completely arbitrary, but yes, it is close. It's somewhat preferable to $\alpha=.051$ because it's less cognitively complex (people like round numbers and multiples of five). It's a decent compromise between skepticism and practicality, though maybe a little outdated – modern methods and research resources may make higher standards (i.e., lower $p$ values) preferable, if standards there must be (Johnson, 2013).

    IMO, the greater problem than the choice of threshold is the often unexamined choice to use a threshold where it is not necessary or helpful. In situations where a practical choice has to be made, I can see the value, but much basic research does not necessitate the decision to dismiss one's evidence and give up on the prospect of rejecting the null just because a given sample's evidence against it falls short of almost any reasonable threshold. Yet the authors of much of this research feel obligated by convention to do so, and resist it uncomfortably, inventing terms like "marginal" significance to beg for attention when they can feel it slipping away because their audiences often don't care about $p$s $\ge.05$. If you look around at other questions here on $p$ value interpretation, you'll see plenty of dissension about interpreting $p$ values through binary fail to/reject decisions regarding the null.

  2. Completely different – no. Meaningfully different – maybe. One reason to show a ridiculously small $p$ value is to imply information about effect size. Of course, just reporting the effect size would be much better for several technical reasons (the small simulation after this list shows how sample size can make a tiny effect yield a far smaller $p$ than a large one), but authors often fail to consider this alternative, and audiences may be less familiar with it as well, unfortunately. In a null-hypothetical world where no one knows how to report effect sizes, one may be right most often in guessing that a smaller $p$ means a larger effect. To whatever extent this null-hypothetical world is closer to reality than the opposite, maybe there's some value in reporting exact $p$s for that reason. Please understand that this point is pure devil's advocacy...

    Another use for exact $p$s that I've learned by engaging in a very similar debate here is as indices of likelihood functions. See Michael Lew's comments on, and the article (Lew, 2013) linked in, my answer to "Accommodating entrenched views of p-values".

  3. I don't think the Bonferroni correction is the same kind of arbitrary really. It corrects the threshold that I think we agree is at least close-to-completely arbitrary, so it doesn't lose any of that fundamental arbitrariness, but I don't think it adds anything arbitrary to the equation. The correction is defined in a logical, pragmatic way, and minor variations toward larger or smaller corrections would seem to require rather sophisticated arguments to justify them as more than arbitrary, whereas I think it would be easier to argue for an adjustment of $\alpha$ without having to overcome any deeply appealing yet simple logic in it.

    If anything, I think $p$ values should be more open to interpretation! I.e., whether the null is really more useful than the alternative ought to depend on more than just the evidence against it, including the cost of obtaining more information and the incremental value of the more precise knowledge thus gained. This is essentially the Fisherian no-threshold idea that, AFAIK, is how it all began. See "Regarding p-values, why 1% and 5%? Why not 6% or 10%?"
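To make point 2 concrete, here is a small simulation (again my own sketch, assuming numpy and scipy; it is not part of the original answer) in which a tiny effect measured in a huge sample yields a far smaller $p$-value than a much larger effect measured in a small sample, which is why reporting the effect size directly is preferable to inferring it from $p$:

```python
# Sketch: a smaller p-value need not mean a larger effect, because sample
# size also drives p. Assumes numpy and scipy are installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Scenario A: tiny true effect (about 0.05 SD) measured in a huge sample.
a1 = rng.normal(0.00, 1.0, size=200_000)
a2 = rng.normal(0.05, 1.0, size=200_000)

# Scenario B: large true effect (about 0.8 SD) measured in a small sample.
b1 = rng.normal(0.0, 1.0, size=20)
b2 = rng.normal(0.8, 1.0, size=20)

for label, x, y in (("tiny effect, n = 200,000 per group", a1, a2),
                    ("large effect, n = 20 per group", b1, b2)):
    t_stat, p = stats.ttest_ind(x, y)
    # Cohen's d as a simple standardized effect size.
    d = (y.mean() - x.mean()) / np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    print(f"{label}: p = {p:.2e}, Cohen's d = {d:.2f}")
```

In typical runs the first scenario's $p$ is dozens of orders of magnitude smaller than the second's, even though its standardized effect is roughly sixteen times smaller.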

If fail to/reject crises aren't forced upon the null hypothesis from the outset, then the more continuous understanding of statistical significance certainly does admit the possibility of continuously increasing significance. In the dichotomized approach to statistical significance (I think this is sometimes referred to as the Neyman-Pearson framework; cf. Dienes, 2007), no, any significant result is as significant as the next – no more, no less. This question may help explain that principle: "Why are p-values uniformly distributed under the null hypothesis?" As for how many zeroes are meaningful and worth reporting, I recommend Glen_b's answer to this question: "How should tiny $p$-values be reported? (and why does R put a minimum on 2.22e-16?)" – it's much better than the answers to the version of that question you linked on Stack Overflow!
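As a rough illustration of where the familiar 2.22e-16 floor comes from (my own sketch, in Python rather than R, assuming numpy and scipy): double precision cannot distinguish 1 from 1 + x once x falls below the machine epsilon of about 2.22×10⁻¹⁶, so naive tail-probability computations of the form 1 - cdf(z) run out of room around there and then jump to zero, which is one reason software often just prints "< 2.2e-16". Log-scale routines let you quote far smaller values if you really want to:

```python
# Sketch: why reported p-values often bottom out near 2.22e-16, and how to
# quote far smaller ones on the log scale. Assumes numpy and scipy.
import numpy as np
from scipy import stats

# Machine epsilon for double precision: the smallest x with 1 + x != 1.
print(np.finfo(np.float64).eps)          # ~2.220446e-16

# For an extreme test statistic, the naive two-sided normal p-value
# 2 * (1 - cdf(z)) underflows straight to 0 in double precision...
z = 40.0
print(2 * (1 - stats.norm.cdf(z)))       # 0.0

# ...but the log survival function keeps precision, so the p-value can
# still be reported on the log10 scale.
log10_p = (stats.norm.logsf(z) + np.log(2)) / np.log(10)
print(round(log10_p, 1))                 # roughly -349
```

Whether those extra orders of magnitude carry any scientific meaning is, of course, the point of the discussion above.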

References
- Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317. Retrieved from http://www.pnas.org/content/110/48/19313.full.pdf.
- Lew, M. J. (2013). To P or not to P: On the evidential nature of P-values and their place in scientific inference. arXiv:1311.0081 [stat.ME]. Retrieved from http://arxiv.org/abs/1311.0081.
