Interpretation of p-value; difference between p=.06 and p=.99

Tags: interpretation, p-value, statistical-significance

I'm sorry if this is a duplicate, but when looking through the other questions asked, none seemed to address the specific question I'm curious about.

I was wondering how to view p-values. We all have learned that a p-value < .05 is considered significant (according to your alpha level), and if so, you can interpret your results e.g. make a claim about the direction of effect in a regression.
However, if you obtain a p-value = .06, it is not considered significant, therefore you cannot make a claim about the direction of the effect (even though you might have plotted a graph that might suggest there is a positive relationship for example).
The same would go if you have obtained a p-value = .99. In both situations, you cannot make a claim about your hypothesis (i.e. you cannot reject H0).

I was however wondering if there is something to say about the p-value of .06. To me, intuitively, it would seem that there is "more evidence" of there being a relationship between your variables in this situation than when obtaining a p-value of .99. Why is this fixed alpha level of .05 so holy? Why can't we view our p-values as a continuum: the lower the p-value, the more evidence, independent of any cutoff?
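To make the intuition concrete, here is a small simulation sketch (pure Python, using a one-sample z-test with known variance as an illustrative test, not any particular test from the question): when H0 is true, p-values are uniformly distributed on (0, 1), so p=.06 and p=.99 are equally likely; when a real effect exists, p-values pile up near 0, which is why a small p-value carries more evidence against H0 than a large one.

```python
import math
import random

def p_value_one_sample_z(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - phi)

random.seed(1)

def simulate_p_values(true_mean, n=30, reps=2000):
    """Repeatedly draw samples of size n and collect the resulting p-values."""
    return [p_value_one_sample_z([random.gauss(true_mean, 1) for _ in range(n)])
            for _ in range(reps)]

p_null = simulate_p_values(0.0)  # H0 true: p-values are roughly Uniform(0, 1)
p_alt = simulate_p_values(0.5)   # genuine effect: p-values concentrate near 0

frac_small_null = sum(p < 0.05 for p in p_null) / len(p_null)
frac_small_alt = sum(p < 0.05 for p in p_alt) / len(p_alt)
print(frac_small_null)  # close to the nominal 0.05
print(frac_small_alt)   # much larger: most runs detect the effect
```

Under H0 roughly 5% of runs land below .05 by construction, while under the alternative the bulk of p-values are small, so the evidential asymmetry between p=.06 and p=.99 is real even though neither clears the .05 threshold.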

Best Answer

Scientifically there is nothing magic about p<=0.05. What is magic about it is that it has somehow become an accepted threshold for being able to publish (or e.g. being able to submit a drug to regulators). As you say, there is in reality a continuum of evidence, and the difference between 0.049 and 0.051 is in itself rather meaningless.

However, you should also note that p-values and evidence are not exactly the same thing. Other aspects, such as the prior plausibility of the hypotheses, the experimental design, and the overall body of evidence, also play a role. To give some examples:

  • If I do an experiment to prove that mind-reading powers exist, I probably need more than a single experiment with p=0.049 to convince the scientific community that this is a true phenomenon. That would especially be the case when there are a number of previous studies that in their totality do not support the idea.
  • Or I do an experiment for some potentially plausible psychological effect, but run a tiny experiment with just 20 student volunteers, so that there is essentially no power for any plausible effect size. In that case any p<=0.05 is not so much evidence for an effect as noise that should not be overinterpreted.
  • On the other hand, if a drug was shown to work in two trials, one conducted in Europe and one in the US, I would be rather convinced that the drug also works for inhabitants of Australia on the basis of a third trial conducted in Australia with p=0.049 (and even if that trial had estimates in the right direction but ended up with p=0.1 or so).
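The underpowered-study point in the second bullet can be sketched with a quick power simulation (again a one-sample z-test as a stand-in, with an assumed modest effect of 0.2 standard deviations; the sample sizes and effect size are illustrative, not taken from any real study): with n=20 the test almost never detects the effect, so the occasional p<=0.05 is mostly luck rather than reliable evidence.

```python
import math
import random

random.seed(7)

def one_sample_p(sample, sigma=1.0):
    """Two-sided p-value for a one-sample z-test of mean zero, known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def power(effect, n, reps=4000, alpha=0.05):
    """Estimated probability of p < alpha when the true effect is `effect` SDs."""
    hits = sum(one_sample_p([random.gauss(effect, 1) for _ in range(n)]) < alpha
               for _ in range(reps))
    return hits / reps

small_study = power(0.2, 20)   # 20 volunteers: the effect is usually missed
large_study = power(0.2, 200)  # same effect, adequate sample size
print(small_study)
print(large_study)
```

With the small sample the estimated power is low (well under a third), while the larger study detects the same effect most of the time; this is why a significant result from a severely underpowered design deserves skepticism.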