A psychology journal banned p-values and confidence intervals; is it indeed wise to stop using them?

Tags: confidence-interval, effect-size, hypothesis-testing, p-value, psychology

On 25 February 2015, the journal Basic and Applied Social Psychology issued an editorial banning $p$-values and confidence intervals from all future papers.

Specifically, they say (formatting and emphasis are mine):

  • […] prior to publication, authors will have to remove all
    vestiges of the NHSTP [null hypothesis significance testing procedure] ($p$-values, $t$-values, $F$-values, statements about “significant” differences or lack thereof, and so on).

  • Analogous to how the NHSTP fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it, confidence intervals do not
    provide a strong case for concluding that the population
    parameter of interest is likely to be within the stated
    interval. Therefore, confidence intervals also are banned
    from BASP.

  • […] with respect to Bayesian procedures, we reserve the right to make case-by-case
    judgments, and thus Bayesian procedures are neither
    required nor banned from BASP.

  • […] Are any inferential statistical procedures
    required? — No […] However, BASP will require strong
    descriptive statistics, including effect sizes.
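The editorial's objection to confidence intervals turns on what the stated confidence level refers to. Under the usual frequentist reading, "95%" is a long-run property of the interval-building procedure, not a probability attached to any one computed interval. The sketch below simulates that. It assumes normally distributed data with a known population SD (a z-interval, chosen only to keep the example stdlib-only). The function names `z_interval` and `coverage` and all parameter values are hypothetical.

```python
import math
import random
import statistics


def z_interval(sample, sigma, z=1.96):
    """95% CI (z = 1.96) for a normal mean when the population SD `sigma` is known."""
    mean = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return mean - half_width, mean + half_width


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=10_000, seed=1):
    """Share of intervals, over repeated samples, that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        if low <= true_mean <= high:
            hits += 1
    return hits / reps


print(coverage())  # should be close to 0.95, up to simulation noise
```

Any single interval produced along the way either contains the true mean or does not. The 95% only describes how often the procedure succeeds, which is arguably the gap the editorial points to.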

Let us not discuss problems with and misuse of $p$-values here; there are already plenty of excellent discussions on CV that can be found by browsing the p-value tag. The critique of $p$-values often goes together with advice to report confidence intervals for parameters of interest. For example, in this very well-argued answer @gung suggests reporting effect sizes with confidence intervals around them. But this journal bans confidence intervals as well.
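For concreteness, here is a minimal sketch of the kind of report @gung's suggestion leads to: an effect size (Cohen's d, a standardized mean difference) with an approximate 95% confidence interval. Under the BASP policy the point estimate would count as an acceptable descriptive statistic, but the interval would not. The sketch assumes two independent groups, uses the pooled sample SD, and uses a common large-sample approximation for the standard error of d. The data and the function names are made up for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) using the pooled sample SD."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)  # n-1 denominators
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


def d_with_approx_ci(group_a, group_b, z=1.96):
    """Cohen's d with an approximate 95% CI from a large-sample normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    d = cohens_d(group_a, group_b)
    # Approximate standard error of d; with very small samples this is rough.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# Hypothetical scores for two groups (illustrative only, not from any study)
treatment = [5.1, 6.3, 5.8, 6.9, 6.0, 5.5]
control = [4.8, 5.2, 5.0, 5.9, 4.6, 5.3]
d, (low, high) = d_with_approx_ci(treatment, control)
print(f"d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
```

A BASP-compliant write-up would keep the first number and drop the bracketed range; with six observations per group, the width of that range is exactly the information a reader loses.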

What are the advantages and disadvantages of such an approach to presenting data and experimental results, as opposed to the "traditional" approach with $p$-values, confidence intervals, and the significant/insignificant dichotomy? The reaction to this ban seems to be mostly negative; so what are the disadvantages? The American Statistical Association has even posted a brief comment discouraging this ban, saying that "this policy may have its own negative consequences". What could these negative consequences be?

Or, as @whuber suggested putting it: should this approach be advocated generally as a paradigm of quantitative research? And if not, why not?

P.S. Note that my question is not about the ban itself; it is about the suggested approach. I am not asking about frequentist vs. Bayesian inference either. The editorial is pretty negative about Bayesian methods too, so the question is essentially about using statistics vs. not using statistics at all.


Other discussions: reddit, Gelman.

Best Answer

The first sentence of the current 2015 editorial, to which the OP links, reads:

The Basic and Applied Social Psychology (BASP) 2014 Editorial *emphasized* that the null hypothesis significance testing procedure (NHSTP) is invalid...

(my emphasis)

In other words, for the editors it is an already proven scientific fact that "null hypothesis significance testing" is invalid: the 2014 editorial merely emphasized it, and the current 2015 editorial simply acts on it.

The misuse (sometimes even malicious) of NHSTP is indeed well discussed and documented. And it is not unheard of in human history for "things to get banned" because, after all is said and done, they were found to be misused more than put to good use (but shouldn't we statistically test that?). A ban can be a "second-best" solution: cut out what has, on average (inferential statistics), produced losses rather than gains, because we predict (inferential statistics) that it will be detrimental in the future as well.

But the zeal revealed by the wording of that first sentence makes this look exactly like a zealot's approach, rather than a cool-headed decision to cut off the hand that tends to steal rather than offer. If one reads the editorial from a year earlier that is mentioned in the quote above (DOI:10.1080/01973533.2014.865505), one sees that this is only part of an overhaul of the journal's policies by a new editor.

Further down in the editorial, they write:

...On the contrary, we believe that the $p < .05$ bar is too easy to pass and sometimes serves as an excuse for lower quality research.

So their conclusion about their own discipline appears to be that null hypotheses are rejected "too often", so that alleged findings may acquire spurious statistical significance. This is not the same argument as the "invalid" dictum in the first sentence.
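The editorial does not spell out why the bar is "too easy to pass". One well-known way a nominal 5% threshold lets null effects through is testing several outcomes and counting a study as a success if any of them reaches $p < .05$. With $k$ independent tests at $\alpha = .05$, the chance of at least one "hit" when every null is true is $1 - 0.95^k$, about 0.23 for $k = 5$. The sketch below simulates this under assumptions chosen for illustration, not taken from the editorial: independent outcomes, true null effects, normal data with known SD, and a two-sample z-test. The function names are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(sample_a, sample_b, sigma):
    """Two-sided z-test p-value for a difference in means, assuming a known common SD."""
    se = sigma * math.sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def share_with_a_hit(outcomes=5, n=20, studies=3_000, alpha=0.05, seed=2):
    """Fraction of simulated studies, all with true null effects, that report
    at least one outcome with p < alpha (testing stops at the first 'hit')."""
    rng = random.Random(seed)
    studies_with_hit = 0
    for _ in range(studies):
        for _ in range(outcomes):
            group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
            group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if two_sided_p(group_a, group_b, sigma=1.0) < alpha:
                studies_with_hit += 1
                break
    return studies_with_hit / studies


print(share_with_a_hit(outcomes=1))  # should land near 0.05, up to sampling noise
print(share_with_a_hit(outcomes=5))  # should land near 1 - 0.95**5
print(1 - 0.95 ** 5)                 # analytic value for 5 independent tests
```

This is only one mechanism; it shows how "spurious statistical significance" can arise without any single test being computed incorrectly, which is a different complaint from calling the procedure "invalid".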

So, to answer the question: for the editors of the journal, their decision is obviously not only wise but overdue. They appear to think that they have cut out the part of statistics that has become harmful while keeping the beneficial parts; they don't seem to believe that anything here needs to be replaced with something "equivalent".

Epistemologically, this is an instance where scholars of a social science partially retreat from an attempt to make their discipline more objective in its methods and results through quantitative methods, because they have concluded (how?) that, in the end, the attempt did "more harm than good". I would say that this is a very important matter: such an outcome is possible in principle, but demonstrating it "beyond reasonable doubt" would require years of work, and doing so would really help the discipline. Just one or two editorials and papers, however, will most probably (inferential statistics) just ignite a civil war.

The final sentence of the 2015 editorial reads:

We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking. The NHSTP has dominated psychology for decades; we hope that by instituting the first NHSTP ban, we demonstrate that psychology does not need the crutch of the NHSTP, and that other journals follow suit.