Is this the solution to the p-value problem?

Tags: hypothesis testing, p-value, statistical significance

In February 2016, the American Statistical Association released a formal statement on statistical significance and p-values. Our thread about it discusses these issues extensively. However, no authority has come forth to offer a universally recognized, effective alternative, until now. The American Statistical Society (ASS) has published its response, p-values: What’s next?

"The p-value isn't good for much."

We think the ASA did not go far enough. It is time to admit that the era of p-values is over. Statisticians have successfully used them to baffle undergraduates, trick scientists, and fool editors everywhere, but the world is starting to see through this ruse. We need to abandon this early 20th century attempt by statisticians to control decision making. We need to return to what actually works.

The official ASS proposal is this:

In place of p-values, the ASS advocates the STOP (SeaT-Of-Pants procedure). This time-honored and -tested method was used by the ancient Greeks, renaissance men, and all scientists until Ronald Fisher came along and ruined things. The STOP is simple, direct, data-driven, and authoritative. To carry it out, an authority figure (an older male, by preference) reviews the data and decides whether they agree with his opinion. When he decides they do, the result is “significant.” Otherwise it is not and everybody is required to forget about the whole thing.

Principles

The response addresses each of the ASA's six principles.

  1. The STOP can indicate how incompatible the data are with a specified statistical model.

    We like this phrase because it’s such a fancy way of saying the STOP will answer any question yes or no. Unlike p-values or other statistical procedures, it leaves no doubt. It’s the perfect response to those who say “we don’t need no stinkin’ null hypothesis! What the *?!@ is that, anyway? Nobody ever could figure out what it was supposed to be.”

  2. The STOP doesn’t measure the probability that a hypothesis is true: it actually decides whether it’s true or not.

    Everybody is confused by probabilities. By taking probability out of the picture, the STOP eliminates the need for years of undergraduate and graduate study. Now anybody (who is sufficiently old and male) can perform statistical analysis without the pain and torture of listening to even a single statistical lecture or running arcane software that spews unintelligible output.

  3. Scientific conclusions and business or policy decisions can be based on common sense and real authority figures.

    Important decisions always have been made by authorities, anyway, so let’s just admit it and cut out the middlemen. Using the STOP will free statisticians to do what they are best suited for: using numbers to obfuscate the truth and sanctifying the preferences of those in power.

  4. Proper inference requires full reporting and transparency.

    The STOP is the most transparent and self-evident statistical procedure ever invented: you look at the data and you decide. It eliminates all those confusing z-tests, t-tests, chi-squared tests, and alphabet soup procedures (ANOVA! GLM! MLE!) used by people to hide the fact they have no clue what the data mean.

  5. The STOP measures the importance of the result.

    This is self-evident: if a person in authority employs the STOP, then the result must be important.

  6. By itself, the STOP provides a good measure of evidence regarding a model or hypothesis.

    We wouldn’t want to challenge an authority, would we? Researchers and decision makers will recognize that the STOP provides all the information they need to know. For these reasons, data analysis can end with the STOP; there is no need for alternative approaches, like p-values, machine learning, or astrology.

Other approaches

Some statisticians prefer so-called “Bayesian” methods, in which an obscure theorem posthumously published by an 18th century cleric is applied mindlessly to solve every problem. Its most noted advocates freely admit these methods are “subjective.” If we’re going to use subjective methods, then obviously the more authoritative and knowledgeable the decision maker is, the better the result will be. The STOP thereby emerges as the logical limit of all Bayes methods. Why go to the effort of working those awful calculations, and tying up so much computer time, when you can just show the data to the guy in charge and ask him what his opinion is? End of story.

Another community has recently arisen to challenge the priesthood of statisticians. They call themselves “machine learners” and “data scientists,” but they’re really just hackers looking for higher status. It’s the official position of the ASS that these guys should go form their own professional organization if they want people to take them seriously.


The question

Is this the answer to the problems the ASA identified with p-values and null hypothesis testing? Can it really unite the Bayesian and Frequentist paradigms (as implicitly claimed in the response)?

Best Answer

I've been advocating for my own new approach to statistical decision making called RADD: Roll A Damn Die. It also addresses all the key points.

1) RADD can indicate how compatible the data are with a specified statistical model.

If you roll a higher number, clearly the evidence is more in favor of your model! An extra benefit is that, if we desire even more confidence, we can roll a die with more sides. You can even find 100-sided dice if you search enough!

2) RADD can decide whether a hypothesis is true or not.

You only have to roll a 2-sided die, i.e., flip a coin.

3) RADD can be used to make business or policy decisions.

Get a bunch of policy makers in a room, and have them all roll dice! Highest wins!

4) RADD is transparent.

The result can be recorded, and the die itself can be kept for further research*

5) RADD measures the importance of the result.

Obviously, rolling higher signifies a very important event has occurred.

6) RADD provides a good measure of evidence.

Didn't we say higher rolls are better?

So, no, STOP is not the answer. The answer is RADD.
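
For anyone who wants to try RADD at home, here is a minimal sketch in Python, offered in the same spirit as the procedure itself. The function names (`radd_roll`, `radd_decide`, `radd_policy`) are made up for illustration. The answer above does not say which face of the coin means "true" or what to do about ties, so the choices below (the higher face means "true", tied policy makers all count as winners) are assumptions.

```python
import random


def radd_roll(sides=6, rng=None):
    """Roll one fair die with `sides` faces; returns an integer in 1..sides."""
    if sides < 2:
        raise ValueError("a die needs at least 2 sides")
    rng = rng or random  # fall back to the module-level generator
    return rng.randint(1, sides)


def radd_decide(rng=None):
    """Point 2: decide a hypothesis with a 2-sided die, i.e. a coin flip.

    Assumption: the higher face (2) is read as 'true'.
    """
    return radd_roll(sides=2, rng=rng) == 2


def radd_policy(policy_makers, sides=6, rng=None):
    """Point 3: every policy maker rolls once; highest wins.

    Assumption: ties are not broken, so all top rollers are returned.
    Returns (rolls, winners), so the result can be recorded (point 4).
    """
    if not policy_makers:
        raise ValueError("need at least one policy maker in the room")
    rolls = {name: radd_roll(sides, rng) for name in policy_makers}
    best = max(rolls.values())
    winners = [name for name, roll in rolls.items() if roll == best]
    return rolls, winners


if __name__ == "__main__":
    rng = random.Random(2016)  # seeded so the "analysis" is reproducible
    print("evidence for the model (d100):", radd_roll(100, rng))
    print("hypothesis is true:", radd_decide(rng))
    print("policy outcome:", radd_policy(["Alice", "Bob", "Carol"], rng=rng))
```

Passing a seeded `random.Random` instance is the closest software equivalent of keeping the die for further research: anyone can re-run the same rolls.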