Why do we reject the null hypothesis at the 0.05 level and not the 0.5 level (as we do in Classification)?

classification, hypothesis testing, p-value, probability

Hypothesis testing is akin to a Classification problem. Say we have 2 possible labels for an observation (subject): Guilty vs. Non-Guilty. Let Non-Guilty be the Null Hypothesis. If we viewed the problem from a Classification viewpoint, we would train a Classifier that predicts the probability of the subject belonging to each of the 2 Classes, given the Data. We would then pick the Class with the highest probability. In that case, a probability of 0.5 would be the natural threshold. We might vary the threshold if we assigned different costs to False Positive vs. False Negative errors, but rarely would we go to such an extreme as setting the threshold at 0.05, i.e. assigning the subject to Class "Guilty" only if the probability is 0.95 or higher.

But if I understand well, this is what we do as standard practice when we view the same problem as one of Hypothesis testing. In this latter case, we reject the label "Non-Guilty" (which is equivalent to assigning the label "Guilty") only if the probability of being "Non-Guilty" is less than 5%. Perhaps this makes sense if we truly want to avoid convicting innocent people. But why should this rule prevail in all Domains and all cases?
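The remark about unequal costs can be made concrete. As a minimal sketch, assume a hypothetical cost `cost_fp` for a False Positive (labelling an innocent subject "Guilty") and `cost_fn` for a False Negative. The rule that minimizes expected cost then assigns "Guilty" when the predicted probability of guilt exceeds `cost_fp / (cost_fp + cost_fn)`. The function names and cost values below are illustrative assumptions, not something taken from the question.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability of the positive class above which predicting positive is cheaper.

    Predicting positive costs (1 - p) * cost_fp on average, predicting negative
    costs p * cost_fn, so positive wins when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(p_guilty, cost_fp=1.0, cost_fn=1.0):
    """Return the label with the lower expected cost (hypothetical helper)."""
    threshold = cost_threshold(cost_fp, cost_fn)
    return "Guilty" if p_guilty > threshold else "Non-Guilty"


# Equal costs give the usual 0.5 threshold.
print(cost_threshold(1.0, 1.0))

# Assumed costs: a False Positive is 19 times as costly as a False Negative.
# This pushes the threshold to 0.95.
print(cost_threshold(19.0, 1.0))
print(classify(0.90, cost_fp=19.0, cost_fn=1.0))
```

Read this way, a 0.95 threshold in the Classification setting corresponds to treating a False Positive as 19 times worse than a False Negative.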

Deciding which Hypothesis to adopt is equivalent to defining an Estimator of the Truth given the Data. In Maximum Likelihood Estimation we accept the Hypothesis that is more likely given the Data, though not necessarily overwhelmingly more likely. See the graph below:

[Figure: image not preserved in this copy]

Using a Maximum Likelihood approach, we would favor the Alternative Hypothesis in this example whenever the value of the Predictor is above 3, e.g. 4, even though the probability that this value was generated under the Null Hypothesis is larger than 0.05.

And while the example with which I began the post is perhaps emotionally charged, we could think of other cases, e.g. a technical improvement. Why should we give such an advantage to the Status Quo when the Data tell us that the probability that the new solution is an improvement is greater than the probability that it is not?

Best Answer

Say you end up in court and you did not do it. Do you think it is fair that you still have a 50% chance of being found guilty? Is a 50% chance of being innocent "guilty beyond reasonable doubt"? Would you think it is fair that you had a 5% chance of being found guilty even though you did not do it? If I were in court I would consider 5% not conservative enough.

You are right that the 5% is arbitrary. We could just as well choose 2%, or 1%, or if you are nerdy $\pi$% or $e$%. There are people who are willing to accept 10%, but 50% will never be acceptable.
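The court analogy can be checked with a quick simulation. The sketch below assumes a one-sided z-test whose statistic is standard normal whenever the null is true (the defendant "did not do it"). This setup, the function name, and the number of trials are illustrative assumptions. It estimates how often a truly innocent case would be rejected at each level.

```python
import random
from statistics import NormalDist


def false_positive_rate(alpha, n_trials=100_000, seed=0):
    """Fraction of truly-null cases rejected by a one-sided z-test at level alpha."""
    rng = random.Random(seed)
    # Reject the null when the statistic exceeds this critical value.
    critical = NormalDist().inv_cdf(1.0 - alpha)
    rejections = sum(rng.gauss(0.0, 1.0) > critical for _ in range(n_trials))
    return rejections / n_trials


# Every simulated case is innocent, so each rejection is a wrongful conviction.
for alpha in (0.5, 0.10, 0.05, 0.01):
    print(alpha, false_positive_rate(alpha))
```

In this setup the estimated rate tracks $\alpha$ closely, so choosing $\alpha$ amounts to choosing how often the innocent are found guilty.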


In response to your edit of the question:

Your idea would be reasonable if all hypotheses were created equal. However, that is not the case. We typically care about the alternative hypothesis, so we strengthen our argument if we choose a low $\alpha$. In that sense, the example you chose originally illustrates that point well.
