Solved – Statistical significance level – hypothesis testing


Why is the significance level called the probability of rejecting the null hypothesis given that the null hypothesis is true?

This is my understanding of hypothesis testing; correct me if I'm wrong. The p-value tells us the probability of obtaining this or an even more extreme value of a given statistic. If it turns out to be, say, 0.1%, it's very unlikely that this value of the statistic arose by chance alone. The null hypothesis is used to sort of justify the alternative hypothesis, right? We show that the observed value is very unlikely given that the null hypothesis is true, and that's why it justifies the alternative hypothesis.

Now, the question is what counts as likely or not, and this is the significance level we choose. If we choose it to be 1%, then given a p-value equal to 0.1% we're going to reject the null hypothesis. Why is the significance level equal to the probability of rejecting the null hypothesis, given that it's true? Is it because it's the probability that we actually get a p-value more extreme than the significance level we've chosen?

Best Answer

Your understanding is mostly correct. Let $X$ be a random variable that follows the same distribution as your test statistic under the null hypothesis. The p value is the probability that a randomly drawn $X$ is at least as large as the test statistic you computed. If that probability is very low, then that is good reason to believe that the null hypothesis does not hold.

You just need to be careful about the difference in terminology between p value and significance level. A significance level is a pre-specified cutoff for the p value: at or below it you reject the null hypothesis, and above it you do not have enough evidence to reject the null hypothesis. The p value itself is just a probability-valued function of the test statistic that gets smaller as the test statistic gets more extreme (i.e. for an upper-tailed test, one minus the CDF of the distribution of the test statistic under the null).
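To make the distinction concrete, here is a minimal sketch, assuming an upper-tailed test whose statistic is standard normal under the null. The function names `upper_tail_p_value` and `reject_null` and the example $z$ values are made up for illustration; the point is only that the p value is computed from the data, while the significance level is a fixed cutoff chosen beforehand.

```python
from statistics import NormalDist

# Assumption: under the null, the test statistic follows N(0, 1)
# and large values count as "extreme" (upper-tailed test).
NULL_DISTRIBUTION = NormalDist(mu=0.0, sigma=1.0)


def upper_tail_p_value(z: float) -> float:
    """P(X >= z) for X drawn from the null: one minus the null CDF at z."""
    return 1.0 - NULL_DISTRIBUTION.cdf(z)


def reject_null(p_value: float, alpha: float) -> bool:
    """Reject when the p value is at or below the pre-specified cutoff."""
    return p_value <= alpha


ALPHA = 0.01  # significance level, chosen before looking at the data

for z in (0.0, 1.0, 2.0, 3.0):  # hypothetical observed test statistics
    p = upper_tail_p_value(z)
    print(f"z = {z:.1f}  p = {p:.4f}  reject: {reject_null(p, ALPHA)}")
```

The p value shrinks as the statistic becomes more extreme, and the decision only changes once it drops to the cutoff or below.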

So the significance level is not the probability that you reject the null hypothesis in your particular experiment. The significance level is the largest p value that you would consider evidence enough to reject the null. When you set a significance level, you are setting an upper bound: if the p value falls at or below it, you consider the observed test statistic too extreme to believe it was randomly drawn from the null distribution.

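You might have been confused by someone talking about type 1 error rates and such. All that stuff means is that, if you run the experiment many times, the null hypothesis is true every time, and you set your significance level to $\alpha$, you will reject the null hypothesis $\alpha \times 100$% of the time purely due to random chance. This is because, for a continuous test statistic, the p value is uniformly distributed on $[0, 1]$ when the null is true, so it falls at or below $\alpha$ with probability exactly $\alpha$. That is the sense in which the significance level is the probability of rejecting the null given that it is true. Understanding this can help you set reasonable $\alpha$ levels if you do plan to do null hypothesis testing.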
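If you want to see that long-run behaviour for yourself, a rough simulation along these lines should do it. This is only a sketch under assumptions that are not in the question: the data are normal with known standard deviation 1, the null says the mean is 0, and the test is an upper-tailed z-test. The function name `simulate_type1_rate` and its default parameters are invented for the example.

```python
import random
from statistics import NormalDist


def simulate_type1_rate(alpha: float,
                        n_experiments: int = 20_000,
                        n_samples: int = 30,
                        seed: int = 0) -> float:
    """Fraction of experiments that reject the null when the null is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # null distribution of the z statistic
    rejections = 0
    for _ in range(n_experiments):
        # Generate data under the null: mean 0, known standard deviation 1.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
        z = sample_mean * n_samples ** 0.5      # z statistic for H0: mean = 0
        p_value = 1.0 - std_normal.cdf(z)       # upper-tailed p value
        if p_value <= alpha:
            rejections += 1
    return rejections / n_experiments


for alpha in (0.01, 0.05, 0.10):
    rate = simulate_type1_rate(alpha)
    print(f"alpha = {alpha:.2f}  observed rejection rate = {rate:.4f}")
```

Every rejection here is a false positive, since the null is true by construction, and the observed rejection rate should land close to whatever $\alpha$ you pass in.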
