Hypothesis Testing – Why Does the Null Hypothesis in Simple Linear Regression (i.e., Slope = 0) Have a Distribution?

hypothesis-testing, linear, p-value, regression, t-test

I have been reading about simple linear regression ($y = wx + b$), and I reached a section that discusses the null hypothesis $w = 0$. Then, to calculate a p-value, it uses the t-score $t = \frac{\hat{w}-0}{SE}$, where $SE$ is the standard error. What I do not understand is this fraction.

For a z-score, we have $\frac{x-\mu}{\sigma}$, where $\mu$ and $\sigma$ are parameters of a distribution. So, by analogy, for the t-score I would think that $0$ and $SE$ are parameters of a distribution (of the null hypothesis, correct?). However, if the null hypothesis says the slope, $w$, is zero, why should I consider a distribution for it? And, if there is a distribution, why is its spread parameter equal to $SE$, which is an estimate of the standard deviation of $w$ (calculated based on the alternative hypothesis)?
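For concreteness, here is a minimal sketch of the calculation I am asking about (assuming Python with numpy and scipy; the data and variable names are my own illustration, not from the book):

```python
# Minimal illustration of the slope t-score (made-up example data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + 1.0 + rng.normal(size=50)      # toy data: true slope 2, intercept 1

res = stats.linregress(x, y)                 # fits y = w*x + b by least squares
t_score = (res.slope - 0) / res.stderr       # the fraction in question
p_value = 2 * stats.t.sf(abs(t_score), df=len(x) - 2)

print(t_score, p_value, res.pvalue)          # p_value matches res.pvalue
```

The last two printed values agree, so this ratio is exactly what produces the reported p-value, and it is the fraction I am trying to understand.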

I tried to look at different articles and a couple of books, but whenever I get to this part, I still feel like something is missing.

Best Answer

Why does the null hypothesis in simple linear regression (i.e., slope = 0) have a distribution?

A null hypothesis is not a random variable; it doesn't have a distribution.

A test statistic has a distribution. In particular we can compute what the distribution of some test statistic would be if the null hypothesis were true.

If the sample value of the test statistic is such that this value, or one more extreme (further toward what you'd expect if the alternative were true), would only rarely be observed if the null were true, then we have a choice between saying "the null is true but some very rare event happened" and "the null is not true and we needn't invoke an unusual event to explain it".

As the chance of observing something at least as unusual as our sample's test statistic becomes very small, the null becomes harder to maintain as an explanation. We choose to reject the null for the most extreme of these values and not to reject it for test statistics that would not be surprising. The least extreme test statistic at which we would still reject is the critical value, and it, together with all more extreme values, forms the rejection region (the set of values of the test statistic that lead us to reject $H_0$).
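As a concrete sketch of that last step (assuming Python with scipy, a two-sided test at level $\alpha = 0.05$, and $n = 50$ observations, so $n - 2$ degrees of freedom for the slope test):

```python
# Sketch: critical value and rejection region for a two-sided slope t-test.
from scipy import stats

n = 50
alpha = 0.05
df = n - 2                                    # degrees of freedom for the slope test

t_crit = stats.t.ppf(1 - alpha / 2, df)       # least extreme value at which we still reject
print(f"reject H0 when |t| >= {t_crit:.3f}")  # the rejection region
```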

I would think that $0$ and $SE$ are parameters of a distribution (of the null hypothesis, correct?).

$0$ is the value of the mean parameter under a particular null hypothesis, but the standard error in the denominator is not a parameter; it is a sample estimate of a parameter, namely the standard deviation of the distribution of $\hat{w}$.

However, if the null hypothesis says the slope, $w$, is zero, why should I consider a distribution for it?

You're conflating the population slope with the sample slope. The population slope is hypothesized to be $0$, but even if that were true, the sample slope would almost never be exactly $0$; it would be some number more or less "near" $0$.

To see if the sample slope is "too far" from $0$ to be reasonably consistent with having come from a population slope of $0$, we need to know how much we should reasonably expect the sample slope to vary from $0$.

That is why we need to consider the distribution that the sample slope $\hat{w}$ would have if $H_0$ were true.
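A quick simulation (a sketch, assuming Python with numpy; the sample size and noise level are arbitrary choices of mine) shows what that distribution looks like: even when the population slope is exactly $0$, the fitted slopes scatter around $0$, and their spread is the quantity the standard error is estimating.

```python
# Sketch: how the fitted slope varies across samples when the true slope is 0.
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 50, 10_000
x = rng.normal(size=n)                        # fixed design across simulations

slopes = []
for _ in range(n_sims):
    y = 1.0 + rng.normal(size=n)              # true model: slope 0, intercept 1, unit noise
    slopes.append(np.polyfit(x, y, 1)[0])     # least-squares slope for this sample

print(np.mean(slopes), np.std(slopes))        # mean is near 0; the spread is what SE estimates
```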

And, if there is a distribution, why is its spread parameter equal to $SE$, which is an estimate of the standard deviation of $w$ (calculated based on the alternative hypothesis)?

In this particular situation (regression), $SE$ estimates the standard deviation of the distribution of $\hat{w}$ (not $w$), and it is a sensible estimator under both hypotheses, not just under the alternative. It's perfectly reasonable to use it in this case.
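For completeness, here is a sketch (again assuming Python with numpy and scipy; the data are made up) of how that standard error is computed in simple regression: it comes from the residual variance and the spread of $x$, via $SE = \sqrt{\hat\sigma^2 / \sum_i (x_i - \bar{x})^2}$ with $\hat\sigma^2 = \mathrm{RSS}/(n-2)$, and the same formula is used whichever hypothesis happens to be true.

```python
# Sketch: the usual standard error of the slope in simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + 1.0 + rng.normal(size=n)        # toy data

w_hat, b_hat = np.polyfit(x, y, 1)            # least-squares fit y = w*x + b
resid = y - (w_hat * x + b_hat)
sigma2_hat = np.sum(resid**2) / (n - 2)       # residual variance estimate
se = np.sqrt(sigma2_hat / np.sum((x - x.mean())**2))

print(se, stats.linregress(x, y).stderr)      # the two agree
```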
