hypothesis-testing – Why Center the Sampling Distribution on H0 During Hypothesis Testing?

confidence intervalhypothesis testingp-value

A p-value is the probability to obtain a statistic that is at least as extreme as the one observed in the sample data when assuming that the null-hypothesis ($H_0$) is true.

Graphically this corresponds to the area defined by the sample statistic under the sampling distribution which one would obtain when assuming $H_0$:

center h0

However, because the shape of this assumed distribution is actually based on the sample data, centering it on $\mu_0$ seems like an odd choice to me.
If one would instead use the sampling distribution of the statistic, i.e. center the distribution on the sample statistic, then hypothesis testing would correspond to estimating the probability of $\mu_0$ given the samples.

center h1

In that case the p-value is the probability to obtain a statistic at least as extreme as $\mu_0$ given the data instead of the above definition.

Additionally, such an interpretation has the advantage of relating well to the concept of confidence intervals:
A hypothesis test with significance level $\alpha$ would be equivalent to checking whether $\mu_0$ falls within the $(1-\alpha)$ confidence interval of the sampling distribution.

CI2 95

I thus feel that centering the distribution on $\mu_0$ could be an unneccessary complication.
Are there any important justifications for this step which I did not consider?

Best Answer

Suppose $\boldsymbol X = (X_1, X_2, \ldots, X_n)$ is a sample drawn from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The sample mean $\bar X$ is therefore normal with mean $\mu$ and variance $\sigma^2/n$. On this much, I think there can be no possibility of disagreement.

Now, you propose that our test statistic is $$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim \operatorname{Normal}(0,1).$$ Right? BUT THIS IS NOT A STATISTIC. Why? Because $\mu$ is an unknown parameter. A statistic is a function of the sample that does not depend on any unknown parameters. Therefore, an assumption must be made about $\mu$ in order for $Z$ to be a statistic. One such assumption is to write $$H_0 : \mu = \mu_0, \quad \text{vs.} \quad H_1 : \mu \ne \mu_0,$$ under which $$Z \mid H_0 = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}} \sim \operatorname{Normal}(0,1),$$ which is a statistic.

By contrast, you propose to use $\mu = \bar X$ itself. In that case, $Z = 0$ identically, and it is not even a random variable, let alone normally distributed. There is nothing to test.

Related Question