Why is the noncentral t-distribution used for sample size determination with small samples?

statistical-inference, statistics

For sample size determination with a small sample from a normal population, the test statistic $$T = \frac{\bar{x}-\mu_0}{S/\sqrt{n}}, ~~~\text{where $\mu_0$ is the null value,}$$

is considered under the alternative hypothesis $\mu = \mu' \gt \mu_0$.
Since $T$ then follows a noncentral $t$ distribution, a chart or table is used to determine the sample size $n$ that gives the desired type II error probability.

The above is the typical explanation given in statistics textbooks.

Now I'm curious what's wrong with the following reasoning:

If we assume that the alternative hypothesis $\mu=\mu'$ is correct, $T = \frac{\bar{x} - \mu'}{S/\sqrt{n}}$ should follow a $t$-distribution by the same argument that $T = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$ follows a $t$-distribution when $\mu=\mu_0$. Then we can proceed to calculate

$$
\Pr\left(\frac{\bar{x}-\mu'}{S/\sqrt{n}} \leq \frac{\mu_0 + t_{\alpha, n-1}s/\sqrt{n} - \mu'}{s/\sqrt{n}}\right) = \beta(\mu')~~~\text{(*)}
$$

to determine the sample size $n$ that satisfies the required $\beta$.
The noncentral t-distribution would not be involved at all.

Specifically, if my assumption that $(\bar{x}-\mu')/(s/\sqrt{n})$ follows a t distribution is correct, we can simply set:

$$
\frac{\mu_0 + t_{\alpha, n-1}s/\sqrt{n} - \mu'}{s/\sqrt{n}} = t_{\beta(\mu')}
$$

and solve for $n$. There would be no need to invoke the noncentral t distribution, and no need to use a $\beta$ curve to find $n$.

=================================================================
With the help of @BruceET, I found my silly mistake.

The equation to determine a rejection region is

$$
\Pr\left(\frac{\bar{x} - \mu_0}{S/\sqrt{n}} > t_{\alpha, n-1}\right) = \alpha
$$

From this I wrongly thought the rejection region in terms of $\bar{x}$ is

$$
\bar{x} > \mu_0 + t_{\alpha, n-1}\frac{s}{\sqrt{n}}
$$

This is wrong because $s$ is itself a random variable, so it cannot appear in a fixed bound for $\bar{x}$. If $s$ were a constant, I could have argued that eq. (*) could be used to calculate power under the alternative hypothesis.
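A small numerical check may make the point concrete. The sketch below is only an illustration and not part of the original argument: the values $n = 16$, $\mu_0 = 10$, $\mu' = 12$, $\sigma = 2$ are assumed (they match the example used in the answer), and the "naive" calculation plugs the constant $\sigma$ in for $s$ in eq. (*) so that the bound becomes a fixed number. It compares that naive value with a simulated rejection rate of the actual t test and with the noncentral t probability.

```r
set.seed(1)

# Assumed illustrative values (not from the question itself).
n <- 16; mu0 <- 10; mu1 <- 12; sigma <- 2; alpha <- 0.05
tcrit <- qt(1 - alpha, n - 1)

# Simulate the actual test under mu = mu1:
# reject when (xbar - mu0) / (s / sqrt(n)) > tcrit, with s recomputed per sample.
reject <- replicate(1e5, {
  x <- rnorm(n, mu1, sigma)
  (mean(x) - mu0) / (sd(x) / sqrt(n)) > tcrit
})
power_sim <- mean(reject)

# Naive calculation in the spirit of eq. (*): treat s as the constant sigma
# and refer (xbar - mu1) / (S / sqrt(n)) to a central t distribution.
power_naive <- 1 - pt(tcrit - (mu1 - mu0) / (sigma / sqrt(n)), n - 1)

# Power from the noncentral t distribution with ncp = sqrt(n) * (mu1 - mu0) / sigma.
delta <- sqrt(n) * (mu1 - mu0) / sigma
power_exact <- 1 - pt(tcrit, n - 1, ncp = delta)

c(simulated = power_sim, naive = power_naive, exact = power_exact)
```

Under these assumed values the simulated rejection rate should agree with the noncentral t value up to simulation error, while the naive central t value does not coincide with it, because the same random $s$ appears on both sides of the inequality in eq. (*).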

Best Answer

Let $X_1, \dots, X_n$ be a random sample from $\mathsf{Norm}(\mu, \sigma).$ Suppose you want to test $H_0: \mu=10$ against $H_a: \mu > 10$ at the 5% level of significance when $n = 16.$

Then the t statistic for this test is $T_0 = \frac{\bar X - 10}{S/\sqrt{n}}$ and $H_0$ is rejected if $T_0 \ge t^* = 1.753,$ where $t^*$ cuts probability 5% from the upper tail of Student's t distribution with $\nu =n-1 = 15$ degrees of freedom: $\mathsf{T}(\nu=15).$ Computation in R.

qt(.95, 15)
[1] 1.75305

The power $\pi(12) = P(\mathrm{Rej}\, H_0\, |\, \mu = 12)$ against the specific alternative $\mu_a = 12$ is

$$\begin{aligned}
\pi(\mu_a) = \pi(12) &= P\left(\frac{\bar X - \mu_0}{S/\sqrt{n}}\ge t^*\,|\mu_a=12\right) = P\left(\frac{\bar X - \mu_a + (\mu_a - \mu_0)}{S/\sqrt{n}}\ge t^*\,|\mu_a=12\right)\\
&= P\left(\frac{\bar X - \mu_a + 2}{S/\sqrt{n}}\ge t^*\,|\mu_a=12\right) = P\left( \frac{ Z + \delta }{ \sqrt{V/\nu} } \ge t^* \right),
\end{aligned}$$
where the last step divides numerator and denominator by $\sigma/\sqrt{n},$ so that $Z = \sqrt{n}(\bar X - \mu_a)/\sigma \sim\mathsf{Norm}(0,1),$ $\delta = \sqrt{n}(\mu_a-\mu_0)/\sigma = 2\sqrt{n}/\sigma,\,$ and $V = \nu S^2/\sigma^2 \sim \mathsf{Chisq}(\nu).$

That is, by definition, $\frac{ Z + \delta }{ \sqrt{V/\nu}}\sim \mathsf{T}(\nu, \delta),$ Student's noncentral t distribution with $\nu$ degrees of freedom and noncentrality parameter $\delta.$

In particular, for our example, $\delta = 2\sqrt{16}/\sigma.$ For $\sigma = 2,$ we have $\delta = 4$ and $\pi(12) = 0.985.$

1 - pt(qt(.95, 15), 15, 4) 
[1] 0.9848477
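The same calculation can be turned around to choose $n$. The sketch below is a hedged illustration rather than part of the original answer: the helper names `power_t` and `smallest_n`, the target power of 0.90, and the search limit `n_max` are all assumptions made for the example. It evaluates the noncentral t power for each $n$ and returns the first $n$ that reaches the target; `power.t.test` from base R's `stats` package is called at the end as a cross-check.

```r
# Power of the one-sided, one-sample t test of H0: mu = mu0 vs Ha: mu > mu0,
# computed from the noncentral t distribution (hypothetical helper).
power_t <- function(n, mu0, mua, sigma, alpha = 0.05) {
  nu    <- n - 1
  delta <- sqrt(n) * (mua - mu0) / sigma   # noncentrality parameter
  tcrit <- qt(1 - alpha, nu)               # critical value under H0
  1 - pt(tcrit, nu, ncp = delta)
}

# Smallest n whose power reaches the target (hypothetical helper);
# returns NA if no n up to n_max qualifies.
smallest_n <- function(mu0, mua, sigma, target = 0.90,
                       alpha = 0.05, n_max = 1000) {
  for (n in 2:n_max) {
    if (power_t(n, mu0, mua, sigma, alpha) >= target) return(n)
  }
  NA_integer_
}

# Reproduces the power computed above for n = 16, sigma = 2.
power_t(16, mu0 = 10, mua = 12, sigma = 2)

# Sample size for an assumed target power of 0.90.
n_req <- smallest_n(mu0 = 10, mua = 12, sigma = 2, target = 0.90)
n_req

# Cross-check with base R; the reported n is fractional and is rounded up in practice.
power.t.test(delta = 2, sd = 2, sig.level = 0.05, power = 0.90,
             type = "one.sample", alternative = "one.sided")
```

Note that $\sigma$ has to be supplied (guessed or taken from prior data) for any such calculation, since $\delta$ depends on it.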

The following simulation in R gives essentially the same power. With a million iterations one can expect three-place accuracy.

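set.seed(1128)
# Simulate 10^6 samples of size 16 from Norm(12, 2); for each, run the
# one-sided t test of H0: mu = 10 and keep the P-value.
pv = replicate(10^6,  t.test(rnorm(16, 12, 2),
               mu = 10, alternative = "greater")$p.value)
# The proportion of rejections at the 5% level estimates the power.
mean(pv <= .05)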
[1] 0.984809

Ref: Except for notation, parts of the above parallel Sect 2.3 of Bain & Engelhardt (1992), Intro. to Probability and Math. Stat., Duxbury, p400.
