After looking around for a while without finding anything satisfactory, this is the best answer that seems to make sense to me:
Notice that the sampling distribution of $S$, the estimator of the unknown $\sigma$, is not Normal, so $Q = \frac{\bar{Y} - \mu_{0}}{S / \sqrt{n}}$ does not actually follow $N(0, 1)$; thus it cannot be a pivotal quantity, at least not a pivotal quantity with a Normal distribution.
The case of unknown $\mu$, however, is different, because the sampling distribution of $\bar{Y}$ is Normal (exactly for a Normal population, and approximately otherwise by the Central Limit Theorem), so we know $Q = \frac{\bar{Y} - \mu}{\sigma_{0} / \sqrt{n}}$ follows $N(0, 1)$, which makes it a pivotal quantity.
This is why to find the confidence interval for $\sigma$, we have to use the pivotal quantity $$\frac{(n-1)S^2}{\sigma^2},$$ which follows a $\chi^2$ distribution with $n-1$ degrees of freedom.
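Spelling out the inversion of that pivot (a standard step, added here for completeness): since $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$,
$$P\!\left(\chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1}\right) = 1-\alpha,$$
so a $100(1-\alpha)\%$ CI for $\sigma^2$ is
$$\left(\frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}}\right),$$
and taking square roots of the endpoints gives the interval for $\sigma$.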
1. Normal data, variance known: If you have observations $X_1, X_2, \dots, X_n$ sampled at random from a normal population with unknown mean $\mu$ and known standard deviation $\sigma,$ then a 95% confidence interval (CI) for $\mu$ is $\bar X \pm 1.96\,\sigma/\sqrt{n}.$ This is the only situation in which the z interval is exactly correct.
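A minimal sketch of this interval in Python (standard library only; the function name `z_ci` and the example numbers are my own):

```python
from math import sqrt
from statistics import NormalDist

def z_ci(xbar, sigma, n, conf=0.95):
    """z interval for mu when sigma is known: xbar +/- z* sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z* = 1.9599... for 95%
    half = z * sigma / sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = z_ci(xbar=10.0, sigma=2.0, n=25)  # half-width 1.96 * 2/5, about 0.784
```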
2. Nonnormal data, variance known: If the population distribution is not normal and the sample is 'large enough', then $\bar X$ is approximately normal and the same formula provides an approximate 95% CI. The rule that $n \ge 30$ is 'large enough' is unreliable here. If the population distribution is heavy-tailed, then $\bar X$ may not have a distribution that is close to normal (even if $n \ge 30$). The Central Limit Theorem often provides reasonable approximations for moderate values of $n,$ but it is a limit theorem, with guaranteed results only as $n \rightarrow \infty.$
3. Normal data, variance unknown: Suppose you have observations $X_1, X_2, \dots, X_n$ sampled at random from a normal population with unknown mean $\mu$ and unknown standard deviation $\sigma,$ with $\mu$ estimated by the sample mean $\bar X$ and $\sigma$ estimated by the sample standard deviation $S.$ Then a 95% confidence interval (CI) for $\mu$ is $\bar X \pm t^* S/\sqrt{n},$ where $t^*$ cuts probability $0.025$ from the upper tail of Student's t distribution with $n - 1$ degrees of freedom. This is the only situation in which the t interval is exactly correct.
Examples: If $n=10$, then $t^* = 2.262$
and if $n = 30,$ then $t^* = 2.045.$ (Computations from R below; you could also use a printed 't table'.)
qt(.975, 9); qt(.975, 29)
[1] 2.262157 # for n = 10
[1] 2.04523 # for n = 30
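The same quantiles in Python, for readers who prefer it (a sketch; assumes SciPy is installed):

```python
from scipy.stats import norm, t

t10 = t.ppf(0.975, df=9)   # about 2.262, matches qt(.975, 9)
t30 = t.ppf(0.975, df=29)  # about 2.045, matches qt(.975, 29)
z95 = norm.ppf(0.975)      # about 1.960, the z counterpart
```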
Notice that 2.045 and 1.96 (from Part 1 above) both round to 2.0. In fact, for every $n \ge 30$, $t^*$ rounds to 2.0. That is the basis for the 'rule of 30', which is often mindlessly parroted in other contexts where it is not relevant.
There is no similar coincidental rounding for CIs with confidence levels other than 95%. For example, in Part 1 above
a 99% CI for $\mu$ is obtained as $\bar X \pm 2.58 \sigma/\sqrt{n}.$ However,
$t^*=2.76$ for $n = 30$ and $t^* = 2.65$ for $n = 70.$
qnorm(.995)
[1] 2.575829
qt(.995, 29)
[1] 2.756386
qt(.995, 69)
[1] 2.648977
4. Nonnormal data, variance unknown: Confidence intervals based on the t distribution (as in Part 3 above) are known to be 'robust' against moderate departures from normality. (If $n$ is very small, there should be no far outliers and no evidence of severe skewness.) Then, to a degree that is difficult to predict in advance, a t interval may still provide a useful CI for $\mu.$
By contrast, if the type of distribution is known, it may be possible
to find an exact form of CI.
For example, if $n = 30$ observations from a (distinctly nonnormal)
exponential distribution with unknown mean $\mu$ have $\bar X = 17.24,\,
S = 15.33,$ then the (approximate) 95% t CI is $(11.33, 23.15).$
t.test(x)
One Sample t-test
data: x
t = 5.9654, df = 29, p-value = 1.752e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
11.32947 23.15118
sample estimates:
mean of x
17.24033
However,
$$\frac{\bar X}{\mu} \sim \mathsf{Gamma}(\text{shape}=n,\text{rate}=n),$$
so that $$P(L \le \bar X/\mu < U) = P(\bar X/U < \mu < \bar X/L)=0.95$$
and an exact 95% CI for $\mu$ is $(\bar X/U,\, \bar X/L) = (12.42, 25.55).$
qgamma(c(.025,.975), 30, 30)
[1] 0.6746958 1.3882946
mean(x)/qgamma(c(.975,.025), 30, 30)
[1] 12.41835 25.55274
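The same exact-CI computation can be sketched in Python (assumes SciPy; the sample mean 17.24033 is taken from the output above):

```python
from scipy.stats import gamma

n, xbar = 30, 17.24033               # sample size and observed mean from above
# X-bar / mu ~ Gamma(shape=n, rate=n); SciPy uses scale = 1/rate
L, U = gamma.ppf([0.025, 0.975], a=n, scale=1/n)
ci = (xbar / U, xbar / L)            # exact 95% CI, about (12.42, 25.55)
```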
Addendum on bootstrap CI: If data seem non-normal, but the actual population
distribution is unknown, then a 95% nonparametric bootstrap CI may be the best
choice. Suppose we have $n=20$ observations from an unknown distribution, with $\bar X = 13.54$ and values shown in the stripchart below.
The observations seem distinctly right-skewed and fail a Shapiro-Wilk normality test with P-value 0.001. If we assume the data are exponential and use the method in Part 4, the 95% CI is $(9.13, 22.17),$ but we have no way to know whether the data are exponential.
Accordingly, we find a 95% nonparametric bootstrap CI, approximating $L^*$ and $U^*$ such that $P(L^* < D = \bar X/\mu < U^*) \approx 0.95.$ In the R code below the suffix .re indicates random 're-sampled' quantities based on $B$ samples of size $n$ randomly chosen with replacement from among the $n = 20$ observations. The resulting 95% CI is $(9.17, 22.71).$ [There are many styles of bootstrap CIs. This one treats $\mu$ as if it were a scale parameter. Other choices are possible.]
B = 10^5; a.obs = 13.54  # a.obs is the observed sample mean of x
d.re = replicate(B, mean(sample(x, 20, rep=T))/a.obs)
UL.re = quantile(d.re, c(.975,.025))
a.obs/UL.re
97.5% 2.5%
9.172171 22.714980
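The same percentile-of-ratio bootstrap can be sketched in Python with the standard library (the data `x` are simulated exponential values here, since the 20 original observations are not listed in this answer):

```python
import random
from statistics import mean

random.seed(2024)
x = [random.expovariate(1 / 13.5) for _ in range(20)]  # stand-in data
a_obs = mean(x)

B = 10_000
# D = resampled mean / observed mean; resampling is WITH replacement
d_re = sorted(mean(random.choices(x, k=len(x))) / a_obs for _ in range(B))
U_star, L_star = d_re[int(0.975 * B)], d_re[int(0.025 * B)]
ci = (a_obs / U_star, a_obs / L_star)  # 95% percentile bootstrap CI for mu
```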
Best Answer
The length of the interval is
$$\left(\bar{X}-Z_{(1-k)\alpha}\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{X}-Z_{1 - k\alpha}\frac{\sigma}{\sqrt{n}}\right) = \left(Z_{1 - k\alpha} - Z_{(1-k)\alpha}\right)\frac{\sigma}{\sqrt{n}}. $$
Because $\sigma/\sqrt{n}$ does not change as $k$ is varied, the length is minimized exactly when $Z_{1 - k\alpha} - Z_{(1-k)\alpha}$ is minimized.
Another way to look at it is to write
$$z = Z_{(1-k)\alpha},\ w = Z_{1 - k\alpha}.$$
Because the interval $[z,w]$ should contain $1-\alpha$ probability and obviously both $z$ and $w$ will be finite at a minimum, necessarily
$$-\infty \lt z \lt Z_\alpha$$
and $k$ must lie between $0$ and $1$.
According to the Fundamental Theorem of Calculus, when $z$ is increased infinitesimally to $z+dz$, the probability of the interval decreases by $f(z)dz$ where $f$ is the PDF for $\bar X$. To compensate, $w$ must increase by an infinitesimal amount $dw$ for which
$$f(z)dz = f(w)dw.$$
Imagine the interval $[z,w]$ shifted to $[z+dz,w+dw]$. If the height of the PDF at $z$, $f(z)$, is only about half the height $f(w)$, then to keep the probabilities the same $dw$ need only be about half of $dz$, so this shift has shrunk the interval. Shifting should continue until no more shrinking is possible, which will therefore occur when the heights at the interval endpoints are equal (as argued below).
At the same time the length of the interval, given by $w-z$, changes by $dw-dz$. A minimum will occur at a critical point, giving the criterion $0 = dw-dz$, implying by virtue of the preceding result that
$$f(z) = f(w).$$
For any unimodal continuous distribution with PDF $f$ there will be (practically by definition) at most two solutions to the equation $f(z) = c$ for any number $c$. Moreover, as $c$ decreases, those solutions--if they exist--must draw further apart. That shows there will be a single solution to the preceding equation, with $z$ less than the mode and $w$ greater than the mode, provided $0 \lt \alpha \lt 1/2$, and that it will be a global minimum. (For $\alpha=1/2$ the interval will reduce to a point. Although any point would do, a mode would be a point of greatest density. For $1/2\lt \alpha \lt 1$ there are no solutions.)
Finally, when the distribution is also symmetric (as in the case of a Normal distribution), then necessarily $z$ and $w$ must be equidistant from the mode, implying $k=1/2$.
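A quick numerical check of this conclusion, sketched with Python's standard library: treat the interval length $\left(Z_{1 - k\alpha} - Z_{(1-k)\alpha}\right)\sigma/\sqrt{n}$ as a function of $k$ and minimize it on a grid; the minimizer should be $k = 1/2$.

```python
from statistics import NormalDist

alpha = 0.05
Z = NormalDist().inv_cdf  # quantile function of N(0, 1)

def length(k):
    # interval length, up to the constant factor sigma / sqrt(n)
    return Z(1 - k * alpha) - Z((1 - k) * alpha)

ks = [i / 1000 for i in range(1, 1000)]
k_best = min(ks, key=length)  # symmetric tails are optimal
```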