Using Hotelling’s T-statistic to find an elliptic confidence set

confidence interval, normal distribution, statistics

The problem: We have samples of sizes ${n_1} = 25,{n_2} = 15,{n_3} = 30$ drawn independently from $N\left( {{\mu _i},{\sigma ^2}} \right),i = 1,2,3$ (normal distributions with the same variance). We have ${\overline x _1} = 10.5,{\overline x _2} = 14,{\overline x _3} = 12,s_1^2 = 2.5,s_2^2 = 3,s_3^2 = 2.7$ (unbiased variance estimators).

Find a 95% elliptic confidence set for $\left( {{z_1},{z_2}} \right) = \left( {{\mu _1} + {\mu _2} - 2{\mu _3},{\mu _1} - {\mu _2}} \right)$ by using Hotelling's distribution.

My attempt: We have $$\left( {{X_{1i}},{X_{2j}},{X_{3k}}} \right)\sim N\left( {\left( {\begin{array}{*{20}{c}}
{{\mu _1}} \\
{{\mu _2}} \\
{{\mu _3}}
\end{array}} \right),\left[ {\begin{array}{*{20}{c}}
{{\sigma ^2}}&0&0 \\
0&{{\sigma ^2}}&0 \\
0&0&{{\sigma ^2}}
\end{array}} \right]} \right) \Rightarrow \left( {{{\overline X }_1},{{\overline X }_2},{{\overline X }_3}} \right)\sim N\left( {\underbrace {\left( {\begin{array}{*{20}{c}}
{{\mu _1}} \\
{{\mu _2}} \\
{{\mu _3}}
\end{array}} \right)}_\mu ,\underbrace {\left[ {\begin{array}{*{20}{c}}
{{\sigma ^2}/{n_1}}&0&0 \\
0&{{\sigma ^2}/{n_2}}&0 \\
0&0&{{\sigma ^2}/{n_3}}
\end{array}} \right]}_\Sigma } \right)$$

So $$\left( {{Z_1},{Z_2}} \right) = \left( {{{\overline X }_1} + {{\overline X }_2} - 2{{\overline X }_3},{{\overline X }_1} - {{\overline X }_2}} \right) = \underbrace {\left( {\begin{array}{*{20}{c}}
1&1&{ - 2} \\
1&{ - 1}&0
\end{array}} \right)}_B\left( {\begin{array}{*{20}{c}}
{{{\overline X }_1}} \\
{{{\overline X }_2}} \\
{{{\overline X }_3}}
\end{array}} \right)\sim N\left( {B\mu ,B\Sigma B'} \right)$$

and
$$\left( {{Z_1},{Z_2}} \right)\sim N\left( {\left( {\begin{array}{*{20}{c}}
{{\mu _1} + {\mu _2} - 2{\mu _3}} \\
{{\mu _1} - {\mu _2}}
\end{array}} \right),\left( {\begin{array}{*{20}{c}}
{{\sigma ^2}\left( {\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}} + \frac{4}{{{n_3}}}} \right)}&{{\sigma ^2}\left( {\frac{1}{{{n_1}}} - \frac{1}{{{n_2}}}} \right)} \\
{{\sigma ^2}\left( {\frac{1}{{{n_1}}} - \frac{1}{{{n_2}}}} \right)}&{{\sigma ^2}\left( {\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}}} \right)}
\end{array}} \right)} \right)$$
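As a quick sanity check on the covariance matrix above, the product $B\,\mathrm{diag}(1/n_1,1/n_2,1/n_3)\,B'$ (that is, $B\Sigma B'/\sigma^2$) can be evaluated exactly for the given sample sizes. This is only a sketch; the helper name `scaled_cov` is made up for illustration.

```python
from fractions import Fraction

n = [25, 15, 30]                  # sample sizes n_1, n_2, n_3
B = [[1, 1, -2],                  # row for mu_1 + mu_2 - 2*mu_3
     [1, -1, 0]]                  # row for mu_1 - mu_2


def scaled_cov(B, n):
    """Return B diag(1/n_i) B^T, i.e. Cov(Z) / sigma^2, as exact fractions."""
    rows = len(B)
    return [[sum(Fraction(B[r][k] * B[c][k], n[k]) for k in range(len(n)))
             for c in range(rows)]
            for r in range(rows)]


C = scaled_cov(B, n)
for row in C:
    print([str(entry) for entry in row])
```

The entries agree with $\frac{1}{n_1}+\frac{1}{n_2}+\frac{4}{n_3}$, $\frac{1}{n_1}-\frac{1}{n_2}$ and $\frac{1}{n_1}+\frac{1}{n_2}$ evaluated at $n = (25, 15, 30)$.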

The only estimator for ${{\sigma ^2}}$ that comes to mind is the pooled variance ${\widehat \sigma ^2} = s_p^2 = \frac{{\sum\limits_{i = 1}^3 {\left( {{n_i} - 1} \right)s_i^2} }}{{\sum\limits_{i = 1}^3 {\left( {{n_i} - 1} \right)} }}$, but I don't see how to get from there to Hotelling's distribution. I'm assuming that $S = s_p^2\left( {\begin{array}{*{20}{c}}
{\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}} + \frac{4}{{{n_3}}}}&{\frac{1}{{{n_1}}} - \frac{1}{{{n_2}}}} \\
{\frac{1}{{{n_1}}} - \frac{1}{{{n_2}}}}&{\frac{1}{{{n_1}}} + \frac{1}{{{n_2}}}}
\end{array}} \right)$
does not follow a Wishart distribution.

How would I find the required elliptic confidence set for the linear combination of expectations with unknown (and common) variance from those independent normally distributed samples?

EDIT: We also know that $\sum\limits_{i = 1}^3 {\left( {{n_i} - 1} \right)\frac{{S_i^2}}{{{\sigma ^2}}}} \sim {\chi ^2}\left( {\sum\limits_{i = 1}^3 {\left( {{n_i} - 1} \right)} } \right)$, which further makes me suspect that Hotelling's distribution will not play a role in the solution. Still, I don't see which test statistic to employ (I'm guessing ${\left( {\left( {{z_1},{z_2}} \right) - B\mu } \right)^\prime }{S^{ - 1}}\left( {\left( {{z_1},{z_2}} \right) - B\mu } \right)$) or which distribution it would follow.

EDIT2: Also see here

Best Answer

We can approach this problem from the perspective of the classical linear model: $$ \begin{align} X &\sim \mathcal{N}(\mathcal{X}\beta,\sigma^2I_n),\\ X &= (X_{11},\ldots,X_{1n_1},X_{21},\ldots,X_{2n_2},X_{31},\ldots,X_{3n_3})^\mathsf{T},\\ \mathcal{X}&=\begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \mathbf{0}_{n_1}\\ \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \mathbf{0}_{n_2}\\ \mathbf{0}_{n_3} & \mathbf{0}_{n_3} & \mathbf{1}_{n_3} \end{pmatrix},\\ \beta &= \begin{pmatrix} \mu_1\\ \mu_2\\ \mu_3 \end{pmatrix},\\ n &= n_1+n_2+n_3. \end{align} $$ We then know that $$ \begin{align} &(\mathcal{X}^\mathsf{T}\mathcal{X})^{-1}\mathcal{X}^\mathsf{T}X \mathrel{=:} \hat{\beta} \sim \mathcal{N}(\beta,\sigma^2(\mathcal{X}^\mathsf{T}\mathcal{X})^{-1}),\\ &\frac{(B \hat{\beta} - b)^\mathsf{T}(B( \mathcal{X}^\mathsf{T}\mathcal{X})^{-1}B^\mathsf{T})^{-1} (B \hat{\beta} - b)/m}{\widehat{\sigma}^2}\mathrel{=:}F\overset{H_0}{\sim} F_{m,n-3},\\ &\widehat{\sigma}^2=\frac{1}{n-3}(X-\mathcal{X}\hat{\beta})^\mathsf{T}(X-\mathcal{X}\hat{\beta}) \end{align} $$ under a null hypothesis of the form $$ H_0:B\beta=b\in \mathbb{R}^{m}, B\in \mathbb{R}^{m\times3},\text{rank}(B)=m. $$ Therefore, a $((1-\alpha)\cdot100)\%$ confidence region for $B\beta$ is given by $$ \left\{\tilde{b}\in\mathbb{R}^m:(B \hat{\beta} - \tilde{b})^\mathsf{T}(B( \mathcal{X}^\mathsf{T}\mathcal{X})^{-1}B^\mathsf{T})^{-1} (B \hat{\beta} - \tilde{b})\leq m \widehat{\sigma}^2F_{m,n-3}^{1-\alpha}\right\}, $$ where $F_{m,n-3}^{1-\alpha}$ is the $(1-\alpha)$-quantile of the $F$-distribution with $m$ and $n-3$ degrees of freedom.

In your case, we have $\alpha=0.05,m=2,n=70,B\beta=(\mu_1 + \mu_2 - 2\mu_3, \mu_1 - \mu_2)^\mathsf{T}$, and you've calculated $$ B( \mathcal{X}^\mathsf{T}\mathcal{X})^{-1}B^\mathsf{T}= \begin{pmatrix} \frac{1}{n_1} + \frac{1}{n_2} + \frac{4}{n_3} & \frac{1}{n_1} - \frac{1}{n_2}\\ \frac{1}{n_1} - \frac{1}{n_2} & \frac{1}{n_1} + \frac{1}{n_2} \end{pmatrix},\\ B\hat{\beta}= \begin{pmatrix} \overline{X}_1 + \overline{X}_2 - 2\overline{X}_3\\ \overline{X}_1 - \overline{X}_2 \end{pmatrix}, $$ and $\widehat{\sigma}^2=s^2_p$ already.
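To make the region concrete, here is a minimal sketch that plugs the summary statistics from the question into the formula. Two things are my own additions rather than part of the derivation: the names `f_crit`, `threshold` and `in_region` are illustrative, and the quantile is computed from the closed form $F_{2,d}^{1-\alpha} = \frac{d}{2}\left(\alpha^{-2/d}-1\right)$, which holds only because the numerator degrees of freedom equal 2 (for general $m$ you would use a statistics library instead).

```python
# Summary statistics from the question.
n = [25, 15, 30]
xbar = [10.5, 14.0, 12.0]
s2 = [2.5, 3.0, 2.7]
alpha, m = 0.05, 2

df = sum(n) - 3                                   # n - 3 = 67
s2_p = sum((ni - 1) * si for ni, si in zip(n, s2)) / df   # pooled variance, about 2.69

# Centre of the ellipse: B @ beta_hat.
center = (xbar[0] + xbar[1] - 2 * xbar[2], xbar[0] - xbar[1])

# Entries of the symmetric 2x2 matrix B (X^T X)^{-1} B^T = [[a, b], [b, d]].
a = 1 / n[0] + 1 / n[1] + 4 / n[2]
b = 1 / n[0] - 1 / n[1]
d = 1 / n[0] + 1 / n[1]
det = a * d - b * b

# (1 - alpha)-quantile of F(2, df); closed form valid for numerator df = 2 only,
# since P(F > x) = (1 + 2x/df)^(-df/2) in that case.
f_crit = (df / 2) * (alpha ** (-2 / df) - 1)
threshold = m * s2_p * f_crit


def in_region(z1, z2):
    """True if (z1, z2) lies inside the 95% confidence ellipse for B @ beta."""
    u, v = center[0] - z1, center[1] - z2
    # Quadratic form with the inverse matrix (1/det) * [[d, -b], [-b, a]].
    q = (d * u * u - 2 * b * u * v + a * v * v) / det
    return q <= threshold


print(center, s2_p, f_crit, threshold)
print(in_region(0.0, 0.0))
```

The ellipse is centred at $(0.5, -3.5)$. The call `in_region(0.0, 0.0)` returns `False`, so the joint hypothesis $\mu_1+\mu_2=2\mu_3,\ \mu_1=\mu_2$ is rejected at the 5% level.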

We could equivalently construct the confidence region based on Hotelling's T-squared distribution since $$ F \sim F_{m,n-3} \iff \frac{m(m+n-4)}{n-3}F \sim T^2_{m,m+n-4}. $$
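For completeness, a short sketch of where that constant comes from, assuming the usual parametrisation in which $T^2\sim T^2_{p,\nu}$ satisfies $\frac{\nu-p+1}{p\,\nu}\,T^2\sim F_{p,\nu-p+1}$. Matching $p=m$ and $\nu-p+1=n-3$ gives $\nu=m+n-4$, so

$$ T^2 = \frac{m(m+n-4)}{n-3}\,F = \frac{m+n-4}{n-3}\cdot\frac{(B \hat{\beta} - b)^\mathsf{T}(B( \mathcal{X}^\mathsf{T}\mathcal{X})^{-1}B^\mathsf{T})^{-1} (B \hat{\beta} - b)}{\widehat{\sigma}^2} \sim T^2_{m,m+n-4}, $$

and the region can be written as

$$ \left\{\tilde{b}\in\mathbb{R}^m:(B \hat{\beta} - \tilde{b})^\mathsf{T}(B( \mathcal{X}^\mathsf{T}\mathcal{X})^{-1}B^\mathsf{T})^{-1} (B \hat{\beta} - \tilde{b})\leq \frac{n-3}{m+n-4}\,\widehat{\sigma}^2\,T^{2,\,1-\alpha}_{m,m+n-4}\right\}, $$

where $T^{2,\,1-\alpha}_{m,m+n-4}$ denotes the $(1-\alpha)$-quantile (notation introduced here for illustration). Since $T^{2,\,1-\alpha}_{m,m+n-4}=\frac{m(m+n-4)}{n-3}F^{1-\alpha}_{m,n-3}$, the right-hand side equals $m\widehat{\sigma}^2F^{1-\alpha}_{m,n-3}$, so this is the same ellipse. With $m=2$ and $n=70$ the scaling factor is $\frac{2\cdot 68}{67}=\frac{136}{67}$.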
