Solved – Confidence Interval of the mean response for Weighted Least Squares (WLS)

Tags: confidence-interval, regression, wls

I know that the confidence interval of the mean response for OLS is:

$$
\hat{y}(x_0) - t_{\alpha/2,\, df(error)}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0} \;\leq\; \mu_{y|x_0} \;\leq\; \hat{y}(x_0) + t_{\alpha/2,\, df(error)}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0} .
$$

(the formula is taken from Myers, Montgomery, and Anderson-Cook, *Response Surface Methodology*, fourth edition, pages 34 and 407)

What would the corresponding confidence interval be for WLS?

Many thanks!

Best Answer

$\newcommand{\e}{\varepsilon}$TL;DR: If you have $$ y = X\beta + \e $$ with $\e \sim \mathcal N(0, \sigma^2 \Omega)$ for $\Omega$ known, then it will be $$ x_0^T\hat\beta \pm t_{\alpha/2, n-p} \sqrt{\hat\sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1}x_0} $$ with $\hat\beta = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y$ and $$ \hat\sigma^2 = \frac{y^T\Omega^{-1}(I-H)y}{n-p}. $$

This doesn't require $\Omega$ to be diagonal (the diagonal case is WLS proper), but it is essential that it be known.


Here's how to derive this.

The MLE of $\beta$ is $$ \hat\beta = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y $$ (this is just generalized least squares).
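To make this concrete, here is a small numeric sketch (entirely synthetic data; `w` is a hypothetical vector of known variances, so $\Omega = \operatorname{diag}(w)$) showing that the GLS estimator agrees with OLS after whitening by $\Omega^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical known variances: Omega = diag(w)
Omega_inv = np.diag(1.0 / w)
y = X @ np.array([1.0, 2.0, -1.0]) + np.sqrt(w) * rng.normal(size=n)

# GLS estimate: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
beta_hat = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# Equivalent: ordinary least squares on the whitened data z = Omega^{-1/2} y
Z = X / np.sqrt(w)[:, None]
z = y / np.sqrt(w)
beta_whitened, *_ = np.linalg.lstsq(Z, z, rcond=None)

print(np.allclose(beta_hat, beta_whitened))  # True
```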

From this it follows that $\hat\beta$ is still Gaussian and $$ \text E(\hat\beta) = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\text E(y) = \beta $$ and $$ \text{Var}(\hat\beta) = \sigma^2 (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\Omega \Omega^{-1}X(X^T\Omega^{-1}X)^{-1} \\ = \sigma^2 (X^T\Omega^{-1}X)^{-1} $$ so all together $$ \hat\beta \sim \mathcal N(\beta, \sigma^2 (X^T\Omega^{-1}X)^{-1}). $$


In order to get the confidence interval for the (conditional) mean $\mu_0$ at a particular point $x_0$, we need to determine the distribution of $$ \frac{\hat y_0 - \mu_0}{\hat {\text{SD}}(\hat y_0)} $$ so we can get the appropriate quantiles.

We have $$ \hat y_0 = x_0^T\hat\beta \sim \mathcal N(x_0^T\beta, \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0) $$

so this is $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} = \frac{x_0^T(\hat\beta - \beta) / \sqrt{x_0^T(X^T\Omega^{-1}X)^{-1} x_0}}{\sqrt{\hat \sigma^2}}. $$ The numerator of that is still Gaussian, and $$ \frac{x_0^T(\hat\beta - \beta)}{\sqrt{x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} \sim \mathcal N\left(0, \sigma^2\right). $$ Recall that a t distribution with $d$ degrees of freedom is defined to be $$ \frac{\mathcal N(0,1)}{\sqrt{\chi^2_d / d}} $$ where the two distributions are independent.

We've pretty much got the numerator, so the remaining questions are (1) what the distribution of $\hat\sigma^2$ is, and (2) whether $\hat\sigma^2$ and $\hat\beta$ are independent.

We will want to set $$ \hat \sigma^2 = \frac{y^T\Omega^{-1}(I-H)y}{n-p} $$ where $$ H = X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1} $$ is the GLS hat matrix.
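The GLS hat matrix behaves like the ordinary one. A quick check on synthetic data (all names here are made up for illustration) that $H$ is idempotent with trace $p$, and a direct computation of $\hat\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical known variances: Omega = diag(w)
Omega_inv = np.diag(1.0 / w)
y = X @ np.array([1.0, 2.0, -1.0]) + np.sqrt(w) * rng.normal(size=n)

# GLS hat matrix H = X (X' Omega^{-1} X)^{-1} X' Omega^{-1}
H = X @ np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv)

# H is idempotent with trace p, just like the OLS hat matrix
print(np.allclose(H @ H, H), np.isclose(np.trace(H), p))  # True True

sigma2_hat = y @ Omega_inv @ (np.eye(n) - H) @ y / (n - p)
```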

That $\Omega^{-1}$ looks a little weird in $\hat\sigma^2$ so let's check that this is unbiased. Using a standard result about the expected value of a quadratic form (and noting that actually $\Omega^{-1}(I-H)$ is symmetric), $$ \text E(y^T\Omega^{-1}(I-H)y) = \text{tr}\left(\Omega^{-1}(I-H) \text{Var}(y)\right) + (\text E y)^T \Omega^{-1}(I-H) (\text E y) \\ = \sigma^2 \text{tr} \left(\Omega^{-1}(I-H)\Omega\right) + \beta^TX^T\Omega^{-1}(I-H)X\beta. $$ Using the cyclicity of the trace, that first term becomes $\sigma^2(n-p)$. For the second term, $$ X^T\Omega^{-1}(I-H)X = X^T\Omega^{-1}X - X^T\Omega^{-1}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}X = 0 $$ so $\hat\sigma^2$ is indeed unbiased.
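Both identities used above, $\operatorname{tr}\left(\Omega^{-1}(I-H)\Omega\right) = n-p$ and $X^T\Omega^{-1}(I-H)X = 0$, can be verified numerically (same kind of synthetic setup, with a diagonal $\Omega$ purely for convenience):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical known variances
Omega = np.diag(w)
Omega_inv = np.diag(1.0 / w)
H = X @ np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv)
I = np.eye(n)

print(np.trace(Omega_inv @ (I - H) @ Omega))        # n - p = 17 (up to rounding)
print(np.abs(X.T @ Omega_inv @ (I - H) @ X).max())  # ~ 0
```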

Now for its distribution, note that from $y = X\beta + \e$ we could left-multiply through by $\Omega^{-1/2}$ to get $$ \underbrace{\Omega^{-1/2}y}_{:= z} = \underbrace{\Omega^{-1/2}X}_{:= Z}\,\beta + \underbrace{\Omega^{-1/2}\e}_{:= \eta} $$ so this is like a homoscedastic linear model $z = Z\beta + \eta$ with $\eta \sim \mathcal N(0,\sigma^2 I)$. In this case we'd get $H_Z = Z(Z^TZ)^{-1}Z^T$ and from the usual theory (see Cochran's theorem) we get $$ z^T(I-H_Z)z / \sigma^2 \sim \chi^2_{n-p}. $$

Note that $H_Z = \Omega^{-1/2}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1/2}$ so $$ z^T(I-H_Z)z = y^T\Omega^{-1/2}(I - H_Z)\Omega^{-1/2}y \\ = y^T \left[\Omega^{-1} - \Omega^{-1}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\right]y \\ = y^T\Omega^{-1}(I-H)y = (n-p)\hat\sigma^2 $$ so that shows both how we'd have come up with $\hat\sigma^2$ in the first place, and that $$ \frac{(n-p)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}. $$
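The equality of the two quadratic forms is also easy to check numerically (again on made-up data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical known variances: Omega = diag(w)
Omega_inv = np.diag(1.0 / w)
y = X @ np.array([1.0, 2.0, -1.0]) + np.sqrt(w) * rng.normal(size=n)

# GLS form: y' Omega^{-1} (I - H) y
H = X @ np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv)
q_gls = y @ Omega_inv @ (np.eye(n) - H) @ y

# whitened form: z' (I - H_Z) z with z = Omega^{-1/2} y, Z = Omega^{-1/2} X
z = y / np.sqrt(w)
Z = X / np.sqrt(w)[:, None]
H_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
q_white = z @ (np.eye(n) - H_Z) @ z

print(np.isclose(q_gls, q_white))  # True
```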

This means that $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} = \frac{x_0^T(\hat\beta - \beta) / \sqrt{\sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1}x_0}}{\sqrt{\hat\sigma^2 / \sigma^2}} $$ is exactly of the form $\mathcal N(0,1) / \sqrt{\chi^2_d / d}$.

Independence comes from the unweighted theory, as we know $$ z^T(I-H_Z)z \perp \hat\beta_Z $$ but $$ \hat\beta_Z = (Z^TZ)^{-1}Z^Tz = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y = \hat\beta $$ and $z^T(I-H_Z)z = (n-p)\hat\sigma^2$ so we don't need to do any extra work.

Thus with much ado we've proved that $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} \sim t_{n-p} $$ so you can just use the usual $t_{\alpha/2, n-p}$ quantiles.
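As a final sanity check, a small simulation sketch (synthetic data with $\Omega = \operatorname{diag}(w)$ assumed known and $\sigma^2 = 1$) shows that the resulting interval attains roughly its nominal 95% coverage:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
w = rng.uniform(0.5, 2.0, size=n)   # hypothetical known variances: Omega = diag(w)
Omega_inv = np.diag(1.0 / w)
beta = np.array([1.0, 2.0, -1.0])
x0 = np.array([1.0, 0.5, -0.5])     # arbitrary prediction point
mu0 = x0 @ beta                     # true conditional mean at x0

tcrit = stats.t.ppf(0.975, n - p)
XtOiX_inv = np.linalg.inv(X.T @ Omega_inv @ X)
H = X @ XtOiX_inv @ X.T @ Omega_inv

covered, reps = 0, 2000
for _ in range(reps):
    y = X @ beta + np.sqrt(w) * rng.normal(size=n)      # sigma^2 = 1
    beta_hat = XtOiX_inv @ X.T @ Omega_inv @ y
    sigma2_hat = y @ Omega_inv @ (np.eye(n) - H) @ y / (n - p)
    half = tcrit * np.sqrt(sigma2_hat * x0 @ XtOiX_inv @ x0)
    yhat0 = x0 @ beta_hat
    covered += (yhat0 - half <= mu0 <= yhat0 + half)

print(covered / reps)  # empirical coverage, close to 0.95
```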
