The Matrix form of the prediction interval for Weighted Least Squares Regression

matrixprediction intervalweighted-regression

For ordinary least squares, I know the confidence interval is
$\hat{Y} ± t\cdot\sqrt{\hat{\sigma}^2 * x_{new}(X^{\top}X)^{-1}x_{new}^{\top}}$ and $X$ is the design matrix, n is the number of observations, and p is the number of parameters in the model, and:

$\hat{\sigma}^2 = \frac{Y^{\top} \left(I-X(X^{\top}X)^{-1}X^{\top} \right)Y}{n-p} $

My design Matrix has observations in rows.

The prediction interval is similarly defined as:
$\hat{Y} ± t\cdot\hat{\sigma} \cdot \sqrt{ x_{new}(X^{\top}X)^{-1}x_{new}^{\top} + 1/q}$ where q is the number of replicate measurements of the future sample.

For weighted least squares, the confidence interval is:
$\hat{Y} ± t\cdot\sqrt{\hat{\sigma}^2\cdot x_{new}(X^{\top}WX)^{-1}x_{new}^{\top}}$

$\hat{\sigma}^2 = \frac{Y^{\top}W\left(I-X(X^{\top}WX)^{-1}X^{\top}W \right)Y}{n-p} $

What is the equation for the weighted prediction interval. I would guess that instead of 1/q, the 1 would be dependent on the overall scale of the weight matrix in some way, or the weight matrix should be scaled such that the squared sum of the elements is one, or something like that.

Doing some tests, the value of the confidence interval does not appear to change if the weight matrix is multiplied by a scalar, but If the prediction interval is calculated as:
$\hat{Y} ± t*\hat{\sigma} \sqrt{ x_{new}(X^{\top}WX)^{-1}x_{new}^{\top} + 1/q}$, then scaling the weights does affect the interval.

Best Answer

Trying to match your notation and what you have written already, I'll assume that $Y=X\beta+\varepsilon,\varepsilon\sim\mathcal{N}\left(0,\sigma^{2}W^{-1}\right)$ for a known positive-definite and diagonal matrix $W$ and that $\hat{\beta}$ is the weighted least squares estimator $\left(X^{\top}WX\right)^{-1}X^{\top}WY$.

This gives \begin{align*} \hat{\beta} & \sim\mathcal{N}\left(\beta,\sigma^{2}\left(X^{\top}WX\right)^{-1}\right),\\ x_{\mathrm{new}}\hat{\beta} & \sim\mathcal{N}\left(x_{\mathrm{new}}\beta,\sigma^{2}x_{\mathrm{new}}\left(X^{\top}WX\right)^{-1}x_{\mathrm{new}}^{\top}\right),\\ Y_{\mathrm{new}}-x_{\mathrm{new}}\hat{\beta} & \sim\mathcal{N}\left(0,\sigma^{2}w_{\mathrm{new}}^{-1}+\sigma^{2}x_{\mathrm{new}}\left(X^{\top}WX\right)^{-1}x_{\mathrm{new}}^{\top}\right), \end{align*} where $w_{\mathrm{new}}^{-1}$ is the diagonal entry of $W^{-1}$ corrsponding to the future observation $Y_{\mathrm{new}}$.
Standardizing and replacing $\sigma^{2}$ with the unbiased estimator $\hat{\sigma}^{2}=\left(Y-X\hat{\beta}\right)^{\top}W\left(Y-X\hat{\beta}\right)/\left(n-p\right)$ yields $$ \frac{Y_{\mathrm{new}}-x_{\mathrm{new}}\hat{\beta}}{\hat{\sigma}\sqrt{w_{\mathrm{new}}^{-1}+x_{\mathrm{new}}\left(X^{\top}WX\right)^{-1}x_{\mathrm{new}}^{\top}}}\sim t_{n-p} $$ and thus the prediction interval $$ x_{\mathrm{new}}\hat{\beta}\mp t_{n-p}^{1-\alpha/2}\hat{\sigma}\sqrt{w_{\mathrm{new}}^{-1}+x_{\mathrm{new}}\left(X^{\top}WX\right)^{-1}x_{\mathrm{new}}^{\top}}, $$ where $t_{n-p}^{1-\alpha/2}$ is the $\left(1-\alpha/2\right)$ quantile of the $t$-distribution with $n-p$ degrees of freedom.
This prediction interval contains the future observation $Y_{\mathrm{new}}$ with probability $1-\alpha$. In practice however, it can only be calculated in this way if $w_{\mathrm{new}}^{-1}$ has a known functional relationship with $x_{\mathrm{new}}$.

Related Question