What is the algebraic notation to calculate the prediction interval for multiple regression?
It sounds silly, but I am having trouble finding a clear algebraic notation of this.
Best Answer
Take a regression model with $N$ observations and $k$ regressors: $$\mathbf{y=X\beta+u} \newcommand{\Var}{\rm Var}$$
Given a $1 \times k$ row vector of regressor values $\mathbf{x_0}$, the predicted value for that observation, which estimates $E[y \vert \mathbf{x_0}]$, is $$\hat y_0 = \mathbf{x_0} \hat \beta.$$ A consistent estimator of the variance of this prediction is $$\hat V_p=s^2 \cdot \mathbf{x_0} \cdot(\mathbf{X'X})^{-1}\mathbf{x'_0},$$ where $$s^2=\frac{\sum_{i=1}^{N} \hat u_i^2}{N-k}.$$ The forecast error for a particular $y_0$ is $$\hat e=y_0-\hat y_0=\mathbf{x_0}\beta+u_0-\hat y_0.$$ The zero covariance between $u_0$ and $\hat \beta$ implies that $$\Var[\hat e]=\Var[\hat y_0]+\Var[u_0],$$ and a consistent estimator of that is $$\hat V_f=s^2 \cdot \mathbf{x_0} \cdot(\mathbf{X'X})^{-1}\mathbf{x'_0} + s^2.$$
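To see where the first term comes from, here is a short sketch of the intermediate step. It assumes homoskedastic, uncorrelated errors, $\Var[\mathbf{u} \vert \mathbf{X}]=\sigma^2 \mathbf{I}$ (the symbol $\sigma^2$ is introduced here for illustration and does not appear in the formulas above), so that $\Var[\hat \beta \vert \mathbf{X}]=\sigma^2(\mathbf{X'X})^{-1}$: $$\Var[\hat y_0]=\Var[\mathbf{x_0}\hat \beta]=\mathbf{x_0}\,\Var[\hat \beta]\,\mathbf{x'_0}=\sigma^2 \cdot \mathbf{x_0}(\mathbf{X'X})^{-1}\mathbf{x'_0}.$$ Replacing the unknown $\sigma^2$ with $s^2$ gives $\hat V_p$, and adding the estimate $s^2$ of $\Var[u_0]$ gives $\hat V_f$.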
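The $1-\alpha$ $\rm confidence$ interval for $E[y \vert \mathbf{x_0}]$ is centered on the prediction $\hat y_0$: $$\hat y_0 \pm t_{1-\alpha/2,\,N-k}\cdot \sqrt{\hat V_{p}},$$ where $t_{1-\alpha/2,\,N-k}$ is the $1-\alpha/2$ quantile of the $t$ distribution with $N-k$ degrees of freedom. The $1-\alpha$ $\rm prediction$ interval for $y_0$ will be wider: $$\hat y_0 \pm t_{1-\alpha/2,\,N-k}\cdot \sqrt{\hat V_{f}}.$$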
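As a rough numerical illustration of the formulas above, here is a minimal NumPy sketch. The function name `intervals`, the simulated data, and the choice to pass the $t$ critical value in as an argument are all assumptions made for this example, not part of the answer itself; the intervals are centered on the fitted value $\mathbf{x_0}\hat\beta$ and use the $t$ quantile with $N-k$ degrees of freedom.

```python
import numpy as np


def intervals(X, y, x0, t_crit):
    """Return (y0_hat, confidence_interval, prediction_interval) at x0.

    X is the N x k design matrix, y the length-N response, x0 a length-k
    vector of regressor values. t_crit is the t quantile t_{1-alpha/2}
    with N - k degrees of freedom, supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    N, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y           # OLS coefficients
    resid = y - X @ beta_hat
    s2 = resid @ resid / (N - k)           # s^2 = sum(u_hat^2) / (N - k)

    y0_hat = x0 @ beta_hat                 # point prediction
    V_p = s2 * (x0 @ XtX_inv @ x0)         # variance of the prediction
    V_f = V_p + s2                         # variance of the forecast error

    ci = (y0_hat - t_crit * np.sqrt(V_p), y0_hat + t_crit * np.sqrt(V_p))
    pi = (y0_hat - t_crit * np.sqrt(V_f), y0_hat + t_crit * np.sqrt(V_f))
    return y0_hat, ci, pi


# Simulated, purely illustrative data: an intercept plus two regressors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=30)
x0 = np.array([1.0, 0.5, -1.0])

t_crit = 2.0518  # approximate t_{0.975} with N - k = 27 degrees of freedom
y0_hat, ci, pi = intervals(X, y, x0, t_crit)
print("prediction:", y0_hat)
print("95% confidence interval:", ci)
print("95% prediction interval:", pi)
```

The critical value is hard-coded only to keep the sketch dependent on NumPy alone; if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, N - k)` gives the same quantile for any $\alpha$ and sample size.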