[Math] Polynomial Regression/Least Squares

na.numerical-analysis, st.statistics

I have a couple questions on the same topic:

1) Does ordinary linear least squares have a generally defined confidence interval? I know that you can determine the confidence interval of each coefficient in $\hat{a}+\hat{b}x$ as follows:

$\hat{a}\pm t_{\alpha/2,n-2}\sqrt{(1/n+\bar{x}^2/S_{xx})\,MSE}$

$\hat{b}\pm t_{\alpha/2,n-2}\sqrt{MSE/S_{xx}}$

Is there a combination of these two terms that yields an overall confidence interval for the least-squares regression line?

2) If such a confidence interval is defined for the linear regression, can it be generalized to higher-degree regressions, $\hat{z_0}+\hat{z_1}x + \hat{z_2}x^2+\cdots+\hat{z_n}x^n$?

I'm programming some regression classes and I'm trying to design a regression interface. I've managed to find confidence interval equations for more complicated regression techniques, but the simplest technique is giving me the most problems.
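For concreteness, here is a minimal sketch of the two intervals from question 1 as I currently understand them, assuming Python with NumPy/SciPy (the function name and interface are just illustrative):

```python
import numpy as np
from scipy import stats

def simple_ls_intervals(x, y, alpha=0.05):
    """Confidence intervals for the intercept a-hat and slope b-hat of y ~ a + b*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    b_hat = np.sum((x - xbar) * (y - ybar)) / Sxx
    a_hat = ybar - b_hat * xbar
    resid = y - (a_hat + b_hat * x)
    mse = np.sum(resid ** 2) / (n - 2)              # estimate of sigma^2
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)      # t_{alpha/2, n-2}
    a_half = t_crit * np.sqrt((1 / n + xbar ** 2 / Sxx) * mse)
    b_half = t_crit * np.sqrt(mse / Sxx)
    return (a_hat - a_half, a_hat + a_half), (b_hat - b_half, b_hat + b_half)
```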

Best Answer

You seem to be suffering from the usual confusion about what "linear regression" means: the "linear" refers to linearity in the coefficients, not in $x$, so nonlinearity in the $x$ values does not mean you're doing nonlinear regression. If you're fitting the sort of polynomial you describe above under the simplest conventional assumptions, that's linear regression, notwithstanding the fact that you're fitting a curve rather than a line.

You're also using the lower-case $n$ to denote two different things in the same problem! Confusing at best!

Think about this: $$ Y = X\beta + \varepsilon $$ where $Y$ is an $n\times1$ vector, $X$ is an $n\times k$ matrix with (typically) $k \ll n$, and $\varepsilon$ is an $n\times 1$ random vector of "errors" that is normally distributed with expected value the $n\times 1$ column vector of $0$s and variance $\sigma^2$ times the $n\times n$ identity matrix. You can observe $X$ and $Y$ and you want to estimate $\beta$, which is of course a $k\times 1$ fixed but unobservable vector.

In the case where $Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_{k-1} x_i^{k-1} + \varepsilon_i$, we have $$ X= \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{k-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{k-1} \end{bmatrix} $$ Then the least-squares estimate of $\beta$ is $\hat\beta = (X^TX)^{-1}X^T Y$. The vector of fitted values of $Y$ is $\hat Y = HY = X(X^TX)^{-1}X^TY$. The "hat matrix" $H$ is an $n\times n$ symmetric idempotent matrix of rank $k \ll n$. It's the orthogonal projection onto the column space of $X$. The vector of residuals is $\hat\varepsilon = Y - \hat Y = (I - H)Y$. This matrix $I-H$ represents the complementary orthogonal projection.
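A minimal sketch of that setup, assuming Python/NumPy (in production code you would typically use `np.linalg.lstsq` or a QR factorization rather than forming $(X^TX)^{-1}$ explicitly, but the explicit form mirrors the formulas above):

```python
import numpy as np

def polynomial_design(x, k):
    """n x k design matrix with columns 1, x, x^2, ..., x^(k-1)."""
    return np.vander(np.asarray(x, float), k, increasing=True)

def fit_least_squares(X, y):
    """Least-squares fit of Y = X beta + eps: returns beta_hat, fitted values, residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)      # (X^T X)^{-1}
    beta_hat = XtX_inv @ X.T @ y          # (X^T X)^{-1} X^T Y
    H = X @ XtX_inv @ X.T                 # hat matrix: orthogonal projection onto col(X)
    y_hat = H @ y                         # fitted values H Y
    resid = y - y_hat                     # residuals (I - H) Y
    return beta_hat, y_hat, resid
```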

Now remember: If $A$ is an $m\times n$ fixed matrix and $B$ is an $\ell\times n$ fixed matrix then $E(AY) = A E(Y)$, $\operatorname{var}(AY) = A(\operatorname{var}(Y))A^T$, and $\operatorname{cov}(AY,BY) = A(\operatorname{cov}(Y,Y))B^T$.
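Applying those rules to $\hat\beta = (X^TX)^{-1}X^TY$, i.e. taking $A = (X^TX)^{-1}X^T$, gives $$ E(\hat\beta) = (X^TX)^{-1}X^TX\beta = \beta, \qquad \operatorname{var}(\hat\beta) = (X^TX)^{-1}X^T(\sigma^2 I)X(X^TX)^{-1} = \sigma^2(X^TX)^{-1}, $$ and similarly with $A = I-H$ for the residual vector.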

Consequently we have $$ \hat\beta \sim N_k (\beta, \sigma^2 (X^TX)^{-1}), $$ (a $k$-dimensional normal distribution) $$ \hat\varepsilon \sim N_n(0, \sigma^2(I - H)) $$ (an $n$-dimensional normal distribution whose singular variance has rank $n-k<n$), and $$ \operatorname{cov}(\hat\beta, \hat\varepsilon) = 0 $$ (a $k\times n$ matrix of zeros).

Therefore the distribution of the sum of squares of observed residuals (not unobservable errors) is given by $$ \frac{\|\hat\varepsilon\|^2}{\sigma^2} = \frac{\|(I-H)Y\|^2}{\sigma^2} \sim \chi^2_{n-k}, $$ a chi-square distribution with $n-k$ degrees of freedom, and this random variable is actually independent of the least-squares estimate $\hat\beta$, which is normally distributed.

Now if you think about the definition of the Student's t-distribution, you can derive various confidence intervals.
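Concretely, with $\hat\sigma^2 = \|\hat\varepsilon\|^2/(n-k)$, the ratio $(\hat\beta_j - \beta_j)\big/\sqrt{\hat\sigma^2\,[(X^TX)^{-1}]_{jj}}$ has a $t_{n-k}$ distribution, which gives an interval for each coefficient, and the analogous ratio for $x_0^T\hat\beta$ gives a pointwise interval for the fitted polynomial at any new $x_0$. A minimal sketch, assuming Python/NumPy/SciPy (the function name and interface are illustrative only):

```python
import numpy as np
from scipy import stats

def regression_intervals(X, y, alpha=0.05, x0=None):
    """t-based intervals for the coefficients and, optionally, the fitted mean at rows x0."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)            # ||eps_hat||^2 / (n - k)
    t_crit = stats.t.ppf(1 - alpha / 2, n - k)

    # coefficient intervals: beta_hat_j +/- t * sqrt(sigma2_hat * [(X^T X)^{-1}]_{jj})
    se_beta = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    coef_ci = np.column_stack([beta_hat - t_crit * se_beta,
                               beta_hat + t_crit * se_beta])
    if x0 is None:
        return coef_ci

    # pointwise interval for the mean response x0^T beta at each row of x0
    x0 = np.atleast_2d(x0)
    se_mean = np.sqrt(sigma2_hat * np.einsum('ij,jk,ik->i', x0, XtX_inv, x0))
    mean_hat = x0 @ beta_hat
    mean_ci = np.column_stack([mean_hat - t_crit * se_mean,
                               mean_hat + t_crit * se_mean])
    return coef_ci, mean_ci
```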
