Solved – Confidence interval for polynomial linear regression

confidence-interval, regression

I have a model which is not linear but rather polynomial, and I have to estimate the parameters by giving a 95% confidence interval.
There are plenty of formulas for regression of the type $Y = \beta_0 + \beta_1 X$, but do they apply in my case (where $Y = \beta_1 X + \beta_2 X^2$)?

Of course, R gives me a pretty output:

Call:
lm(formula = dN ~ 0 + I(N) + I(N^2))

Residuals:
   1        2        3        4        5        6        7 
 0.02456 -0.10512 -0.12136  0.01848  0.24056 -0.11465  0.02646 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
I(N)    2.977e-02  6.596e-04   45.14 1.01e-07 ***
I(N^2) -4.440e-05  1.770e-06  -25.08 1.88e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1403 on 5 degrees of freedom
Multiple R-squared:  0.9992,    Adjusted R-squared:  0.9989 
F-statistic:  3173 on 2 and 5 DF,  p-value: 1.739e-08

I have read in a PDF file (page 13) that one can get the confidence interval simply from the standard error reported by R: $\hat{\beta}_1 \pm t_{\alpha/2} \times \text{(Std. Error)}$. Does this always hold?

In the same way, are the confidence intervals for the model prediction the same?

Thank you in advance for any clarification.

Best Answer

Polynomial regression is in effect multiple linear regression: consider $X_1=X$ and $X_2=X^2$ -- then $E(Y) = \beta_1 X + \beta_2 X^2$ is the same as $E(Y) = \beta_1 X_1 + \beta_2 X_2$.

As such, methods for constructing confidence intervals for the parameters (and for the conditional mean) in multiple regression carry over directly to the polynomial case, and most regression packages will compute them for you. Yes, the interval can be computed with the formula you suggest, provided the assumptions needed for the $t$-interval hold and you use the right degrees of freedom for the $t$: the residual d.f., which R reports in the summary output.
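For concreteness, here is a small sketch of that computation in R. The vectors N and dN below are made-up stand-ins (your actual data aren't shown in the question); only the last few lines matter.

## Illustrative data only -- substitute your own N and dN
set.seed(1)
N  <- c(50, 100, 150, 200, 300, 400, 500)
dN <- 0.03 * N - 4.4e-5 * N^2 + rnorm(7, sd = 0.15)

fit <- lm(dN ~ 0 + N + I(N^2))
cf  <- summary(fit)$coefficients   # estimates, std. errors, t values, p values
df  <- df.residual(fit)            # residual degrees of freedom (n - p = 5 here)

## 95% CI for each coefficient: estimate +/- t_{0.025, df} * std. error
## (columns are lower and upper limits)
cf[, "Estimate"] + outer(cf[, "Std. Error"], qt(c(0.025, 0.975), df))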

The R function confint can be used to construct confidence intervals for parameters from a regression model. See ?confint.
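Continuing the illustrative fit above, confint reproduces the hand computation directly:

## Same parameter intervals, computed by confint()
confint(fit, level = 0.95)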

In the case of a confidence interval for the conditional mean, let $X$ be the matrix of predictors, whether for polynomial regression or any other multiple regression model; let the estimated variance of the mean at $x_i=(x_{1i},x_{2i},\ldots,x_{pi})$ (the $i$-th row of $X$) be $v_i=\hat{\sigma}^2 x_i(X'X)^{-1}x_i'$, and let $s_i=\sqrt{v_i}$ be the corresponding standard error. Let $t$ be the upper $\alpha/2$ critical value of the $t$ distribution on the residual degrees of freedom ($n-p-1$ with an intercept; $n-p$ for a no-intercept model like yours). Then the pointwise confidence interval for the mean at $x_i$ is $\hat{y}_i \pm t\cdot s_i$.
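If you want to see that formula in action, here is a sketch that computes $v_i$, $s_i$ and the pointwise interval by hand for the illustrative fit above (again, the data are made up):

## Hand computation of the pointwise CI for the conditional mean
X      <- model.matrix(fit)                  # predictor matrix (columns N and N^2)
sigma2 <- summary(fit)$sigma^2               # estimated residual variance
v      <- sigma2 * rowSums((X %*% solve(crossprod(X))) * X)  # v_i = sigma^2 x_i (X'X)^{-1} x_i'
s      <- sqrt(v)                            # standard error of the estimated mean at each x_i
tcrit  <- qt(0.975, df.residual(fit))
cbind(fit = fitted(fit),
      lwr = fitted(fit) - tcrit * s,
      upr = fitted(fit) + tcrit * s)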

Also, the R function predict can be used to construct CIs for $E(Y\mid X)$; see ?predict.lm.
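For the same illustrative fit, this reproduces the hand computation above:

## Pointwise CIs for E(Y|X), here at the observed values of N
predict(fit, interval = "confidence", level = 0.95)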

[At least when doing polynomial regression with an intercept, it generally makes sense to use orthogonal polynomials, though if the spread of $X$ is large compared to its mean and the degree is low (such as quadratic), it isn't so critical. I tend to use them anyway, because the separate contributions of the linear and quadratic terms are then easier to interpret.]
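As a quick check with the made-up data above (note that both fits below include an intercept, unlike the model in your question), raw and orthogonal polynomials give identical fitted values:

## Same column space, so the fitted curves agree
fit_raw  <- lm(dN ~ N + I(N^2))   # raw polynomial terms
fit_orth <- lm(dN ~ poly(N, 2))   # orthogonal polynomial terms
all.equal(fitted(fit_raw), fitted(fit_orth))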
