Solved – Equation for Confidence Interval of Linear Regression

confidence interval, regression

I've done a multivariate linear regression. The results specify each parameter and the 95% confidence interval for each parameter. I did this using Python and StatsModels (not that it matters), and the results are for example:

                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept   4.971e+04   1575.998     31.541      0.000      4.66e+04  5.28e+04
hdd          163.1509     35.301      4.622      0.000        93.350   232.951
cdd          879.7969     76.879     11.444      0.000       727.784  1031.810
occ          177.8679     20.619      8.627      0.000       137.099   218.637

Based on this, the best fitting result is:

y = 4.971e+04  +  163.1509 * hdd  +  879.7969 * cdd  +  177.8679 * occ

My question is, if I were to write an equation for the upper bound and one for the lower bound based on the confidence interval described above, would it be simply:

y_max = 5.28e+04  +  232.951 * hdd  +  1031.810 * cdd  +  218.637 * occ
y_min = 4.66e+04  +   93.350 * hdd  +   727.784 * cdd  +  137.099 * occ

So, do I just take all the coefficients from the 95% confidence section and plug them into the equation?

EDIT:
A little clarification: I'm trying to write the equations that allow me to say, "with 95% probability, the data points lie between equation A and equation B".

Best Answer

There are two issues you need to understand. First, your equations do not take into account the correlation between the coefficient estimates. Second, there is a difference between a confidence interval and a prediction interval.
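To make the first issue concrete, here is the variance of the fitted mean at a point, written out as a sketch. The notation is introduced here and is not from the question: x_0 = (1, hdd, cdd, occ) is the vector of predictor values and \hat\beta is the vector of estimated coefficients.

```latex
\operatorname{Var}\!\left(x_0^{\top}\hat\beta\right)
  = x_0^{\top}\,\operatorname{Cov}(\hat\beta)\,x_0
  = \sum_{j} x_{0j}^{2}\,\operatorname{Var}(\hat\beta_j)
  + 2\sum_{j<k} x_{0j}\,x_{0k}\,\operatorname{Cov}(\hat\beta_j,\hat\beta_k)
```

The per-coefficient intervals in the regression table only reflect the Var(\hat\beta_j) terms. Plugging all the upper (or all the lower) limits into the equation ignores the covariance terms, so the resulting y_max and y_min do not form a 95% band for anything.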

A confidence interval tells you where the mean response is likely to lie for a given set of x-values. Far fewer than 95% of your observations (and future observations) will fall within the confidence bands. What you are asking for is a prediction interval, which tells you where a single future value is likely to fall.
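Since the question mentions StatsModels, here is a minimal sketch of how both kinds of interval might be obtained there. The data are synthetic stand-ins (the original data set is not shown), and the generating coefficients, noise level, and new predictor values are made up for illustration. It relies on `get_prediction(...).summary_frame(...)`, whose `mean_ci_*` columns give the confidence interval for the mean response and whose `obs_ci_*` columns give the prediction interval for a single observation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; every number below is made up for illustration.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "hdd": rng.uniform(0, 40, n),
    "cdd": rng.uniform(0, 20, n),
    "occ": rng.uniform(0, 100, n),
})
df["y"] = (50000 + 160 * df["hdd"] + 880 * df["cdd"] + 180 * df["occ"]
           + rng.normal(0, 5000, n))

fit = smf.ols("y ~ hdd + cdd + occ", data=df).fit()

# Hypothetical new predictor values at which we want both intervals.
new_x = pd.DataFrame({"hdd": [10.0, 30.0],
                      "cdd": [5.0, 15.0],
                      "occ": [40.0, 80.0]})
bands = fit.get_prediction(new_x).summary_frame(alpha=0.05)

# mean_ci_*: 95% confidence interval for the mean response
# obs_ci_*:  95% prediction interval for a single new observation
print(bands[["mean", "mean_ci_lower", "mean_ci_upper",
             "obs_ci_lower", "obs_ci_upper"]])

# Share of the training observations that falls inside each band.
in_sample = fit.get_prediction().summary_frame(alpha=0.05)
y = df["y"].to_numpy()
frac_in_ci = np.mean((y >= in_sample["mean_ci_lower"].to_numpy())
                     & (y <= in_sample["mean_ci_upper"].to_numpy()))
frac_in_pi = np.mean((y >= in_sample["obs_ci_lower"].to_numpy())
                     & (y <= in_sample["obs_ci_upper"].to_numpy()))
print(f"inside confidence band: {frac_in_ci:.2f}")
print(f"inside prediction band: {frac_in_pi:.2f}")
```

On data like this, the prediction band should contain roughly 95% of the observations, while the confidence band for the mean contains far fewer. The prediction band is the one that matches the "with 95% probability, the data points lie between A and B" statement.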

There are many resources that discuss prediction and confidence intervals. Regression textbooks will give the formulas to use along with more detailed explanation. One online resource (there are many, this is just one I found to point to) is https://onlinecourses.science.psu.edu/stat501/node/315 which refers to a formula on the previous page (section 7.1) that you may need to click on to understand the full formula.
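For reference, the standard textbook forms of the two intervals for ordinary least squares are sketched below. The notation is an assumption introduced here, not copied from the linked page: X is the design matrix (including the intercept column), x_0 the new predictor vector, s^2 the residual mean square, n the number of observations, and p the number of estimated parameters (4 in the question's model).

```latex
% Confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\;n-p}\; s\,\sqrt{x_0^{\top}(X^{\top}X)^{-1}x_0}

% Prediction interval for a single new observation at x_0
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\;n-p}\; s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0}
```

The only difference is the extra 1 under the square root, which accounts for the noise in an individual observation. This is why the prediction interval is always wider, and why neither interval can be written as a single linear equation in hdd, cdd, and occ with fixed coefficients.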
