Confidence Interval – Understanding Its Shape for Polynomial Regression (MLR)

Tags: confidence-interval, regression

I have difficulty grasping the shape of the confidence interval of a polynomial regression.

Here is an artificial example, $\hat{Y}=a+b\cdot X+c\cdot X^2$.
The left figure depicts the UPV (unscaled prediction variance) and the right graph shows the confidence interval and the (artificial) measured points at X=1.5, X=2 and X=3.

Details of the underlying data:

  • the data set consists of three data points (1.5; 1), (2; 2.5) and (3; 2.5).

  • each point was "measured" 10 times, and each measured value lies within $y \pm 0.5$. An MLR with a polynomial model was performed on the 30 resulting points.

  • the confidence interval was computed with the formulas
    $$
    UPV=\frac{Var[\hat{y}(x_0)]}{\hat{\sigma}^2}=x_0'(X'X)^{-1}x_0
    $$
    and
    $$
    \hat{y}(x_0) - t_{\alpha /2, df(error)}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0}
    \;\leq\; \mu_{y|x_0} \;\leq\;
    \hat{y}(x_0) + t_{\alpha /2, df(error)}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0} .
    $$
    (Both formulas are taken from Myers, Montgomery, Anderson-Cook, "Response Surface Methodology", fourth edition, pages 407 and 34.)

$t_{\alpha /2, df(error)}=2$ and $\hat{\sigma}^2=MSE=SSE/(n-p)\approx 0.075$.

I am not particularly interested in the absolute values of the confidence interval, but rather in the shape of the UPV, which depends only on $x_0'(X'X)^{-1}x_0$.

Figure 1: [image: UPV curve (left) and confidence interval with the measured points (right)]

  • the very high prediction variance outside the design space is normal because we are extrapolating

  • but why is the variance smaller between X=1.5 and X=2 than at the measured points?

  • and why does the variance get wider for values above X=2, but then diminish after X=2.3 to become smaller again than at the measured point X=3?

Wouldn't it be logical for the variance to be small at the measured points and large between them?
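For anyone who wants to reproduce the curve, here is a minimal numerical sketch (my addition, not part of the original computation). It assumes the model matrix $X$ contains one row per distinct design point, which is consistent with the observation below that the UPV equals exactly 1 at the three locations; building $X$ from all 30 replicated rows would only rescale the curve by a constant and leave its shape unchanged.

```python
# Minimal sketch (not from the original post): reproduce the UPV curve
# UPV(x0) = x0'(X'X)^{-1} x0 for the quadratic model y = a + b*x + c*x^2.
import numpy as np

# Assumption: one row per distinct design point (see note above).
x_design = np.array([1.5, 2.0, 3.0])
X = np.column_stack([np.ones_like(x_design), x_design, x_design**2])
XtX_inv = np.linalg.inv(X.T @ X)

def upv(x0):
    """Unscaled prediction variance x0'(X'X)^{-1} x0 at location x0."""
    v = np.array([1.0, x0, x0**2])
    return float(v @ XtX_inv @ v)

# Evaluate on a grid and at a few points of interest.
xs = np.linspace(1.0, 3.5, 251)
upv_curve = np.array([upv(x) for x in xs])
print({x: round(upv(x), 3) for x in (1.5, 1.7, 2.0, 2.3, 2.9, 3.0)})
# UPV is exactly 1 at the design points, dips below 1 around x = 1.7 and
# x = 2.9, rises above 1 around x = 2.3, and grows rapidly outside [1.5, 3].
```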

Edit: same procedure but with data points [(1.5; 1), (2.25; 2.5), (3; 2.5)] and [(1.5; 1), (2; 2.5), (2.5; 2.2), (3; 2.5)].

Figure 2: [image: UPV and confidence interval for the design (1.5; 1), (2.25; 2.5), (3; 2.5)]

Figure 3: [image: UPV and confidence interval for the design (1.5; 1), (2; 2.5), (2.5; 2.2), (3; 2.5)]
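(Side note, not from the original post: the UPV sketch above can be rerun for these two variants simply by changing the design vector, e.g. `x_design = np.array([1.5, 2.25, 3.0])` or `x_design = np.array([1.5, 2.0, 2.5, 3.0])`.)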

It is interesting to note that in figures 1 and 2 the UPV at the measured points is exactly equal to 1. This means that the confidence interval there will be precisely equal to $ \hat{y} \pm t_{\alpha /2, df(error)}\cdot \sqrt{MSE} $. With an increasing number of points (figure 3), we can get UPV values at the measured points which are smaller than 1.
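A short argument for this (my addition, under the same assumption as the sketch above, namely that $X$ has one row per distinct design point): when the number of distinct design points equals the number of parameters $p$, $X$ is square and invertible, so for each design row $x_j$
$$
x_j'(X'X)^{-1}x_j = x_j'X^{-1}(X')^{-1}x_j = \big\|(X')^{-1}x_j\big\|^2 = \|e_j\|^2 = 1,
$$
because $X'e_j$ is the $j$-th column of $X'$, i.e. $x_j$. With $m>p$ distinct points, the values $x_j'(X'X)^{-1}x_j$ are the diagonal entries of the hat matrix and sum to $\operatorname{tr}\!\big(X(X'X)^{-1}X'\big)=p$, so they average $p/m<1$; this is why the UPV at the measured points can drop below 1 in figure 3.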

Best Answer

The two principal ways of understanding such regression phenomena are algebraic--by manipulating the Normal equations and formulas for their solution--and geometric. Algebra, as illustrated in the question itself, is good. But there are several useful geometric formulations of regression. In this case, visualizing the $(x,y)$ data in $(x,x^2,y)$ space offers insight that otherwise may be difficult to come by.

We pay the price of needing to look at three-dimensional objects, which is difficult to do on a static screen. (I find endlessly rotating images to be annoying and so will not inflict any of those on you, even though they can be helpful.) Thus, this answer might not appeal to everyone. But those willing to add the third dimension with their imagination will be rewarded. I propose to help you out in this endeavor by means of some carefully chosen graphics.


Let's begin by visualizing the independent variables. In the quadratic regression model

$$y_i = \beta_0 + \beta_1 (x_i) + \beta_2 (x_i^2) + \text{error},\tag{1}$$

the two terms $(x_i)$ and $(x_i^2)$ can vary among observations: they are the independent variables. We can plot all the ordered pairs $(x_i,x_i^2)$ as points in a plane with axes corresponding to $x$ and $x^2.$ It is also revealing to plot all points on the curve of possible ordered pairs $(t,t^2):$

Figure 1
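For readers who want to reproduce this kind of picture, here is a small matplotlib sketch (my addition; it only mimics the schematic content of the figure, not its exact appearance):

```python
# Sketch (mine): design locations (x_i, x_i^2) and the curve (t, t^2)
# in the plane of the independent variables.
import numpy as np
import matplotlib.pyplot as plt

x_design = np.array([1.5, 2.0, 3.0])
t = np.linspace(1.0, 3.5, 200)

plt.plot(t, t**2, "k-", label="curve $(t, t^2)$")
plt.plot(x_design, x_design**2, "o", label="design points $(x_i, x_i^2)$")
plt.xlabel("$x$")
plt.ylabel("$x^2$")
plt.legend()
plt.show()
```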

Visualize the responses (dependent variable) in a third dimension by tilting this figure back and using the vertical direction for that dimension. Each response is plotted as a point symbol. These simulated data consist of a stack of ten responses for each of the three $(x,x^2)$ locations shown in the first figure; the possible elevations of each stack are shown with gray vertical lines:

Figure 2

Quadratic regression fits a plane to these points.

(How do we know that? Because for any choice of parameters $(\beta_0,\beta_1,\beta_2),$ the set of points in $(x,x^2,y)$ space that satisfy equation $(1)$ is the zero set of the function $-\beta_1(x)-\beta_2(x^2)+(1)y-\beta_0,$ which defines a plane perpendicular to the vector $(-\beta_1,-\beta_2,1).$ This bit of analytic geometry buys us some quantitative support for the picture, too: because the parameters used in these illustrations are $\beta_1=-55/8$ and $\beta_2=15/2,$ and both are large compared to $1,$ this plane will be nearly vertical and oriented diagonally in the $(x,x^2)$ plane.)

Here is the least-squares plane fitted to these points:

Figure 3

On the plane, which we might suppose to have an equation of the form $y=f(x,x^2),$ I have "lifted" the curve $(t,t^2)$ to the curve $$t\to (t, t^2, f(t,t^2))$$ and drawn that in black.

Let's tilt everything further back so that only the $x$ and $y$ axes are showing, leaving the $x^2$ axis to drop invisibly down from your screen:

Figure 4

You can see how the lifted curve is precisely the desired quadratic regression: it is the locus of all ordered pairs $(x,\hat y)$ where $\hat y$ is the fitted value when the independent variable is set to $x.$
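This can be checked numerically. A sketch of my own (it uses the simulated-data recipe from the question as a stand-in for the exact data behind these figures): fit the plane $y=\beta_0+\beta_1 u+\beta_2 v$ to the points $(x_i, x_i^2, y_i)$ and evaluate it along $(t,t^2)$; the result coincides with the ordinary quadratic fit in $x$.

```python
# Sketch (mine): the least-squares plane over (x, x^2), evaluated along the
# curve (t, t^2), is the same function of x as the direct quadratic fit.
import numpy as np

rng = np.random.default_rng(0)
x = np.repeat([1.5, 2.0, 3.0], 10)                 # stacked replicates
mu = np.repeat([1.0, 2.5, 2.5], 10)                # stack centers
y = mu + rng.uniform(-0.5, 0.5, size=x.size)       # simulated responses

# Plane fit in (u, v) = (x, x^2): y = b0 + b1*u + b2*v
A = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

t = np.linspace(1.0, 3.5, 8)
lifted = b[0] + b[1] * t + b[2] * t**2             # plane along (t, t^2)
direct = np.polyval(np.polyfit(x, y, 2), t)        # direct quadratic fit
print(np.allclose(lifted, direct))                 # True: same curve
```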

The confidence band for this fitted curve depicts what can happen to the fit when the data points are randomly varied. Without changing the point of view, I have plotted five fitted planes (and their lifted curves) to five independent new sets of data (of which only one is shown):

Figure 5

To help you see this better, I have also made the planes nearly transparent. Evidently the lifted curves tend to have mutual intersections near $x \approx 1.75$ and $x \approx 3.$

Let's look at the same thing by hovering above the three-dimensional plot and looking slightly down and along the diagonal axis of the plane. To help you see how the planes change, I have also compressed the vertical dimension.

Figure 6

The vertical golden fence shows all the points above the $(t,t^2)$ curve so you can see more easily how it lifts up to all five fitted planes. Conceptually, the confidence band is found by varying the data, which causes the fitted planes to vary, which changes the lifted curves, whence they trace out an envelope of possible fitted values at each value of $(x,x^2).$
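The same envelope can be traced by brute force. A sketch of my own (again using the question's data recipe rather than the exact data behind the figures): simulate many response sets, refit the quadratic each time, and record the spread of the fitted values at a few locations; the spread pinches between the design points rather than at them.

```python
# Sketch (mine): envelope of refitted curves by simulation.
import numpy as np

rng = np.random.default_rng(1)
x = np.repeat([1.5, 2.0, 3.0], 10)
mu = np.repeat([1.0, 2.5, 2.5], 10)
A = np.column_stack([np.ones_like(x), x, x**2])

checkpoints = np.array([1.5, 1.7, 2.0, 2.3, 2.9, 3.0])
C = np.column_stack([np.ones_like(checkpoints), checkpoints, checkpoints**2])

fits = []
for _ in range(2000):                              # 2000 simulated data sets
    y = mu + rng.uniform(-0.5, 0.5, size=x.size)
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    fits.append(C @ b)                             # fitted values at checkpoints

spread = np.std(fits, axis=0)                      # pointwise sd of the fits
print(dict(zip(checkpoints.tolist(), spread.round(3))))
# The spread is smaller near x = 1.7 and x = 2.9 than at the design points,
# and is largest of these interior locations near x = 2.3: the UPV shape.
```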

Now I believe a clear geometric explanation is possible. Because the points of the form $(x_i,x_i^2)$ nearly line up in their plane, all the fitted planes will rotate (and jiggle a tiny bit) around some common line lying above those points. (Let $\mathcal L$ be the projection of that line down to the $(x,x^2)$ plane: it will closely approximate the curve in the first figure.) When those planes are varied, the amount by which the lifted curve changes (vertically) at any given $(x,x^2)$ location will be directly proportional to the distance $(x,x^2)$ lies from $\mathcal L.$

Figure 7

This figure returns to the original planar perspective to display $\mathcal L$ relative to the curve $t\to(t,t^2)$ in the plane of independent variables. The two points on the curve closest to $\mathcal L$ are marked in red. Here, approximately, is where the fitted planes will tend to be closest as the responses vary randomly. Thus, the lifted curves at the corresponding $x$ values (around $1.7$ and $2.9$) will tend to vary least near these points.

Algebraically, finding those "nodal points" is a matter of solving a quadratic equation: thus, at most two of them will exist. We can therefore expect, as a general proposition, that the confidence bands of a quadratic fit to $(x,y)$ data may have up to two places where they come closest together--but no more than that.
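To make that algebra explicit (notation of my own choosing, not from the answer): if $\mathcal L$ has equation $\alpha x + \beta x^2 = \gamma$ in the $(x,x^2)$ plane, the points of the curve $t \mapsto (t,t^2)$ lying on $\mathcal L$ satisfy
$$
\beta t^2 + \alpha t - \gamma = 0,
$$
a quadratic with at most two real roots. If $\mathcal L$ misses the curve, the distance $\left|\beta t^2 + \alpha t - \gamma\right|/\sqrt{\alpha^2+\beta^2}$ is instead minimized at the single point where $2\beta t + \alpha = 0$. Either way there are at most two nodal points.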


This analysis conceptually applies to higher-degree polynomial regression, as well as to multiple regression generally. Although we cannot truly "see" more than three dimensions, the mathematics of linear regression guarantee that the intuition derived from two- and three-dimensional plots of the type shown here remains accurate in higher dimensions.
