Polynomial Regression – Why is Polynomial Regression a Special Case of Multiple Linear Regression?

linear-model, multiple-regression, nonlinear-regression, polynomial, regression

If polynomial regression models nonlinear relationships, how can it be considered a special case of multiple linear regression?

Wikipedia notes that "Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function $\mathbb{E}(y | x)$ is linear in the unknown parameters that are estimated from the data."

How is polynomial regression linear in the unknown parameters if the parameters are coefficients for terms with order $\ge$ 2?

Best Answer

When you fit a regression model such as $\hat y_i = \hat\beta_0 + \hat\beta_1x_i + \hat\beta_2x^2_i$, the model and the OLS estimator don't 'know' that $x^2_i$ is simply the square of $x_i$; they just 'think' it's another variable. Of course there is some collinearity, and that gets incorporated into the fit (e.g., the standard errors are larger than they might otherwise be), but lots of pairs of variables can be somewhat collinear without one being a function of the other.
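
To see this concretely, here is a minimal sketch (simulated data; the variable names are mine, not from the text above) showing that lm() produces identical coefficients whether we write the square as a polynomial term or hand it in as a pre-computed second column:

set.seed(1)
x = runif(50, 0, 10)
y = 3 + x - .05*x^2 + rnorm(50)          # noisy curvilinear relationship
fit.poly = lm(y ~ x + I(x^2))            # 'polynomial' formulation
X2       = x^2                           # the square, pre-computed
fit.two  = lm(y ~ x + X2)                # 'two generic predictors' formulation
all.equal(unname(coef(fit.poly)), unname(coef(fit.two)))  # TRUE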

We don't recognize that there are really two separate variables in the model, because we know that $x^2_i$ is ultimately the same variable as $x_i$, transformed and included in order to capture a curvilinear relationship between $x_i$ and $y_i$. That knowledge of the true nature of $x^2_i$, coupled with our belief that there is a curvilinear relationship between $x_i$ and $y_i$, is what makes it difficult for us to see how the model is still linear from its own perspective. In addition, we typically visualize $x_i$ and $x^2_i$ together by looking at the marginal projection of the 3D function onto the 2D $x, y$ plane.

If you only have $x_i$ and $x^2_i$, you can try to visualize them in the full 3D space (although it is still rather hard to really see what is going on). If you did look at the fitted function in the full 3D space, you would see that the fitted function is a 2D plane, and moreover that it is a flat plane. As I say, it is hard to see well because the $x_i, x^2_i$ data exist only along a curved line going through that 3D space (that fact is the visual manifestation of their collinearity). We can try to do that here. Imagine this is the fitted model:

x     = seq(from=0, to=10, by=.5)    # predictor
x2    = x^2                          # its square, entered as a second 'variable'
y     = 3 + x - .05*x2               # fitted values (exactly on the plane)
d.mat = data.frame(X1=x, X2=x2, Y=y)

# 2D plot
plot(x, y, pch=1, ylim=c(0,11), col="red", 
     main="Marginal projection onto the 2D X,Y plane")
lines(x, y, col="lightblue")

[Figure: marginal projection onto the 2D x, y plane]

# 3D plot
library(scatterplot3d)
s = scatterplot3d(x=d.mat$X1, y=d.mat$X2, z=d.mat$Y, color="gray", pch=1,
                  xlab="X1", ylab="X2", zlab="Y", xlim=c(0, 11), ylim=c(0, 101),
                  zlim=c(0, 11), type="h", main="In pseudo-3D space")
s$points(x=d.mat$X1, y=d.mat$X2, z=d.mat$Y, col="red", pch=1)
s$plane3d(Intercept=3, x.coef=1, y.coef=-.05, col="lightblue")  # the flat fitted plane

[Figure: the data and fitted plane in pseudo-3D space]

It may be easier to see in these images, which are screenshots of a rotated 3D figure made with the same data using the rgl package.

[Figures: screenshots of the rotated 3D rgl figure]
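
The exact rgl code behind those screenshots isn't shown; a minimal sketch along these lines (my assumption, reusing d.mat from above) produces a rotatable version of the same figure:

# Interactive 3D plot; drag to rotate and see that the fitted surface is flat
library(rgl)
plot3d(x=d.mat$X1, y=d.mat$X2, z=d.mat$Y, col="red", size=5,
       xlab="X1", ylab="X2", zlab="Y")
# The plane Y = 3 + X1 - .05*X2, rewritten as 1*X1 - .05*X2 - 1*Y + 3 = 0
planes3d(a=1, b=-.05, c=-1, d=3, col="lightblue", alpha=.3)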

When we say that a model that is "linear in the parameters" really is linear, this isn't just some mathematical sophistry. With $p$ variables, you are fitting a $p$-dimensional hyperplane in a $p\!+\!1$-dimensional hyperspace (in our example a 2D plane in a 3D space). That hyperplane really is 'flat' / 'linear'; it isn't just a metaphor.
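
To make that concrete: the OLS solution is nothing more than linear algebra on the design matrix, in which $x^2_i$ sits as an ordinary column. A minimal sketch (simulated data, as in the earlier block):

set.seed(1)
x = runif(50, 0, 10)
y = 3 + x - .05*x^2 + rnorm(50)
X = cbind(1, x, x^2)                       # design matrix: x^2 is just a column
beta.hat = solve(t(X) %*% X, t(X) %*% y)   # normal equations: (X'X) b = X'y
all.equal(c(beta.hat), unname(coef(lm(y ~ x + I(x^2)))))  # TRUE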