When performing linear regression, why might you choose to transform one of your predictor variables rather than use polynomial regression? In other words, what are the advantages (if any) of a transformation over polynomial regression?
As an example, I'm using the Auto dataset provided by the ISLR2 package:
require(ISLR2)
require(ggplot2)

df <- Auto
df$horsepower <- log(Auto$horsepower, 2)            # log2-transform the predictor

model.tran <- lm(mpg ~ horsepower, df)              # linear fit on log2(horsepower)
model.poly <- lm(mpg ~ poly(horsepower, 2), Auto)   # quadratic fit on raw horsepower

summary(model.tran)
summary(model.poly)

ggplot(Auto, aes(horsepower, mpg)) +
  geom_point() +
  geom_line(aes(y = predict(model.tran, df)), col = "blue") +
  ggtitle("Log Transformed")

ggplot(Auto, aes(horsepower, mpg)) +
  geom_point() +
  geom_line(aes(y = predict(model.poly, Auto)), col = "blue") +
  ggtitle("Polynomial, degree 2")
output:
Call:
lm(formula = mpg ~ horsepower, data = df)
Residuals:
Min 1Q Median 3Q Max
-14.2299 -2.7818 -0.2322 2.6661 15.4695
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.6997 3.0496 35.64 <2e-16 ***
horsepower -12.8802 0.4595 -28.03 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.501 on 390 degrees of freedom
Multiple R-squared: 0.6683, Adjusted R-squared: 0.6675
F-statistic: 785.9 on 1 and 390 DF, p-value: < 2.2e-16
Call:
lm(formula = mpg ~ poly(horsepower, 2), data = Auto)
Residuals:
Min 1Q Median 3Q Max
-14.7135 -2.5943 -0.0859 2.2868 15.8961
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.4459 0.2209 106.13 <2e-16 ***
poly(horsepower, 2)1 -120.1377 4.3739 -27.47 <2e-16 ***
poly(horsepower, 2)2 44.0895 4.3739 10.08 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.374 on 389 degrees of freedom
Multiple R-squared: 0.6876, Adjusted R-squared: 0.686
F-statistic: 428 on 2 and 389 DF, p-value: < 2.2e-16
As can be seen, the R-squared is higher and the residual standard error is lower for the polynomial model, and the residuals-versus-fitted plots suggest that the residual variance is slightly more constant in the polynomial regression.
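(For reference, residuals-versus-fitted plots like the ones I'm describing can be reproduced with base R's plot method for lm objects:)

plot(model.tran, which = 1)   # residuals vs fitted, log-transformed model
plot(model.poly, which = 1)   # residuals vs fitted, quadratic model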
This answer, https://stats.stackexchange.com/a/287475/353359, mentions implications for the model outside the observed x-range and deviations from the model. Is this simply a risk of overfitting?
While I have included an example, my question is more general: when would you want to use a transformation, given that polynomial regression seems like a catch-all for non-linearity?
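To make the out-of-range behaviour concrete, the two fits can be compared on horsepower values above anything observed in the data (a rough sketch; the grid and the object names hp.grid, pred.tran, pred.poly are arbitrary):

hp.grid <- seq(250, 500, by = 50)   # horsepower values above the observed range
pred.tran <- predict(model.tran, data.frame(horsepower = log(hp.grid, 2)))  # model.tran was fit on log2(horsepower)
pred.poly <- predict(model.poly, data.frame(horsepower = hp.grid))
cbind(horsepower = hp.grid, log.model = pred.tran, quad.model = pred.poly)

With the coefficients above, the quadratic eventually curves back upward while the log fit keeps decreasing, so the two models disagree sharply once you leave the range of the data.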
Best Answer
In theory, the implicit assumption in regression is that you know the shape of the function---linear, quadratic, logarithmic etc.---and just need to find its parameters by fitting it to the data. In practice, of course, this is seldom the case. You can fit an infinite number of functions and obtain marvellous results on your training data, but such models will likely perform poorly in the real world.
So, when choosing the model, nothing beats domain knowledge. In the case of your dataset, an automotive engineer or a physicist might be in a position to suggest a realistic model. But even with a cursory knowledge of physics we can exclude many candidates. For example:

- mpg cannot be negative, yet every polynomial, including a straight line, eventually takes negative values once horsepower is large enough;
- mpg should decrease monotonically with horsepower, whereas any polynomial of degree two or higher eventually turns around;
- mpg should stay finite as horsepower becomes small, which rules out functions like $1/x$ that blow up near zero;

etc.
The first two points obviously speak against a polynomial model, including a linear one ("linear" referring to the predictors). The third speaks against $1/x$, which might otherwise seem plausible.
Among the infinitely many functions that pass this plausibility check, an exponential decay is probably the simplest model you can use.
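For instance, a decay of the form $a\,e^{-bx}$ (with $x$ = horsepower) can be fitted by regressing $\log(mpg)$ on horsepower; a minimal sketch, with model.exp just an illustrative name:

model.exp <- lm(log(mpg) ~ horsepower, Auto)   # log(mpg) linear in horsepower  <=>  mpg = a * exp(-b * horsepower)
summary(model.exp)
pred.mpg <- exp(predict(model.exp, Auto))      # back-transform predictions to the mpg scale

Unlike the quadratic, this curve is monotone and stays strictly positive for every value of horsepower.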