Solved – Raw or orthogonal polynomial regression

regression, polynomial, r

I want to regress a variable $y$ onto $x, x^2, \ldots, x^5$. Should I do this using raw or orthogonal polynomials? I looked at the question on the site that deals with these, but I don't really understand the difference between using them.

Why can't I just do a "normal" regression to get the coefficients $\beta_i$ of $y=\sum_{i=0}^5 \beta_i x^i$ (along with p-values and all the other nice stuff), instead of having to worry about whether to use raw or orthogonal polynomials? This choice seems to me to be outside the scope of what I want to do.

In the stat book I'm currently reading (ISLR by Tibshirani et al.), these things weren't mentioned. Actually, they were downplayed in a way.
The reason is, AFAIK, that in the lm() function in R, using y ~ poly(x, 2) amounts to using orthogonal polynomials, while using y ~ x + I(x^2) amounts to using raw ones. But on p. 116 the authors say that we use the first option because the latter is "cumbersome", which gives no indication that these commands actually do two completely different things (and have different outputs as a consequence).
(Third question) Why would the authors of ISLR confuse their readers like that?
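
For concreteness, here is a minimal sketch (my own, not from ISLR) comparing the two calls on the built-in mtcars data. The coefficient estimates differ, but the fitted values agree:

#Orthogonal vs. raw, degree 2 (stand-in data: mtcars)
fit.o <- lm(mpg ~ poly(hp, 2), data = mtcars)    #orthogonal polynomials
fit.r <- lm(mpg ~ hp + I(hp^2), data = mtcars)   #raw polynomials

coef(fit.o)                              #different coefficients...
coef(fit.r)
all.equal(fitted(fit.o), fitted(fit.r))  #...but identical fitted values (TRUE)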

Best Answer

I feel like several of these answers miss the point. Haitao's answer addresses the computational problems with fitting raw polynomials, but it's clear that OP is asking about the statistical differences between the two approaches. That is, if we had a perfect computer that could represent all values exactly, why would we prefer one approach over the other?

user5957401 argues that orthogonal polynomials reduce the collinearity among the polynomial functions, which makes their estimation more stable. I agree with Jake Westfall's critique: the coefficients on orthogonal polynomials represent completely different quantities from the coefficients on raw polynomials. The model-implied dose-response function, $R^2$, MSE, predicted values, and the standard errors of the predicted values will all be identical regardless of whether you use orthogonal or raw polynomials.

The coefficient on $X$ in a raw polynomial regression of order 2 has the interpretation of "the instantaneous change in $Y$ when $X=0$." If you performed a marginal effects procedure on the orthogonal polynomial model at $X=0$, you would get exactly the same slope and standard error, even though the coefficient and standard error on the first-order term in the orthogonal polynomial regression are completely different from their values in the raw polynomial regression. That is, when you extract the same quantities from both regressions (i.e., quantities that can be interpreted the same way), the estimates and standard errors are identical. Using orthogonal polynomials doesn't mean you magically have more certainty about the slope of $X$ at any given point. The stability of the two models is identical. See below:

data("iris")

#Raw:
fit.raw <- lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) +
                  I(Petal.Width^3), data = iris)
summary(fit.raw)

#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)        1.1034     0.1304   8.464 2.50e-14 ***
#> Petal.Width        1.1527     0.5836   1.975  0.05013 .  
#> I(Petal.Width^2)   1.7100     0.5487   3.116  0.00221 ** 
#> I(Petal.Width^3)  -0.5788     0.1408  -4.110 6.57e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3898 on 146 degrees of freedom
#> Multiple R-squared:  0.9522, Adjusted R-squared:  0.9512 
#> F-statistic: 969.9 on 3 and 146 DF,  p-value: < 2.2e-16

#Orthogonal
fit.orth <- lm(Petal.Length ~ stats::poly(Petal.Width, 3), data = iris)

#Marginal effect of X at X=0 from orthogonal model
library(margins)
summary(margins(fit.orth, variables = "Petal.Width", 
                at = data.frame(Petal.Width = 0)))
#> Warning in check_values(data, at): A 'at' value for 'Petal.Width' is
#> outside observed data range (0.1,2.5)!
#>       factor Petal.Width    AME     SE      z      p  lower  upper
#>  Petal.Width      0.0000 1.1527 0.5836 1.9752 0.0482 0.0089 2.2965

Created on 2019-10-25 by the reprex package (v0.3.0)

The marginal effect of Petal.Width at 0 from the orthogonal fit and its standard error are exactly equal to those from the raw polynomial fit (i.e., 1.1527 with a standard error of 0.5836). Using orthogonal polynomials doesn't improve the precision of estimates of the same quantity between the two models.

The key is the following: using orthogonal polynomials allows you to isolate the contribution of each term to explaining variance in the outcome, e.g., as measured by the squared semipartial correlation. If you fit an orthogonal polynomial of order 3, the squared semipartial correlation for each term represents the proportion of variance in the outcome explained by that term in the model. So, if you wanted to answer "How much of the variance in $Y$ is explained by the linear component of $X$?", you could fit an orthogonal polynomial regression, and the squared semipartial correlation on the linear term would represent this quantity. This is not so with raw polynomials: if you fit a raw polynomial model of the same order, the squared semipartial correlation on the linear term does not represent the proportion of variance in $Y$ explained by the linear component of $X$. See below.

library(jtools)
data("iris")

fit.raw3 <- lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) +
                  I(Petal.Width^3), data = iris)
fit.raw1 <- lm(Petal.Length ~ Petal.Width, data = iris)

round(summ(fit.raw3, part.corr = T)$coef, 3)
#>                    Est.  S.E. t val.     p partial.r part.r
#> (Intercept)       1.103 0.130  8.464 0.000        NA     NA
#> Petal.Width       1.153 0.584  1.975 0.050     0.161  0.036
#> I(Petal.Width^2)  1.710 0.549  3.116 0.002     0.250  0.056
#> I(Petal.Width^3) -0.579 0.141 -4.110 0.000    -0.322 -0.074

round(summ(fit.raw1, part.corr = T)$coef, 3)
#>              Est.  S.E. t val. p partial.r part.r
#> (Intercept) 1.084 0.073 14.850 0        NA     NA
#> Petal.Width 2.230 0.051 43.387 0     0.963  0.963

fit.orth3 <- lm(Petal.Length ~ stats::poly(Petal.Width, 3), 
               data = iris)
fit.orth1 <- lm(Petal.Length ~ stats::poly(Petal.Width, 3)[,1], 
               data = iris)

round(summ(fit.orth3, part.corr = T)$coef, 3)
#>                                Est.  S.E.  t val. p partial.r part.r
#> (Intercept)                   3.758 0.032 118.071 0        NA     NA
#> stats::poly(Petal.Width, 3)1 20.748 0.390  53.225 0     0.975  0.963
#> stats::poly(Petal.Width, 3)2 -3.015 0.390  -7.735 0    -0.539 -0.140
#> stats::poly(Petal.Width, 3)3 -1.602 0.390  -4.110 0    -0.322 -0.074

round(summ(fit.orth1, part.corr = T)$coef, 3)
#>                                    Est.  S.E. t val. p partial.r part.r
#> (Intercept)                       3.758 0.039 96.247 0        NA     NA
#> stats::poly(Petal.Width, 3)[, 1] 20.748 0.478 43.387 0     0.963  0.963

Created on 2019-10-25 by the reprex package (v0.3.0)

The squared semipartial correlations for the raw polynomials when the order-3 polynomial is fit are $0.001$, $0.003$, and $0.005$. When only the linear term is fit, the squared semipartial correlation is $0.927$. The squared semipartial correlations for the orthogonal polynomials when the order-3 polynomial is fit are $0.927$, $0.020$, and $0.005$. When only the linear term is fit, the squared semipartial correlation is still $0.927$. From the orthogonal polynomial model, but not the raw polynomial model, we know that most of the explained variance in the outcome is due to the linear term, with very little coming from the quadratic term and even less from the cubic term. The raw polynomial values don't tell that story.
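
To make the arithmetic explicit, these numbers come from squaring the part.r column in the summ() output above (a quick check using the rounded values as printed):

#Square the semipartial (part.r) correlations reported above
round(c(0.036, 0.056, -0.074)^2, 3)   #raw, order-3 fit
#> [1] 0.001 0.003 0.005
round(c(0.963, -0.140, -0.074)^2, 3)  #orthogonal, order-3 fit
#> [1] 0.927 0.020 0.005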

Now, if you want this interpretational benefit over the interpretational benefit of actually being able to understand the coefficients of the model, then you should use orthogonal polynomials. If you would prefer to look at the coefficients and know exactly what they mean (though I doubt one typically does), then you should use the raw polynomials. If you don't care (i.e., you only want to control for confounding or generate predicted values), then it truly doesn't matter; both forms carry the same information with respect to those goals. I would also argue that orthogonal polynomials should be preferred in regularization (e.g., the lasso): removing higher-order terms doesn't affect the coefficients on the lower-order terms, which is not true with raw polynomials, and regularization techniques often care about the size of each coefficient.
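
As a quick illustration of that last point (my own sketch, not part of the original answer, again using iris): because the columns returned by poly() are orthogonal and nested, dropping the cubic term and refitting leaves the lower-order orthogonal estimates unchanged, whereas the raw estimates all shift.

#Orthogonal: dropping the cubic term leaves the remaining estimates unchanged
coef(lm(Petal.Length ~ stats::poly(Petal.Width, 3), data = iris))[1:3]
coef(lm(Petal.Length ~ stats::poly(Petal.Width, 2), data = iris))
#Both give (up to rounding) the same intercept 3.758, linear term 20.748,
#and quadratic term -3.015 seen in the output above.

#Raw: dropping the cubic term changes every remaining coefficient
coef(lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) + I(Petal.Width^3),
        data = iris))[1:3]
coef(lm(Petal.Length ~ Petal.Width + I(Petal.Width^2), data = iris))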