Solved – Raw or orthogonal polynomial regression

regression, polynomial, r

I want to regress a variable $y$ onto $x, x^2, \ldots, x^5$. Should I do this using raw or orthogonal polynomials? I looked at the question on the site that deals with these, but I don't really understand the difference between using them.

Why can't I just do a "normal" regression to get the coefficients $\beta_i$ of $y=\sum_{i=0}^5 \beta_i x^i$ (along with p-values and all the other nice stuff), instead of having to worry about whether to use raw or orthogonal polynomials? This choice seems to me to be outside the scope of what I want to do.

In the stat book I'm currently reading (ISLR by Tibshirani et al.), these things weren't mentioned. Actually, they were downplayed in a way.
The reason is, AFAIK, that in the lm() function in R, using y ~ poly(x, 2) amounts to using orthogonal polynomials, while using y ~ x + I(x^2) amounts to using raw ones. But on p. 116 the authors say that we use the first option because the latter is "cumbersome", which gives no indication that these commands actually do two completely different things (and have different outputs as a consequence).
(Third question) Why would the authors of ISLR confuse their readers like that?
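
For concreteness, here is a minimal sketch (my own, not from ISLR) comparing the two calls on the built-in mtcars data. The coefficient estimates differ, but the fitted values agree:

#Orthogonal vs. raw, degree 2 (stand-in data: mtcars)
fit.o <- lm(mpg ~ poly(hp, 2), data = mtcars)    #orthogonal polynomials
fit.r <- lm(mpg ~ hp + I(hp^2), data = mtcars)   #raw polynomials

coef(fit.o)                              #different coefficients...
coef(fit.r)
all.equal(fitted(fit.o), fitted(fit.r))  #...but identical fitted values (TRUE)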

Best Answer

I feel like several of these answers miss the point. Haitao's answer addresses the computational problems with fitting raw polynomials, but it's clear that OP is asking about the statistical differences between the two approaches. That is, if we had a perfect computer that could represent all values exactly, why would we prefer one approach over the other?

user5957401 argues that orthogonal polynomials reduce the collinearity among the polynomial functions, which makes their estimation more stable. I agree with Jake Westfall's critique: the coefficients on orthogonal polynomials represent completely different quantities from the coefficients on raw polynomials. The model-implied dose-response function, $R^2$, MSE, predicted values, and the standard errors of the predicted values will all be identical regardless of whether you use orthogonal or raw polynomials.

The coefficient on $X$ in a raw polynomial regression of order 2 has the interpretation of "the instantaneous change in $Y$ when $X=0$." If you performed a marginal effects procedure on the orthogonal polynomial model at $X=0$, you would get exactly the same slope and standard error, even though the coefficient and standard error on the first-order term in the orthogonal polynomial regression are completely different from their values in the raw polynomial regression. That is, when you extract the same quantities from both regressions (i.e., quantities that can be interpreted the same way), the estimates and standard errors are identical. Using orthogonal polynomials doesn't mean you magically have more certainty about the slope of $X$ at any given point. The stability of the two models is identical. See below:

data("iris")

#Raw:
fit.raw <- lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) +
                  I(Petal.Width^3), data = iris)
summary(fit.raw)

#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)        1.1034     0.1304   8.464 2.50e-14 ***
#> Petal.Width        1.1527     0.5836   1.975  0.05013 .  
#> I(Petal.Width^2)   1.7100     0.5487   3.116  0.00221 ** 
#> I(Petal.Width^3)  -0.5788     0.1408  -4.110 6.57e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3898 on 146 degrees of freedom
#> Multiple R-squared:  0.9522, Adjusted R-squared:  0.9512 
#> F-statistic: 969.9 on 3 and 146 DF,  p-value: < 2.2e-16

#Orthogonal
fit.orth <- lm(Petal.Length ~ stats::poly(Petal.Width, 3), data = iris)

#Marginal effect of X at X=0 from orthogonal model
library(margins)
summary(margins(fit.orth, variables = "Petal.Width", 
                at = data.frame(Petal.Width = 0)))
#> Warning in check_values(data, at): A 'at' value for 'Petal.Width' is
#> outside observed data range (0.1,2.5)!
#>       factor Petal.Width    AME     SE      z      p  lower  upper
#>  Petal.Width      0.0000 1.1527 0.5836 1.9752 0.0482 0.0089 2.2965

Created on 2019-10-25 by the reprex package (v0.3.0)

The marginal effect of Petal.Width at 0 from the orthogonal fit and its standard error are exactly equal to those from the raw polynomial fit (i.e., 1.1527 with a standard error of 0.5836). Using orthogonal polynomials doesn't improve the precision of estimates of the same quantity between the two models.

The key is the following: using orthogonal polynomials allows you to isolate the contribution of each term to explaining variance in the outcome, e.g., as measured by the squared semipartial correlation. If you fit an orthogonal polynomial of order 3, the squared semipartial correlation for each term represents the proportion of variance in the outcome explained by that term in the model. So, if you wanted to answer "How much of the variance in $Y$ is explained by the linear component of $X$?", you could fit an orthogonal polynomial regression, and the squared semipartial correlation on the linear term would represent this quantity. This is not so with raw polynomials: if you fit a raw polynomial model of the same order, the squared semipartial correlation on the linear term does not represent the proportion of variance in $Y$ explained by the linear component of $X$. See below.

library(jtools)
data("iris")

fit.raw3 <- lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) +
                  I(Petal.Width^3), data = iris)
fit.raw1 <- lm(Petal.Length ~ Petal.Width, data = iris)

round(summ(fit.raw3, part.corr = T)$coef, 3)
#>                    Est.  S.E. t val.     p partial.r part.r
#> (Intercept)       1.103 0.130  8.464 0.000        NA     NA
#> Petal.Width       1.153 0.584  1.975 0.050     0.161  0.036
#> I(Petal.Width^2)  1.710 0.549  3.116 0.002     0.250  0.056
#> I(Petal.Width^3) -0.579 0.141 -4.110 0.000    -0.322 -0.074

round(summ(fit.raw1, part.corr = T)$coef, 3)
#>              Est.  S.E. t val. p partial.r part.r
#> (Intercept) 1.084 0.073 14.850 0        NA     NA
#> Petal.Width 2.230 0.051 43.387 0     0.963  0.963

fit.orth3 <- lm(Petal.Length ~ stats::poly(Petal.Width, 3), 
               data = iris)
fit.orth1 <- lm(Petal.Length ~ stats::poly(Petal.Width, 3)[,1], 
               data = iris)

round(summ(fit.orth3, part.corr = T)$coef, 3)
#>                                Est.  S.E.  t val. p partial.r part.r
#> (Intercept)                   3.758 0.032 118.071 0        NA     NA
#> stats::poly(Petal.Width, 3)1 20.748 0.390  53.225 0     0.975  0.963
#> stats::poly(Petal.Width, 3)2 -3.015 0.390  -7.735 0    -0.539 -0.140
#> stats::poly(Petal.Width, 3)3 -1.602 0.390  -4.110 0    -0.322 -0.074

round(summ(fit.orth1, part.corr = T)$coef, 3)
#>                                    Est.  S.E. t val. p partial.r part.r
#> (Intercept)                       3.758 0.039 96.247 0        NA     NA
#> stats::poly(Petal.Width, 3)[, 1] 20.748 0.478 43.387 0     0.963  0.963

Created on 2019-10-25 by the reprex package (v0.3.0)

The squared semipartial correlations for the raw polynomials when the order-3 polynomial is fit are $0.001$, $0.003$, and $0.005$. When only the linear term is fit, the squared semipartial correlation is $0.927$. The squared semipartial correlations for the orthogonal polynomials when the order-3 polynomial is fit are $0.927$, $0.020$, and $0.005$. When only the linear term is fit, the squared semipartial correlation is still $0.927$. From the orthogonal polynomial model, but not the raw polynomial model, we know that most of the explained variance in the outcome is due to the linear term, with very little coming from the quadratic term and even less from the cubic term. The raw polynomial values don't tell that story.
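
To make the arithmetic explicit, these numbers come from squaring the part.r column in the summ() output above (a quick check using the rounded values as printed):

#Square the semipartial (part.r) correlations reported above
round(c(0.036, 0.056, -0.074)^2, 3)   #raw, order-3 fit
#> [1] 0.001 0.003 0.005
round(c(0.963, -0.140, -0.074)^2, 3)  #orthogonal, order-3 fit
#> [1] 0.927 0.020 0.005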

Now, if you want this interpretational benefit over the interpretational benefit of actually being able to understand the coefficients of the model, then you should use orthogonal polynomials. If you would prefer to look at the coefficients and know exactly what they mean (though I doubt one typically does), then you should use the raw polynomials. If you don't care (i.e., you only want to control for confounding or generate predicted values), then it truly doesn't matter; both forms carry the same information with respect to those goals. I would also argue that orthogonal polynomials should be preferred in regularization (e.g., the lasso): removing higher-order terms doesn't affect the coefficients on the lower-order terms, which is not true with raw polynomials, and regularization techniques often care about the size of each coefficient.
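
As a quick illustration of that last point (my own sketch, not part of the original answer, again using iris): because the columns returned by poly() are orthogonal and nested, dropping the cubic term and refitting leaves the lower-order orthogonal estimates unchanged, whereas the raw estimates all shift.

#Orthogonal: dropping the cubic term leaves the remaining estimates unchanged
coef(lm(Petal.Length ~ stats::poly(Petal.Width, 3), data = iris))[1:3]
coef(lm(Petal.Length ~ stats::poly(Petal.Width, 2), data = iris))
#Both give (up to rounding) the same intercept 3.758, linear term 20.748,
#and quadratic term -3.015 seen in the output above.

#Raw: dropping the cubic term changes every remaining coefficient
coef(lm(Petal.Length ~ Petal.Width + I(Petal.Width^2) + I(Petal.Width^3),
        data = iris))[1:3]
coef(lm(Petal.Length ~ Petal.Width + I(Petal.Width^2), data = iris))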