I am trying to fit a quadratic to my data, which consists of (x, y) tuples.
The choices are:
1) lm(y ~ x + I(x^2))
2) lm(y ~ I(x - mean(x)) + I((x - mean(x))^2))
3) lm(y ~ I(x - mean(x)) + I(x^2 - mean(x^2)))
In other words, in 3) I am centering the quadratic term using its own mean.
I understand that centering to reduce multicollinearity is not the issue here; I am just trying to understand how to center in general. Intuitively, 3) makes the most sense to me: I am treating the linear and quadratic variables as separate and centering each in the usual way. 2) is odd because the quadratic term also contains a linear component once you expand the square. 1) and 3) give the same coefficients, which differ from 2)'s, but there seems to be no obvious relationship between the linear coefficients from 2) and 1). The quadratic coefficient is the same across all three models.
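For concreteness, the three parameterizations can be sketched as follows. This is a minimal sketch in Python, with plain least squares standing in for R's lm() and simulated data standing in for the original (x, y) pairs, which aren't shown; the coefficient pattern described above still appears.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the original (x, y) pairs, which aren't shown.
x = rng.uniform(50.0, 200.0, 300)
y = 250.0 - 0.4 * (x - x.mean()) + rng.normal(0.0, 20.0, 300)

def fit(*cols):
    """OLS of y on an intercept plus the given regressor columns."""
    X = np.column_stack([np.ones_like(x), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, linear, quadratic]

xc = x - x.mean()
b1 = fit(x, x**2)                   # 1) raw x and x^2
b2 = fit(xc, xc**2)                 # 2) both terms built from centered x
b3 = fit(xc, x**2 - (x**2).mean())  # 3) each regressor centered by its own mean

# Quadratic coefficients agree across all three models;
# models 1 and 3 also share the same linear coefficient.
assert np.allclose(b1[2], b2[2]) and np.allclose(b1[2], b3[2])
assert np.allclose(b1[1], b3[1])
```

All three parameterizations span the same column space {1, x, x²}, so they produce identical fitted values and residuals; only the labeling of the coefficients changes.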
The outputs are
model 1)
Call:
lm(formula = y ~ x + I(x^2))
Residuals:
Min 1Q Median 3Q Max
-73.845 -10.151 1.224 9.660 73.553
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 262.709845 82.982956 3.166 0.0016 **
x 0.150473 1.346574 0.112 0.9111
I(x^2) -0.002182 0.005459 -0.400 0.6895
model 2)
Call:
lm(formula = y ~ I(x - mean(x)) + I((x - mean(x))^2))
Residuals:
Min 1Q Median 3Q Max
-73.845 -10.151 1.224 9.660 73.553
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 247.263060 0.657972 375.796 <2e-16 ***
I(x - mean(x)) -0.396789 0.080544 -4.926 1e-06 ***
I((x - mean(x))^2) -0.002182 0.005459 -0.400 0.69
And model 3)
Call:
lm(formula = y ~ I(x - mean(x)) + I(x^2 - mean(x^2)))
Residuals:
Min 1Q Median 3Q Max
-73.845 -10.151 1.224 9.660 73.553
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 247.138199 0.579052 426.798 <2e-16 ***
I(x - mean(x)) 0.150473 1.346574 0.112 0.911
I(x^2 - mean(x^2)) -0.002182 0.005459 -0.400 0.690
Notice that 1) and 3) give the same coefficient estimates, while 2) differs in the coefficient on the linear term. The coefficients of the quadratic term all agree. The linear term is significant in model 2 but not in the other two. Why?
Best Answer
When you fit a regression model with a single variable and its square, the interpretation of the coefficient on the linear term changes: it is the instantaneous slope of the parabola at the point where the regressor equals zero. From this it is easy to see how models 1 and 2 differ. Model 1 gives you the slope of the tangent line to the parabola at $x = 0$, whereas Model 2 gives you the tangent at the mean of $x$. Model 3, on the other hand, looks more complicated, but a little algebra shows what you are actually fitting (note that it centers the squared term by $\overline{x^2}$, the mean of $x^2$, not by $\bar{x}^2$):
$$ \begin{eqnarray} y &=& a + b(x-\bar{x}) + c\,(x^2 - \overline{x^2})\\ &=& a + b(x-\bar{x}) + c\left[(x-\bar{x})^2 + 2\bar{x}(x-\bar{x}) + \bar{x}^2 - \overline{x^2}\right]\\ &=& \left[a + c\,(\bar{x}^2 - \overline{x^2})\right] + (b + 2c\bar{x})(x-\bar{x}) + c\,(x-\bar{x})^2 \end{eqnarray} $$
The constant $\bar{x}^2 - \overline{x^2}$ is absorbed into the intercept, so Model 3's linear coefficient $b$ is exactly Model 1's raw linear coefficient, the slope at $x = 0$; that is why Models 1 and 3 agree. Model 2's linear coefficient is $b + 2c\bar{x}$, the slope at $\bar{x}$, which is why it differs and can be significant even when the slope at zero is not.
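This identity can also be checked numerically. Below is a minimal sketch in Python (plain least squares standing in for lm(), simulated data since the original isn't available) confirming that the centered-x model's linear coefficient equals the raw linear coefficient plus $2c\bar{x}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data; the original (x, y) pairs aren't available.
x = rng.uniform(50.0, 200.0, 500)
y = 1.0 + 0.5 * x - 0.01 * x**2 + rng.normal(0.0, 5.0, 500)

def fit(*cols):
    """OLS of y on an intercept plus the given regressor columns."""
    X = np.column_stack([np.ones_like(x), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

xc = x - x.mean()
_, b_raw, c = fit(x, x**2)    # model 1 slopes (model 3 yields the same slopes)
_, b_cen, _ = fit(xc, xc**2)  # model 2: slope of the tangent at mean(x)

# Slope at the mean = slope at zero + 2 * c * mean(x).
assert np.allclose(b_cen, b_raw + 2.0 * c * x.mean())
```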