Multiple Linear Regression – Magnitude of Standardized Coefficients (Beta) Explained

multiple regression, regression

Being aware of that article, I am curious how large standardized coefficients can get. I had a discussion with my professor about this issue, and she argued that standardized coefficients (betas) in multiple linear regression cannot become greater than |1|. I have also heard that predictors with standardized coefficients greater than 1 should not appear in a multiple linear regression. When I recently fit a multiple linear regression in R using lm(), I computed the standardized coefficients with the lm.beta() function from the package 'lm.beta'. In the results I observed a standardized coefficient greater than one. Right now I am just not sure what the truth is.

Can standardized coefficients become greater than |1|?
If yes, what does that mean and should they be excluded from the model?
If yes, why?

I would be very thankful if somebody could clear this issue up for me.

Best Answer

It's never easy telling your professor that they are wrong.

Standardized coefficients can be greater than 1.00, as that article explains and as is easy to demonstrate. Whether they should be excluded depends on why they happened, but probably not.

They are a sign that you have some pretty serious collinearity. One case where they often occur is when you have non-linear effects, such as when $x$ and $x^2$ are included as predictors in a model.
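
To see where that collinearity comes from, note that $x$ and $x^2$ tend to be very highly correlated themselves. A quick check on the built-in cars data (the same data used in the demonstration below):

with(cars, cor(speed, speed^2))  # correlation between speed and its square: very close to 1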

Here's a quick demonstration:

library(lm.beta)                       # for lm.beta(), as used in the question

data(cars)
cars$speed2 <- cars$speed^2            # add quadratic and cubic terms
cars$speed3 <- cars$speed^3
fit1 <- lm(dist ~ speed, data=cars)
fit2 <- lm(dist ~ speed + speed2, data=cars)
fit3 <- lm(dist ~ speed + speed2 + speed3, data=cars)
summary(fit1)
summary(fit2)
summary(fit3)
lm.beta(fit1)
lm.beta(fit2)
lm.beta(fit3)

Final bit of output:

> lm.beta(fit3)
   speed    speed2    speed3 
  1.395526 -2.212406  1.681041 
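
If you want to quantify how severe the collinearity is, variance inflation factors are a standard diagnostic. A minimal sketch using vif() from the car package (my addition here, not part of the original demonstration):

library(car)   # provides vif()
vif(fit3)      # values far above 10 signal severe collinearity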

Or, if you prefer, you can standardize the variables first:

zcars <- as.data.frame(rapply(cars, scale, how="list"))  # z-score every column
fit3 <- lm(dist ~ speed + speed2 + speed3, data=zcars)

summary(fit3)

Call:
lm(formula = dist ~ speed + speed2 + speed3, data = zcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.03496 -0.37258 -0.08659  0.27456  1.73426 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.440e-16  8.344e-02   0.000    1.000
speed        1.396e+00  1.396e+00   1.000    0.323
speed2      -2.212e+00  3.163e+00  -0.699    0.488
speed3       1.681e+00  1.853e+00   0.907    0.369

Residual standard error: 0.59 on 46 degrees of freedom
Multiple R-squared:  0.6732,    Adjusted R-squared:  0.6519 
F-statistic: 31.58 on 3 and 46 DF,  p-value: 3.074e-11
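
As an aside, since every column of cars is numeric here, an equivalent and slightly shorter way to build the standardized data frame would be:

zcars <- as.data.frame(scale(cars))   # scale() centers and rescales all columns at once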

You don't need to do it with lm(); you can do it with matrix algebra if you prefer:

library(MASS)                              # provides ginv()

Rxx <- cor(cars)[c(1, 3, 4), c(1, 3, 4)]   # correlations among the predictors
Rxy <- cor(cars)[2, c(1, 3, 4)]            # correlations of predictors with dist
B <- ginv(Rxx) %*% Rxy
B

         [,1]
[1,]  1.395526
[2,] -2.212406
[3,]  1.681041
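
That expression is just the least-squares normal equations written in correlation form: with $R_{xx}$ the correlation matrix of the predictors and $r_{xy}$ the vector of predictor-outcome correlations, the standardized coefficients are

$$\hat{\beta}^{*} = R_{xx}^{-1} \, r_{xy}$$

and because $R_{xx}$ is invertible here, solve(Rxx, Rxy) would give the same answer as ginv().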