It's never easy telling your professor that they are wrong.
Standardized coefficients can be greater than 1.00, as that article explains and as is easy to demonstrate. Whether they should be excluded depends on why they happened - but probably not.
They are a sign that you have some pretty serious collinearity. One case where they often occur is when you have non-linear effects, such as when $x$ and $x^2$ are included as predictors in a model.
Here's a quick demonstration:
data(cars)
cars$speed2 <- cars$speed^2
cars$speed3 <- cars$speed^3

fit1 <- lm(dist ~ speed, data=cars)
fit2 <- lm(dist ~ speed + speed2, data=cars)
fit3 <- lm(dist ~ speed + speed2 + speed3, data=cars)

summary(fit1)
summary(fit2)
summary(fit3)

library(QuantPsyc)   # lm.beta() as used here is in the QuantPsyc package
lm.beta(fit1)
lm.beta(fit2)
lm.beta(fit3)
Final bit of output:
> lm.beta(fit3)
    speed    speed2    speed3 
 1.395526 -2.212406  1.681041 
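Those betas larger than 1 go hand in hand with the collinearity mentioned above; a quick check (a sketch, run after the code above) shows how tightly the polynomial terms are correlated:

round(cor(cars[, c("speed", "speed2", "speed3")]), 3)   # pairwise correlations close to 1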
Or, if you prefer, you can standardize the variables first:
zcars <- as.data.frame(scale(cars))   # scale() standardizes every column, including dist
fit3 <- lm(dist ~ speed + speed2 + speed3, data=zcars)
summary(fit3)
Call:
lm(formula = dist ~ speed + speed2 + speed3, data = zcars)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.03496 -0.37258 -0.08659  0.27456  1.73426 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.440e-16  8.344e-02   0.000    1.000
speed        1.396e+00  1.396e+00   1.000    0.323
speed2      -2.212e+00  3.163e+00  -0.699    0.488
speed3       1.681e+00  1.853e+00   0.907    0.369

Residual standard error: 0.59 on 46 degrees of freedom
Multiple R-squared:  0.6732, Adjusted R-squared:  0.6519 
F-statistic: 31.58 on 3 and 46 DF,  p-value: 3.074e-11
You don't need to do it with lm(); you can use matrix algebra if you prefer, via $\beta^* = R_{xx}^{-1} R_{xy}$:
library(MASS)                              # for ginv()
Rxx <- cor(cars)[c(1, 3, 4), c(1, 3, 4)]   # correlations among speed, speed2, speed3
Rxy <- cor(cars)[2, c(1, 3, 4)]            # correlations of dist with each predictor
B <- ginv(Rxx) %*% Rxy                     # standardized coefficients
B
          [,1]
[1,]  1.395526
[2,] -2.212406
[3,]  1.681041
See the documentation:
help(lm.circular)
"If type=="c-l" or lm.circular.cl is called directly, this function
implements the homoscedastic version of the maximum likelihood
regression model proposed by Fisher and Lee (1992). The model assumes
that a circular response variable theta has a von Mises distribution
with concentration parameter kappa, and mean direction related to a
vector of linear predictor variables according to the relationship: mu
+ 2*atan(beta'*x), where mu and beta are unknown parameters, beta being a vector of regression coefficients. The function uses Green's
(1984) iteratively reweighted least squares algorithm to perform the
maximum likelihood estimation of kappa, mu, and beta. Standard errors
of the estimates of kappa, mu, and beta are estimated via large-sample
asymptotic variances using the information matrix. An estimated
circular standard error of the estimate of mu is then obtained
according to Fisher and Lewis (1983, Example 1)."
Thus, as a check, you can compare with a different model:
> nls(y~a+2*atan(b*x),start=c(a=0.06337,b=0.022344),data=list(x=x,y=y))
Nonlinear regression model
model: y ~ a + 2 * atan(b * x)
data: list(x = x, y = y)
a b
0.07112 0.02231
residual sum-of-squares: 12.36
Number of iterations to convergence: 12
Achieved convergence tolerance: 5.838e-06
This nls fit does not use the same underlying distribution for the residual terms, but it does produce similar coefficients.
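For reference, fitting the circular-linear model itself might look like this (a sketch only; x and y stand in for the question's data, and init supplies starting values for beta, which type = "c-l" requires):

library(circular)
y.circ <- circular(y)                      # response as a circular object
fit <- lm.circular(y = y.circ, x = cbind(x), init = 0.02, type = "c-l")
fit                                        # estimates of mu, beta, and kappa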
Clearly you simplified your posted problem to make it easier to understand.
Could you add your real case? (It would spice up the question.)
Best Answer
Let's say we want to predict the median value of a house, expressed in thousands of dollars (medv), based on its age (age), number of rooms (rm), and the crime rate (crim) in its neighborhood. The dataset for this is called Boston and it is in the MASS package.

Original regression model
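A minimal sketch of that fit (the predictor order crim, rm, age and the name fit are assumptions; the coefficients quoted below come from this model):

library(MASS)                          # the Boston dataset lives here
fit <- lm(medv ~ crim + rm + age, data = Boston)
round(coef(fit), 3)                    # roughly: crim -0.211, rm 8.03, age -0.05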
So we see that the coefficients for those three predictors are -0.211 (crim), 8.03 (rm) and -0.05 (age). If we want to calculate the predicted price for any house, we just use the formula $\widehat{\text{medv}} = \hat\beta_0 - 0.211\cdot\text{crim} + 8.03\cdot\text{rm} - 0.05\cdot\text{age}$, where $\hat\beta_0$ is the fitted intercept.
This means, for example, that adding one room raises the predicted medv by 8.03 thousand dollars.
Now, let's say we measure the price in dollars instead of thousands of dollars. The regression coefficients become totally different.
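A sketch of that rescaling (Boston.2 is the name used for this data set further below):

Boston.2 <- Boston
Boston.2$medv <- Boston$medv * 1000        # price in dollars instead of thousands
fit.2 <- lm(medv ~ crim + rm + age, data = Boston.2)
coef(fit.2)                                # every coefficient is 1000 times the original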
They are all just scaled versions of the ones in the original model. But what if we instead measure, for example, the crime rate per 1,000? Then the crim coefficient changes dramatically, while rm and age keep their old values.
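A sketch of that second rescaling (Boston.3 as named below; crim is divided by 1,000 so that its coefficient is multiplied by 1,000, matching the -211.02 quoted next):

Boston.3 <- Boston
Boston.3$crim <- Boston$crim / 1000        # rescale the crime variable
fit.3 <- lm(medv ~ crim + rm + age, data = Boston.3)
coef(fit.3)                                # crim jumps to about -211.02; rm and age unchanged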
So compare this model with the original one. If we judge the variables by those coefficients, it will seem that the crime rate is much, much more important than the number of rooms (-211.02 vs. only 8.03), which is not realistic.
However, the original model tells quite a different story: there the number of rooms seems far more important. All that changed is how we measured the variables.
To assess the relative importance of the variables, we first need to standardize them, which makes them all comparable. It then no longer matters what unit is used (crim vs. crim/1000); the coefficients end up the same, and we can compare their values to see which variables matter most for prediction.
We now apply the same model three times using the differently scaled data sets (Boston, Boston.2, Boston.3) - note the identical coefficient results (a sketch follows the list below):
Original regression model
Regression model with prices in dollars instead of thousands
Regression model with crime rate per 1000 people
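A sketch of that standardize-then-fit step (scale() standardizes each column; the data set names follow the sketches above):

std.coef <- function(d) {
  d.z <- as.data.frame(scale(d[, c("medv", "crim", "rm", "age")]))
  coef(lm(medv ~ crim + rm + age, data = d.z))
}
std.coef(Boston)     # original regression model
std.coef(Boston.2)   # prices in dollars instead of thousands
std.coef(Boston.3)   # crime rate per 1000 people
# all three calls print identical standardized coefficients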
From this it seems that the number of rooms is indeed more important than the crime rate, which again is common sense :).