Multiple Regression – How to Interpret Standardized Regression Coefficients and P-Values

Tags: importance, multiple regression, r, regression coefficients, statistical significance

I've been using R to analyze my data (as shown in example below) and lm.beta from the QuantPsyc package to get the standardized regression coefficients.

My understanding is that the absolute value of a standardized regression coefficient should reflect that variable's importance as a predictor. I was also under the impression (and had the intuition) that the variable with the largest absolute value should be the most significant independent predictor and should have the lowest p-value. However, that is not what I'm finding in my data.

For example (taken from my data), I have a multiple regression with dependent variable y and 7 independent variables x1:x7.

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
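Roughly, the workflow looks like this (assuming y and x1 through x7 are already loaded as vectors):

    library(QuantPsyc)

    fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
    summary(fit)   # unstandardized estimates, standard errors, t values, p-values
    lm.beta(fit)   # standardized regression coefficients from QuantPsyc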

For 3 of the variables, the beta values and the p-values make sense to me (the greater the magnitude of beta, the lower the p-value), but for 4 of them this is not the case. I'll show only the p-values and betas for those 4 to keep this short.

        x1          x2          x3          x7
    p   0.006635    0.00004683  0.000152    0.022427
    β   0.15707977  0.24149287  0.27171665  0.16583391

As you can see, x2 has a lower p-value than x3, but x3 has a larger value for beta. Similarly, x7 has a larger beta value than x1, but is less significant.

I've searched for an explanation but have found conflicting information. Is that because there's no straightforward answer to this question? Am I doing something wrong?

Best Answer

For the standard linear regression model, the absolute value of the coefficient estimates and the p-values are not related in the way you describe. It is entirely possible to have coefficients that are large in absolute value yet insignificant, and coefficients that are small in absolute value yet highly significant. What you're missing in your interpretation is the effect of the standard errors of the coefficient estimates.

The coefficients R reports (let's call them $b_1,b_2,b_3,\ldots,b_k$) are the best linear unbiased estimators of the true parameters $\beta_1,\beta_2,\beta_3,\ldots,\beta_k$ in that they minimize the sum of squared errors, or formally: $$ \{b_1,b_2,\ldots,b_k\} = \underset{\alpha_1,\ldots,\alpha_k}{\operatorname{argmin}}\left\{ \sum_{i=1}^{n}\left(y_i-\alpha_1x_{i,1}-\ldots-\alpha_kx_{i,k}\right)^2\right\} $$
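To make that concrete, here is a small sketch with simulated data (the names and numbers are purely illustrative) showing that the coefficients lm returns are exactly this least-squares solution; note that R also fits an intercept by default:

    set.seed(1)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

    fit <- lm(y ~ x1 + x2)

    # explicit least-squares solution, i.e. the argmin of the sum of squared errors
    X <- cbind(1, x1, x2)                      # design matrix (column of 1s = intercept)
    b <- drop(solve(t(X) %*% X, t(X) %*% y))   # solves (X'X) b = X'y

    cbind(lm = coef(fit), manual = b)          # the two columns agree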

The p-value that R reports for the $i^{th}$ coefficient is the result of the following hypothesis test:

$H_0: \beta_i = 0$

$H_A: \beta_i \neq 0$

Assuming the regression is properly specified, it can be shown, using the central limit theorem, that each $b_i$ is (approximately) a normally distributed random variable with mean $\beta_i$ and some standard deviation (also called its standard error) $\sigma_i$. This is because the $b$'s are estimated from a random sample, so they too are random variables (roughly speaking). What determines the $i^{th}$ p-value is where 0 "lands" in the normal distribution $N(\beta_i,\sigma_i^2)$ (technically the test is done using a t-distribution, but the difference is not so important for addressing your question). If zero is in the tails of $N(\beta_i,\sigma_i^2)$, the p-value is low; if it is closer to the middle, the p-value is high.
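You can reproduce exactly this calculation from R's output: each p-value is just the estimate divided by its standard error, referred to a t-distribution. Continuing with the simulated fit from the sketch above:

    cf <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

    t_stat <- cf[, "Estimate"] / cf[, "Std. Error"]
    p_val  <- 2 * pt(-abs(t_stat), df = fit$df.residual)   # two-sided p-value

    cbind(t_stat, p_val)       # matches the "t value" and "Pr(>|t|)" columns of summary(fit)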

So given two estimates $b_i$ and $b_j$ where $b_i$ is "super far away" from zero and $b_j$ is "super close to" zero, the p-value of $b_i$ would be lower than that of $b_j$, assuming $\sigma_i = \sigma_j$. The part you are missing in your interpretation is that $\sigma_i$ and $\sigma_j$ can be very different. Essentially, if $b_i$ is "huge" but $\sigma_i$ is also "huge", you can still get a high p-value. Conversely, for a "small" $b_i$ with a "super small" $\sigma_i$, you can get a very small p-value.
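A quick simulation makes the point; near-collinearity is just one easy (and artificial) way to inflate a standard error, and all the numbers below are arbitrary:

    set.seed(42)
    n  <- 200
    z  <- rnorm(n)
    xa <- z + rnorm(n, sd = 0.02)   # xa and xb are nearly collinear, which inflates their SEs
    xb <- z + rnorm(n, sd = 0.02)
    xc <- rnorm(n)

    # large true coefficient on xa, small true coefficient on xc
    y <- 3 * xa + 0.3 * xc + rnorm(n)

    summary(lm(y ~ xa + xb + xc))
    # typically: xa has a large estimate but also a large standard error (an unimpressive p-value),
    # while xc has a small estimate, a tiny standard error, and a much smaller p-value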

I hope that helps!