Multiple Regression – How to Interpret Standardized Regression Coefficients and P-Values

Tags: importance, multiple regression, r, regression coefficients, statistical significance

I've been using R to analyze my data (as shown in example below) and lm.beta from the QuantPsyc package to get the standardized regression coefficients.

My understanding is that the absolute value of a standardized regression coefficient should reflect that variable's importance as a predictor. I was also under the impression (and had the intuition) that the variable with the largest absolute value should be the most significant independent predictor and should have the lowest p-value. However, that is not what I'm finding in my data.

For example (taken from my data), I have a multiple regression with dependent variable y and 7 independent variables x1:x7.

    Call:
    lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
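Roughly, the workflow looks like this (assuming y and x1 through x7 are already loaded as vectors):

    library(QuantPsyc)

    fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
    summary(fit)   # unstandardized estimates, standard errors, t values, p-values
    lm.beta(fit)   # standardized regression coefficients from QuantPsyc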

For 3 of the variables, the beta values and the p-values make sense to me (the greater the magnitude of beta, the lower the p-value), but for 4 of them this is not the case. I'll show only the p-values and betas for those 4 to keep this short.

        x1          x2          x3          x7
    p   0.006635    0.00004683  0.000152    0.022427
    β   0.15707977  0.24149287  0.27171665  0.16583391

As you can see, x2 has a lower p-value than x3, but x3 has a larger value for beta. Similarly, x7 has a larger beta value than x1, but is less significant.

I've searched for an explanation but have found conflicting information. Is that because there's no straightforward answer to this question? Am I doing something wrong?

Best Answer

For the standard linear regression model, the absolute value of the coefficient estimates and the p-values are not related in the way you describe. It is entirely possible to have coefficients that are large in absolute value yet insignificant, and coefficients that are small in absolute value yet highly significant. What you're missing in your interpretation is the effect of the standard errors of the coefficient estimates.

The coefficients R reports (let's call them $b_1,b_2,b_3,\ldots,b_k$) are the best linear unbiased estimators of the true parameters $\beta_1,\beta_2,\beta_3,\ldots,\beta_k$ in that they minimize the sum of squared errors, or formally: $$ \{b_1,b_2,\ldots,b_k\} = \underset{\alpha_1,\ldots,\alpha_k}{\operatorname{argmin}}\left\{ \sum_{i=1}^{n}\left(y_i-\alpha_1x_{i,1}-\ldots-\alpha_kx_{i,k}\right)^2\right\} $$
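To make that concrete, here is a small sketch with simulated data (the names and numbers are purely illustrative) showing that the coefficients lm returns are exactly this least-squares solution; note that R also fits an intercept by default:

    set.seed(1)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

    fit <- lm(y ~ x1 + x2)

    # explicit least-squares solution, i.e. the argmin of the sum of squared errors
    X <- cbind(1, x1, x2)                      # design matrix (column of 1s = intercept)
    b <- drop(solve(t(X) %*% X, t(X) %*% y))   # solves (X'X) b = X'y

    cbind(lm = coef(fit), manual = b)          # the two columns agree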

The p-value that R reports for the $i^{th}$ coefficient is the result of the following hypothesis test:

$H_0: \beta_i = 0$

$H_A: \beta_i \neq 0$

Assuming the regression is properly specified, it can be shown, using the central limit theorem, that each $b_i$ is (approximately) a normally distributed random variable with mean $\beta_i$ and some standard deviation (also called its standard error) $\sigma_i$. This is because the $b$'s are estimated from a random sample, so they too are random variables (roughly speaking). What determines the $i^{th}$ p-value is where 0 "lands" in the normal distribution $N(\beta_i,\sigma_i^2)$ (technically the test is done using a t-distribution, but the difference is not so important for addressing your question). If zero is in the tails of $N(\beta_i,\sigma_i^2)$, the p-value is low; if it is closer to the middle, the p-value is high.
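You can reproduce exactly this calculation from R's output: each p-value is just the estimate divided by its standard error, referred to a t-distribution. Continuing with the simulated fit from the sketch above:

    cf <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

    t_stat <- cf[, "Estimate"] / cf[, "Std. Error"]
    p_val  <- 2 * pt(-abs(t_stat), df = fit$df.residual)   # two-sided p-value

    cbind(t_stat, p_val)       # matches the "t value" and "Pr(>|t|)" columns of summary(fit)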

So given two estimates $b_i$ and $b_j$ where $b_i$ is "super far away" from zero and $b_j$ is "super close to" zero, the p-value of $b_i$ would be lower than that of $b_j$, assuming $\sigma_i = \sigma_j$. The part you are missing in your interpretation is that $\sigma_i$ and $\sigma_j$ can be very different. Essentially, if $b_i$ is "huge" but $\sigma_i$ is also "huge", you can still get a high p-value. Conversely, for a "small" $b_i$ with a "super small" $\sigma_i$, you can get a very small p-value.
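A quick simulation makes the point; near-collinearity is just one easy (and artificial) way to inflate a standard error, and all the numbers below are arbitrary:

    set.seed(42)
    n  <- 200
    z  <- rnorm(n)
    xa <- z + rnorm(n, sd = 0.02)   # xa and xb are nearly collinear, which inflates their SEs
    xb <- z + rnorm(n, sd = 0.02)
    xc <- rnorm(n)

    # large true coefficient on xa, small true coefficient on xc
    y <- 3 * xa + 0.3 * xc + rnorm(n)

    summary(lm(y ~ xa + xb + xc))
    # typically: xa has a large estimate but also a large standard error (an unimpressive p-value),
    # while xc has a small estimate, a tiny standard error, and a much smaller p-value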

I hope that helps!