Regression – How to Interpret Scaled Regression Coefficients When Only Predictors Are Scaled

Tags: interpretation, regression, regression coefficients, standardization

I'm running a model with 2 continuous predictors (x1, x2) and 1 continuous outcome variable (y). Both slopes and the intercept are significant, with no significant interaction effect. Let's say my results look something like this:

(intercept):  216.00
x1:           -12.00
x2:            -8.00

Now, for the sake of interpretability, I've decided to standardize the variables. So I used the scale() function, and my model now has this form:

model.s <- lm(scale(y) ~ scale(x1) * scale(x2))

with these results:

(intercept):  -0.0123   # It's not significant anymore
x1:           -2.3  
x2:           -1.2   
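A minimal sketch (using hypothetical simulated data, so the numbers will not match the output above) shows why the intercept collapses once y itself is scaled: scale() centers y, so the fitted intercept is forced toward zero.

```r
# Hypothetical simulated data; the coefficients differ from the
# question's, but the structural point is the same.
set.seed(1)
x1 <- rnorm(100, mean = 50, sd = 5)
x2 <- rnorm(100, mean = 30, sd = 4)
y  <- 216 - 12 * x1 - 8 * x2 + rnorm(100, sd = 10)

model   <- lm(y ~ x1 * x2)                       # raw scale
model.s <- lm(scale(y) ~ scale(x1) * scale(x2))  # everything scaled

coef(model)[["(Intercept)"]]    # large, on y's original scale
coef(model.s)[["(Intercept)"]]  # near zero: scale() centered y
```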

My questions are:

  1. Why did the intercept lose its significance, and is this normal?
  2. I have scaled all 3 variables; is there anything wrong with that?
  3. How can I interpret the intercept in the scaled model?

Regarding the last one, my interpretation is that:

  1. when x1 is at mean(x1) and x2 is at mean(x2), y is 0.0123 SDs below its mean.
  2. when x1 goes up by 1 SD and x2 is at mean(x2), y decreases by 2.3 SDs
  3. when x2 goes up by 1 SD and x1 is at mean(x1), y decreases by 1.2 SDs

With the predictors standardized, but not the outcome variable:

model.s1 <- lm(y ~ scale(x1) * scale(x2))

The results are somewhat different: the intercept regains its significance and the values change:

(intercept):  98 
x1:          -20   
x2:          -17    

My interpretation of these results is:

  1. when x1 is at mean(x1) and x2 is at mean(x2), y is 98
  2. when x1 goes up by 1 SD and x2 is at mean(x2), y decreases by 20 units
  3. when x2 goes up by 1 SD and x1 is at mean(x1), y decreases by 17 units
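This second setup can also be checked with a short sketch (again on hypothetical simulated data): with only the predictors centered and scaled, the intercept is essentially the sample mean of y, which is why it is typically significant whenever mean(y) is far from zero.

```r
# Hypothetical data; the point is that with centered predictors the
# intercept estimates E[y] at x1 = mean(x1), x2 = mean(x2).
set.seed(1)
x1 <- rnorm(100, mean = 50, sd = 5)
x2 <- rnorm(100, mean = 30, sd = 4)
y  <- 216 - 12 * x1 - 8 * x2 + rnorm(100, sd = 10)

# only the predictors are scaled; y stays in its original units
model.s1 <- lm(y ~ scale(x1) * scale(x2))

coef(model.s1)[["(Intercept)"]]  # approximately mean(y)
mean(y)
```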

In other words, I interpret x1 and x2 in SD terms, while I interpret y in its original units. Is this interpretation wrong?

Best Answer

What the scale() function does in R is answered here. Basically, it re-scales a variable so that its mean is zero and its standard deviation is 1. Several points are worth noting:

1) If the original variables are not normally distributed (ND), the scaled variables will not be ND either; conversely, if the original variables are ND, the rescaled variables will remain ND.

2) A regression using scaled values will obviously have a different intercept than the unscaled originals if the original mean values were not zero.

3) If the original variables are distributed symmetrically about their means (and if the mean value is a good measure of location), then the intercept of the regression on the scaled, zero-centered variables should be zero (even with the product term), but only when everything ($y$ and the $x$'s) is rescaled.

4) What does the scaling mean? Well, by itself, not much. One has to know what the means and standard deviations were to begin with in order to interpret the scaled results. Basically, it adds nothing, and may even complicate matters by introducing variability (think of multiple different time-series scalings on the $x$-axis) of independent variables.

5) Finally, do $y$ versus $model$ correlations for both the unscaled and scaled regressions' models and compare correlation coefficients. That will show no difference if the regression problem is unchanged.
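Point 5 can be checked directly. A sketch with hypothetical simulated data (the interaction is kept in both models, so both span the same column space and the correlations agree exactly):

```r
set.seed(1)
x1 <- rnorm(100, mean = 50, sd = 5)
x2 <- rnorm(100, mean = 30, sd = 4)
y  <- 216 - 12 * x1 - 8 * x2 + rnorm(100, sd = 10)

model   <- lm(y ~ x1 * x2)
model.s <- lm(scale(y) ~ scale(x1) * scale(x2))

# correlation of observed vs fitted values for both problems
r_raw    <- cor(y, fitted(model))
r_scaled <- cor(as.numeric(scale(y)), fitted(model.s))
c(r_raw, r_scaled)  # identical: the regression problem is unchanged
```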

That is, what you are doing is a linear transformation. For example,

$\frac{y-\bar{y}}{sd_y}=m_{s,x_1}\frac{x_1-\bar{x}_1}{sd_{x_1}}+m_{s,x_2}\frac{x_2-\bar{x}_2}{sd_{x_2}}+b_{s}$

So, just multiply both sides by $sd_y$, expand, and collect the constant terms: the initial slopes are $m_{x_1}=\frac{sd_y}{sd_{x_1}}m_{s,x_1}$ and $m_{x_2}=\frac{sd_y}{sd_{x_2}}m_{s,x_2}$, and the initial intercept is recovered as $b=\bar{y}-m_{x_1}\bar{x}_1-m_{x_2}\bar{x}_2+sd_y\,b_{s}$.
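In R, the back-transformation for the additive (no-interaction) case amounts to multiplying a standardized slope by $sd_y/sd_{x_i}$. A sketch with hypothetical data:

```r
set.seed(1)
x1 <- rnorm(100, mean = 50, sd = 5)
x2 <- rnorm(100, mean = 30, sd = 4)
y  <- 216 - 12 * x1 - 8 * x2 + rnorm(100, sd = 10)

fit   <- lm(y ~ x1 + x2)                       # additive, unscaled
fit.s <- lm(scale(y) ~ scale(x1) + scale(x2))  # additive, scaled

# undo the standardization of the x1 slope
b1_recovered <- coef(fit.s)[["scale(x1)"]] * sd(y) / sd(x1)
c(coef(fit)[["x1"]], b1_recovered)  # equal up to floating point
```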

When this is done with a product of independent variables rather than just a sum, things get messier still, as there are then $x_1x_2$, $x_1$, and $x_2$ terms. So it depends what your original equation was (which you did not provide). If the original equation does not have separate $x_1x_2$, $x_1$, and $x_2$ terms, then the transformed equation and the original equation are two different regression problems and will not have the same $r$-values.
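This can be illustrated with hypothetical data: when the full interaction model is fitted, the scaled product $z_1 z_2$ expands into $x_1x_2$, $x_1$, $x_2$, and constant terms, so the scaled and unscaled model matrices span the same space and the fits agree exactly; a product-only model, by contrast, is a genuinely different regression problem.

```r
set.seed(1)
x1 <- rnorm(100, mean = 50, sd = 5)
x2 <- rnorm(100, mean = 30, sd = 4)
y  <- 216 - 12 * x1 - 8 * x2 + rnorm(100, sd = 10)

fit_raw <- lm(y ~ x1 * x2)                # main effects + product
fit_scl <- lm(y ~ scale(x1) * scale(x2))  # same column space
all.equal(fitted(fit_raw), fitted(fit_scl))  # TRUE

fit_prod <- lm(y ~ I(x1 * x2))  # product only: a different problem
cor(y, fitted(fit_raw)); cor(y, fitted(fit_prod))  # generally differ
```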
