Regression Analysis – Why Standardize Regression Variables?

Tags: linear model, regression, standardization

Many textbooks and articles (such as this one) advise standardizing variables before entering them into a regression model, i.e., replacing each variable with (variable − mean) / standard deviation.
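
In R, for example, this is exactly what the built-in scale() function computes by default:

> x <- c(-3, -2, -1, 0, 1, 2, 3)
> scale(x)   # same values as (x - mean(x)) / sd(x), returned as a column matrix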

However, I just came across a counterexample. How do you reconcile this?

Let's say:
$$Y = 2X$$
$$X = \{-3, -2, -1, 0, 1, 2, 3 \}$$
$$Y = \{-6, -4, -2, 0, 2, 4, 6 \}$$
After standardizing,
$$X_{sd} = \{-1.3887301, -0.9258201, -0.4629100, 0.0000000, 0.4629100, 0.9258201, 1.3887301 \}$$
$$Y_{sd} = \{-1.3887301, -0.9258201, -0.4629100, 0.0000000, 0.4629100, 0.9258201, 1.3887301 \}$$
$$\Rightarrow Y_{sd} = X_{sd}$$
So, in this example, if we standardize the variables, the desired slope of 2 (i.e., the effect size) vanishes: regressing $Y_{sd}$ on $X_{sd}$ gives a slope of 1, not 2!
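
A quick check in R confirms that the two standardized vectors are identical:

> x <- c(-3, -2, -1, 0, 1, 2, 3)
> y <- 2 * x
> all.equal(as.vector(scale(x)), as.vector(scale(y)))
[1] TRUE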

Best Answer

If you transform the target variable, you need to back-transform the predictions to get them on the original scale. Since linear regression is a linear model, in your example this is the same as transforming the $\beta$ parameter.
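
Concretely, both variables here have mean zero, so standardizing just divides by the standard deviation, and

$$Y = \beta X \;\Rightarrow\; \frac{Y}{\operatorname{sd}(Y)} = \beta \, \frac{\operatorname{sd}(X)}{\operatorname{sd}(Y)} \cdot \frac{X}{\operatorname{sd}(X)} \;\Rightarrow\; \beta_{sd} = \beta \, \frac{\operatorname{sd}(X)}{\operatorname{sd}(Y)} = 2 \cdot \frac{\operatorname{sd}(X)}{2\operatorname{sd}(X)} = 1.$$

The slope of 1 is simply the original slope of 2 expressed in standard-deviation units; it has not vanished, only changed scale.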

> x <- c(-3,-2,-1,0,1,2,3)
> y <- 2 * x
> xz <- x / sd(x)   # mean(x) is 0, so dividing by sd(x) fully standardizes
> yz <- y / sd(y)   # likewise, mean(y) is 0
> lm(yz~xz)

Call:
lm(formula = yz ~ xz)

Coefficients:
(Intercept)           xz  
          0            1  

> predict(lm(yz~xz)) * sd(y)
 1  2  3  4  5  6  7 
-6 -4 -2  0  2  4  6
> coef(lm(yz~xz))[2] * sd(y) * xz
[1] -6 -4 -2  0  2  4  6

So nothing is wrong: you scaled the data, and you got the correspondingly scaled parameters and predictions.
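
In the same way, the original slope of 2 is recovered by back-transforming the standardized coefficient:

> coef(lm(yz~xz))[2] * sd(y) / sd(x)
xz 
 2 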

You don't need to standardize the data by default. There are scenarios where it matters, e.g. when using polynomials, as mentioned in the linked post, when using regularization, or for some models other than linear regression.
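
As a minimal sketch of the polynomial case (using a made-up predictor, not the data above): when a variable sits far from zero, the raw terms x and x² are nearly collinear, and centering removes most of that collinearity:

> x <- 100 + 1:20       # hypothetical predictor far from zero
> cor(x, x^2)           # ~0.9997: x and x^2 are nearly collinear
> xc <- x - mean(x)     # centering
> cor(xc, xc^2)         # ~0: the collinearity is essentially gone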
