Solved – What’s the difference between standardized and unstandardized coefficients in linear regression models

Tags: regression, standardization

When fitting a linear regression model, one calculates coefficients often labelled Beta and B; these values describe how much the dependent variable changes in response to a predictor (or independent) variable.

What's the difference between standardized and unstandardized coefficients in linear regression models?

Best Answer

I guess you mean the following. Consider the simple linear regression model: $$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \stackrel{d}{=} N(0,\sigma^2).$$

One would estimate $\beta_1$ with a least-squares estimator $\hat \beta_1$, which expresses the expected change in $Y_i$ when $X_i$ increases by one unit.

This usually works (and is the easiest to interpret), but you could also fit the model in other ways.
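As a minimal sketch of the unstandardized case, the following fits the simple model by ordinary least squares on made-up data (the numbers are purely illustrative):

```python
import numpy as np

# Toy data: Y = 3 + 1.5*X + noise, so the true slope is 1.5.
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)

# Design matrix with an intercept column, then least squares.
X = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# b1_hat is the expected change in Y when X increases by one unit.
print(round(b1_hat, 2))
```

The estimate `b1_hat` should land close to the true slope of 1.5, and its interpretation is in the original units of $X$ and $Y$.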

First method - centering the variables

You could center the predictor, for instance using $X_i^\ast = X_i - \overline X$, and fit the model again. This results in $$Y_i = \beta_0^\ast + \beta_1^*X_i^* + \epsilon_i^* \qquad \epsilon_i^\ast \stackrel{d}{=} N(0, \sigma^2)$$

then $\beta_1^\ast$ should be read as the expected increase in $Y_i$ when $X_i$ increases by one unit from its mean. Using centered variables often improves the model (it can reduce multicollinearity, e.g. with interaction or polynomial terms). The value of $\beta_0^\ast$ is also always useful: it is the expected value of $Y_i$ when $X_i = \overline X$.
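A quick numerical check of these two claims, again on made-up data: centering leaves the slope unchanged, and the new intercept equals the sample mean of $Y$ (the fitted value at $X = \overline X$).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)

def ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b0, b1 = ols(x, y)
xc = x - x.mean()          # centered predictor
b0c, b1c = ols(xc, y)

# Slope is identical; the new intercept is the mean of Y,
# i.e. the expected value of Y when X equals its mean.
print(np.isclose(b1, b1c), np.isclose(b0c, y.mean()))
```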

Second method - standardizing the variables

You could standardize the variables as well, using $X_i^\ast = \dfrac{X_i-\overline X}{s_{X}}$, and refit the model.

The model then gives you the expected change in $Y_i$ when $X_i$ increases one standard deviation.

This has some numerical advantages.
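To illustrate the one-SD interpretation: after standardizing the predictor, the new slope equals the unstandardized slope multiplied by $s_X$. A sketch, with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)

def ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b0, b1 = ols(x, y)
xs = (x - x.mean()) / x.std(ddof=1)   # standardized predictor (z-score)
b0s, b1s = ols(xs, y)

# The new slope is the expected change in Y per one-SD increase in X:
# algebraically, b1s = b1 * s_X.
print(np.isclose(b1s, b1 * x.std(ddof=1)))
```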

Third method - standardizing both the variables and the outcome

Most useful in multiple linear regression. Say you want a model like: $$Y_i = \beta_0 + \beta_1 X_{i1}+ \beta_2 X_{i2} + \epsilon_i \qquad \epsilon_i \stackrel{d}{=} N(0,\sigma^2)$$

How would you compare $\beta_1$ to $\beta_2$? In other words, which variable, $X_1$ or $X_2$, has the larger effect?

You could standardize both the predictors (as before) and the $Y$ values, with $Y_i^\ast = \dfrac{Y_i - \overline Y}{s_Y}$. The $\beta_i^\ast$ that is greatest in absolute value points to the predictor with the largest effect, measured in standard-deviation units.
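The comparison above can be sketched as follows. In this made-up example, the raw coefficient on $X_1$ is numerically larger, yet $X_2$ has the bigger effect once both predictors and the outcome are put on the same (SD) scale:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(0, 1, n)        # small spread
x2 = rng.normal(0, 10, n)       # large spread
y = 2.0 * x1 + 0.5 * x2 + rng.normal(0, 1, n)

def ols(X, y):
    """Least squares with an intercept; returns all coefficients."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Unstandardized: b1 ~ 2.0 looks "bigger" than b2 ~ 0.5,
# but that comparison depends on the units of x1 and x2.
_, b1, b2 = ols(np.column_stack([x1, x2]), y)

# Standardize everything (z-scores) and refit: the coefficients
# are now in SD units and directly comparable.
z = lambda v: (v - v.mean()) / v.std(ddof=1)
_, bz1, bz2 = ols(np.column_stack([z(x1), z(x2)]), z(y))

print(abs(bz2) > abs(bz1))  # x2 has the larger standardized effect
```

The standardized coefficient of each predictor is (in the single-predictor case exactly, and here approximately) the raw coefficient times $s_{X_j}/s_Y$, which is why the unit-dependent ranking flips.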