Solved – GLMnet – “Unstandardizing” Linear Regression Coefficients


When performing linear regression, GLMnet apparently standardizes the dependent variable ($y$) vector to have unit variance before it runs the regression, and then unstandardizes the resulting intercept and coefficients. I assume the standardization is achieved by dividing each $y_i$ by the standard deviation of the $y$ vector.

If I run glmnet with a pre-standardized $y$ how do I unstandardize the resulting equation?

(Note that I am currently running my program/GLMnet on pre-standardized x variables, so I don't have to worry about reversing the x variable standardization that GLMnet also performs.)

I thought that I could simply unstandardize by multiplying each coefficient and the intercept by the standard deviation of the $y$ vector. This does not work – the "unstandardized" equation does not match the result I get when I run glmnet with the same non-standardized $y$. The only time multiplying by the standard deviation works is when I run glmnet with lambda=0. (This effectively runs the program as an ordinary least squares fit.)

I am recreating glmnet in another language as an exercise. When I run my program and glmnet on pre-standardized $y$, I get the same result. I do not get the same result when $y$ is not pre-standardized.

My information on standardization comes from the glmnet vignette:

"Note that for “family=gaussian”, glmnet standardizes y to have unit variance before computing its lambda sequence (and then unstandardizes the resulting coefficients); if you wish to reproduce/compare results with other software, best to supply a standardized y first (Using the “1/N” variance formula)."

Best Answer

This is mostly a case of carefully working out the math. I'll handle the case of two predictors plus an intercept; it should be clear how to generalize it.

The standardized elastic net model results in the following relationship (there is no intercept term because both sides are centered, so the fitted intercept is zero):

$$\frac{y - \mu(y)}{\sigma(y)} = \beta_1 \frac{x_1 - \mu(x_1)}{\sigma(x_1)} + \beta_2 \frac{x_2 - \mu(x_2)}{\sigma(x_2)}$$

If you multiply both sides by $\sigma(y)$, add $\mu(y)$, and carefully collect terms until only $y$ is on the left-hand side, you'll get

$$ y = \frac{\beta_1 \sigma(y)}{\sigma(x_1)} x_1 + \frac{\beta_2 \sigma(y)}{\sigma(x_2)} x_2- \left( \frac{\beta_1 \mu(x_1)}{\sigma(x_1)} + \frac{\beta_2 \mu(x_2)}{\sigma(x_2)} \right) \sigma(y) + \mu(y) $$

which gives the relationship between the standardized and unstandardized coefficients.
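In the asker's setting, where the $x$ variables are pre-standardized (so, in the notation above, $\mu(x_j) = 0$ and $\sigma(x_j) = 1$), the formula specializes to

$$ y = \beta_1 \sigma(y)\, x_1 + \beta_2 \sigma(y)\, x_2 + \mu(y) $$

so the slopes are simply scaled by $\sigma(y)$, while the intercept becomes $\mu(y)$.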

Here's a quick demonstration you can use to test this:

library(glmnet)

X <- matrix(runif(100, 0, 1), ncol=2)
y <- 1 - 2*X[,1] + X[,2]  # exact linear relationship, so the fit recovers it exactly

Xst <- scale(X)
yst <- scale(y)

# lambda = 0, so both fits are effectively ordinary least squares
enet <- glmnet(X, y, lambda=0)
enetst <- glmnet(Xst, yst, lambda=0)
coef <- coefficients(enetst)

# Un-standardized betas
coef[2]*sd(y)/sd(X[,1]) # = -2
coef[3]*sd(y)/sd(X[,2]) # = 1

# Un-standardized intercept
-(coef[2]*mean(X[,1])/sd(X[,1]) + coef[3]*mean(X[,2])/sd(X[,2]))*sd(y) + mean(y) # = 1
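As a quick sanity check (not part of the original snippet), the fit on the unstandardized data should give the same values directly:

# Coefficients from the unstandardized fit: intercept = 1, betas = -2 and 1
coefficients(enet)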