Lasso Centering and Standardization – How to Perform Lasso Centering and Standardization with R

r standardization

I am working with a lasso regression with the glmnet package. I read these threads: When conducting multiple regression, when should you center your predictor variables & when should you standardize them?, Need for centering and standardizing data in regression and Is standardisation before Lasso really necessary?.

Based on the responses I decided that I need to standardize my data before using it. I do have some questions however:

  • Do I need to standardize the predictors and the responses or only the predictors?
  • I am using the function scale(myData, center = TRUE, scale = TRUE) for building the model, but I am wondering what to do when I want to make predictions with a test data set. I think I should also center and standardize the test data, but how do I do that? By subtracting the mean of the initial (training) dataset and then dividing by the standard deviation of the initial dataset?
  • When I get a result do I need to "backscale" it (using the original mean and standard deviation) or do I already get the "final" result?
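For reference, the manual procedure described in the second and third bullets can be sketched in base R as follows. The data here are simulated and the object names (train, test, ctr, scl) are made up for illustration; the only assumption is that the training set was scaled with scale(), which stores the column means and standard deviations as attributes.

```r
set.seed(1)
# Simulated training and test predictors (illustrative data only)
train <- matrix(rnorm(20, mean = 5, sd = 2), ncol = 2)
test  <- matrix(rnorm(10, mean = 5, sd = 2), ncol = 2)

# Center and scale the training set; scale() keeps the statistics it used
train_s <- scale(train, center = TRUE, scale = TRUE)
ctr <- attr(train_s, "scaled:center")  # training column means
scl <- attr(train_s, "scaled:scale")   # training column standard deviations

# Apply the *training* statistics to the test set
test_s <- scale(test, center = ctr, scale = scl)

# "Backscaling": multiply by the training sd, then add the training mean
test_back <- sweep(sweep(test_s, 2, scl, `*`), 2, ctr, `+`)
```

The test set is never scaled with its own mean and standard deviation; it reuses the values computed on the training set.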

Best Answer

If you use glmnet, the scaling is performed by the package. You don't need to worry about scaling the test set because the "coefficients are always returned on the original scale".

By default:

glmnet(x, y, [...]
standardize = TRUE,
intercept = TRUE,
standardize.response = FALSE [...])
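As a minimal sketch of what this means in practice, the example below fits a lasso on unscaled simulated predictors and predicts on unscaled test rows. The data, the train/test split and the choice of s = "lambda.min" are assumptions made for illustration, not part of the original question.

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 5
# Simulated predictors on deliberately different scales
x <- matrix(rnorm(n * p), n, p) %*% diag(c(1, 10, 100, 0.1, 1))
y <- drop(x %*% c(2, 0.1, 0, 0, -1) + rnorm(n))
train <- 1:70

# standardize = TRUE is the default; it is spelled out here for clarity
fit <- cv.glmnet(x[train, ], y[train], alpha = 1, standardize = TRUE)

# Coefficients are reported on the original scale of x ...
coef(fit, s = "lambda.min")

# ... so the raw, unscaled test rows can be passed directly to predict()
pred <- predict(fit, newx = x[-train, ], s = "lambda.min")
```

No manual centering, scaling or back-transformation appears anywhere in this workflow.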

As for the standardization of the response, it should not change the performance of your model after cross-validating over $\lambda$, so you can keep standardize.response = FALSE.

Indeed, the LASSO solves

$$ \min_\beta\; \| Y - X\beta \|^2_2 + \lambda \|\beta\|_1 $$

Scaling $Y$ by a factor $\alpha > 0$, the problem becomes

$$ \min_\beta\; \| \alpha Y - X\beta \|^2_2 + \lambda \|\beta\|_1 $$

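which, since $\| \alpha Y - X\beta \|^2_2 = \alpha^2 \| Y - X\beta/\alpha \|^2_2$ and $\|\beta\|_1 = \alpha \|\beta/\alpha\|_1$, is equivalent to

$$ \min_\beta\; \alpha^2 \| Y-X\beta/\alpha \|^2_2 + \alpha \lambda \|\beta/\alpha\|_1 $$

Dividing the objective by $\alpha^2$ and writing $\gamma = \beta/\alpha$ gives

$$ \min_\gamma\; \| Y - X\gamma \|^2_2 + \frac{\lambda}{\alpha} \|\gamma\|_1 $$

So this is the original problem with penalty $\lambda/\alpha$: the solution path is the same, with the coefficients multiplied by $\alpha$ and the values of $\lambda$ rescaled by $\alpha$. Cross-validating over $\lambda$ therefore selects the same model.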
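The effect of scaling the response can also be checked numerically. The sketch below uses a small hand-rolled coordinate descent solver for the objective $\| Y - X\beta \|^2_2 + \lambda \|\beta\|_1$; lasso_cd and soft are hypothetical helpers written for this illustration, not glmnet's implementation, and the data are simulated. It compares the solution for $(Y, \lambda)$ with the solution for $(\alpha Y, \alpha\lambda)$.

```r
# Soft-thresholding operator: sign(z) * max(|z| - t, 0)
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Coordinate descent for  min_b ||y - X b||_2^2 + lambda * ||b||_1
# (illustrative helper, fixed number of sweeps, no convergence check)
lasso_cd <- function(X, y, lambda, n_iter = 500) {
  b <- rep(0, ncol(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      # Partial residual that leaves out predictor j
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]
      b[j] <- soft(sum(X[, j] * r_j), lambda / 2) / sum(X[, j]^2)
    }
  }
  b
}

set.seed(42)
X <- matrix(rnorm(50 * 4), 50, 4)
y <- drop(X %*% c(3, 0, -2, 0) + rnorm(50))

alpha  <- 10   # factor applied to the response
lambda <- 20   # penalty for the unscaled problem

b_orig   <- lasso_cd(X, y, lambda)
b_scaled <- lasso_cd(X, alpha * y, alpha * lambda)

# The scaled problem returns the original coefficients multiplied by alpha
all.equal(b_scaled, alpha * b_orig)
```

Scaling the response by $\alpha$ together with the penalty therefore only rescales the coefficients; the set of selected predictors is unchanged.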