I'm trying out Ridge and Lasso for the feature selection step in machine learning.
I have the training data below:
input_cap output_cap cpu disk load ips
2016-07-01 1.34 4.43 18.1 11 2.75 4863
2016-07-02 1.41 4.56 14.5 11 2.71 4616
2016-07-03 1.37 4.43 16.8 11 2.68 4440
2016-07-04 1.26 3.91 14.0 10 2.77 4047
2016-07-05 1.39 4.68 16.2 11 2.70 4720
and the test data below:
input_cap output_cap ips
2017-04-01 1.93 7.21 10077
2017-04-02 1.91 7.97 10840
2017-04-03 2.06 9.86 12768
2017-04-04 2.09 10.55 13896
2017-04-05 2.04 7.28 12756
I did the following (ignore datetime index):
# Split training into x_train and y_train
x_train = train.iloc[:, [0,2,3,4,5]]
y_train = train.output_cap
# Split test data into x_test and y_test
x_test = test.ips
y_test = test.output_cap
# I want to find out which feature is most important in predicting
# output_cap. So, I did the following:
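from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso
# Standardize x_train (the scaler learns each column's mean and standard deviation)
scaler = StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)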
# Fit x_train and y_train
ridge = Ridge().fit(x_train, y_train)
lasso = Lasso().fit(x_train, y_train)
# Print results for both Ridge and Lasso
print('Ridge: ', ridge.coef_)
print('Lasso:', lasso.coef_)
Ridge: [-0.13489306 0.33747024 0.37065464 0.27221361 0.94848913]
Lasso: [ 0. 0. 0. 0. 0.15643667]
# The results suggest that the feature 'ips' is the most significant in
# predicting output_cap.
This is where I'm confused…
Let's assume that, for simplicity, I want to use LinearRegression() from scikit-learn to predict output_cap_yhat from the test data using the single feature 'ips'. My questions are:
- Do I need to first convert the standardized x_train back to its original scale, then fit LinearRegression() and predict?
OR
- Do I need to standardize x_test, then fit LinearRegression()?
- How do I convert the standardized data back into non-standardized (original) data?
Best Answer
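Because the Lasso penalty acts on the size of the coefficients, and the size of a coefficient depends on the scale of its variable, differing magnitudes across variables can distort a Lasso model, so it is beneficial to standardize the data. Here's a very helpful answer on this very topic: https://stats.stackexchange.com/a/86435/156469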
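To make the effect of scale concrete, here is a minimal sketch on synthetic stand-in data (the arrays, the alpha value, and the variable names are assumptions for illustration, not the data from the question). Both features contribute about equally to y, but one is measured in the thousands, so the raw Lasso coefficients are not comparable while the standardized ones are.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): two features on very different scales.
rng = np.random.default_rng(0)
small = rng.normal(0.0, 1.0, size=200)        # unit-scale feature
large = rng.normal(5000.0, 1000.0, size=200)  # feature in the thousands
X = np.column_stack([small, large])

# Each feature drives y about equally (2.0 * 1 vs. 0.002 * 1000).
y = 2.0 * small + 0.002 * large + rng.normal(0.0, 0.1, size=200)

# Lasso on raw features: coefficient sizes reflect the units of each column.
raw = Lasso(alpha=0.5).fit(X, y)

# Lasso on standardized features: the pipeline fits the scaler on the data
# passed to fit() and reuses the same scaling inside predict().
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X, y)
scaled_coef = scaled.named_steps["lasso"].coef_

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled_coef)
```

Wrapping the scaler and the model in one pipeline is a design choice rather than a requirement; it mainly guards against forgetting to apply the training-set scaling to new data.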
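When you standardize a variable ($z=\frac{x-\mu}{\sigma}$), you obtain its mean ($\mu$) and standard deviation ($\sigma$). Using those, you can convert standardized values back to the original scale: $x = z\sigma + \mu$. With scikit-learn's StandardScaler, the fitted scaler stores these values, and scaler.inverse_transform(z) performs this conversion.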
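As a rough sketch of how this plays out for the single-feature case in the question, the snippet below uses the five training and test rows shown there (the DataFrame construction and the names z_train, model_z, and model_raw are assumptions for illustration). It checks the back-conversion formula against inverse_transform, and then fits LinearRegression() on both standardized and raw 'ips'. Because output_cap itself was never standardized, the predictions are already in original units, and for plain least squares the two fits should give the same predictions as long as the test data goes through the scaler fitted on the training data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# The rows shown in the question (feature 'ips' and target 'output_cap').
train = pd.DataFrame({
    "ips": [4863, 4616, 4440, 4047, 4720],
    "output_cap": [4.43, 4.56, 4.43, 3.91, 4.68],
})
test = pd.DataFrame({"ips": [10077, 10840, 12768, 13896, 12756]})

# Fit the scaler on the training feature only, then standardize it.
scaler = StandardScaler().fit(train[["ips"]])
z_train = scaler.transform(train[["ips"]])

# Back-conversion x = z * sigma + mu, by hand and via inverse_transform.
x_manual = z_train * scaler.scale_ + scaler.mean_
x_builtin = scaler.inverse_transform(z_train)
print(np.allclose(x_manual, x_builtin))

# Option A: model on standardized ips; the test data must be transformed
# with the SAME scaler (never refit the scaler on the test set).
model_z = LinearRegression().fit(z_train, train["output_cap"])
pred_z = model_z.predict(scaler.transform(test[["ips"]]))

# Option B: model on raw ips; no scaling is needed anywhere.
model_raw = LinearRegression().fit(train[["ips"]], train["output_cap"])
pred_raw = model_raw.predict(test[["ips"]])

# Compare the two sets of predictions (both are in output_cap units).
print(np.allclose(pred_z, pred_raw))
```

The equivalence of the two options holds for unpenalized linear regression; for Ridge or Lasso the penalty makes the fit depend on feature scale, so the standardized route is the one that matches the feature-selection step.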