Solved – Why does scaling the features affect the prediction of a regression

machine-learning, python, regression, svm

I'm working on a regression problem using the support vector regression model from sklearn and MinMax scaling for the features, but with scaling I get a different result for the regression. Does that make sense?

import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.preprocessing import MinMaxScaler

np.random.seed(0)
X_training = np.random.rand(100,15)*10
Y_training = np.random.rand(100,1)*10
model = svm.SVR()

Without scaling:

model.fit(X_training, Y_training.ravel())
print(model.predict(X_training)[0:10])

array([ 4.99980599,  6.99479293,  4.9784396 ,  5.03911175,  6.99557904,
        6.57214885,  6.99454049,  5.60940831,  6.99989978,  5.98628179])
Using MinMax scaler:

scaler = MinMaxScaler()
X_scaled  = scaler.fit_transform(X_training)
model.fit(X_scaled, Y_training.ravel())
print(model.predict(X_scaled)[0:10])

array([ 5.63521939,  6.70378514,  5.83393228,  5.33274858,  6.47539108,
        5.61135278,  5.7890052 ,  5.74425789,  6.15799404,  6.1980326 ])

Although the predictions are of the same order of magnitude, there is a significant difference between the two cases.

Best Answer

Regularization encompasses techniques aimed at restricting model complexity. The Support Vector Machine is usually $\ell_2$-regularized (except for the intercept term), which shrinks the coefficients towards zero, since the cost function is augmented with the penalty $\|w\|_2^2=\sum_{i=1}^p w_i^2$.
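You can inspect that penalty directly when the kernel is linear (a minimal sketch, assuming `kernel="linear"` so that the primal weight vector is exposed via `coef_`; with the question's default RBF kernel the weights stay implicit):

```python
import numpy as np
from sklearn import svm

np.random.seed(0)
X = np.random.rand(100, 15) * 10
y = np.random.rand(100) * 10

# A linear-kernel SVR exposes the primal weight vector via coef_.
model = svm.SVR(kernel="linear").fit(X, y)
w = model.coef_.ravel()

# The l2 term added to the cost function: ||w||_2^2 = sum_i w_i^2
penalty = np.sum(w ** 2)
print(penalty)
```

Rescaling the features rescales `w`, and therefore changes the value of this penalty term for an otherwise equivalent fit.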

As you can see, the regularization penalty depends on the magnitude of the coefficients, which in turn depends on the scale of the features. So there you have it: when you change the scale of the features you also change the scale of the coefficients, which are therefore penalized differently, resulting in different solutions.
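A quick way to verify this on the question's own setup (a minimal sketch using the same seed and the same `svm.SVR` defaults) is to fit the identical estimator on the raw and on the MinMax-scaled features and compare the predictions:

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import MinMaxScaler

np.random.seed(0)
X = np.random.rand(100, 15) * 10
y = (np.random.rand(100, 1) * 10).ravel()

# Same estimator, fitted on raw vs MinMax-scaled features.
pred_raw = svm.SVR().fit(X, y).predict(X)
X_scaled = MinMaxScaler().fit_transform(X)
pred_scaled = svm.SVR().fit(X_scaled, y).predict(X_scaled)

# The two fits are genuinely different models, not a re-parametrization
# of the same one, so the predictions do not coincide.
print(np.max(np.abs(pred_raw - pred_scaled)))
```

The exact size of the gap depends on your scikit-learn version (the default `gamma` for the RBF kernel changed from `'auto'` to `'scale'`), but the two prediction vectors are not equal.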