Solved – Difficulty understanding MATLAB's ridge regression

machine learning, MATLAB, regression, regularization, ridge regression

I am confused by MATLAB's documentation of ridge regression at http://www.mathworks.com/help/stats/ridge-regression.html and couldn't figure it out on my own.

On that page, the Introduction to Ridge Regression part all looks good to me. However, in the example that follows, why do we need the line D = x2fx(X,'interaction');? It seems to map the features (x1, x2, x3) into a space augmented with pairwise interaction terms (x1, x2, x3, x1x2, x1x3, x2x3) and then run the regression on that. If I want to train on the original features, should I just use [x1, x2, x3] instead of D?
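For reference, here is a minimal NumPy sketch of the expansion I think x2fx(X,'interaction') performs; note the real x2fx also prepends a constant column of ones, which I omit here:

```python
import numpy as np

def interaction_expand(X):
    """Append pairwise interaction columns x_i * x_j (i < j) to X.
    Sketch of MATLAB's x2fx(X, 'interaction'), minus the constant
    column of ones that x2fx puts first."""
    n, p = X.shape
    cross = [X[:, i] * X[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack([X] + cross)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
D = interaction_expand(X)  # columns: x1, x2, x3, x1*x2, x1*x3, x2*x3
```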

And what is the right way to interpret the "ridge trace" there? I see that as the ridge parameter k increases, the absolute values of the learned coefficients decrease and converge into two groups. But if I use [x1, x2, x3] instead of D, I do not observe a similar trend.

Finally, to use the learned parameters to predict on new data, should I just call ytest = Xtest * betahat on a centered and normalized matrix Xtest with mean = 0 and stddev = 1?

Thanks in advance!

Best Answer

There's no reason you can't use ridge regression on the linear model. I'm guessing the example shows the interaction model because there is higher collinearity there (compare corr(X) with corr(D)), so the effect of the ridge regression is more pronounced. For the linear model you would have to choose much larger values of the ridge parameter to see substantial shrinkage.
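You can see this effect in a small NumPy sketch (not MATLAB, but the same standardize-then-penalize computation): with two nearly duplicate predictors, the same k shrinks the coefficients dramatically, because the penalty damps the tiny eigenvalue of Z'Z that collinearity creates; with independent predictors the same k barely moves them.

```python
import numpy as np

def ridge_std(X, y, k):
    """Ridge coefficients after standardizing X, roughly what MATLAB's
    ridge does by default. Returns coefficients on the standardized scale."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)               # nearly uncorrelated with x1
x2_coll = x1 + 0.05 * rng.normal(size=n)    # highly collinear with x1

# Same true relationship, y = 3*x1 - 2*x2 + noise, in both designs.
noise = 0.1 * rng.normal(size=n)
y_indep = 3 * x1 - 2 * x2_indep + noise
y_coll = 3 * x1 - 2 * x2_coll + noise

k = 10.0
b_indep = ridge_std(np.column_stack([x1, x2_indep]), y_indep, k)
b_coll = ridge_std(np.column_stack([x1, x2_coll]), y_coll, k)
# b_indep stays close to (3, -2); b_coll collapses toward a common
# small value, which is the "converging groups" pattern in the trace.
```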

To make predictions, you'll need to apply the centering and scaling parameters that were computed from the training data, not standardize the test data separately. If you type "help ridge" you should see instructions for computing a coefficient vector B0 that can be applied directly to the test data without re-scaling.
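Here is a NumPy sketch of that unscaling step, assuming (per "help ridge") that ridge(y,X,k,0) returns coefficients restored to the original data scale plus an intercept; this is an illustration of the idea, not MATLAB's actual implementation:

```python
import numpy as np

def ridge_fit(X, y, k):
    """Fit ridge after standardizing X with TRAINING statistics, then
    convert the coefficients back to the original scale (the idea behind
    MATLAB's ridge(y, X, k, 0)). Returns (intercept, coefs) so that
    predictions are simply b0 + Xnew @ b on raw, un-scaled Xnew."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Z = (X - mu) / sigma
    p = Z.shape[1]
    beta_std = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ (y - y.mean()))
    b = beta_std / sigma      # undo the scaling
    b0 = y.mean() - mu @ b    # undo the centering
    return b0, b

# Usage: columns with very different scales and offsets, predictions on raw data.
rng = np.random.default_rng(0)
scales = np.array([1.0, 5.0, 0.2])
Xtr = rng.normal(size=(100, 3)) * scales + 3.0
ytr = Xtr @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
b0, b = ridge_fit(Xtr, ytr, k=0.01)

Xte = rng.normal(size=(5, 3)) * scales + 3.0
yhat = b0 + Xte @ b           # no re-standardizing of Xte needed
```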