Solved – Difficulty understanding MATLAB's ridge regression

machine learning, MATLAB, regression, regularization, ridge regression

I am confused by MATLAB's documentation of ridge regression at http://www.mathworks.com/help/stats/ridge-regression.html and couldn't figure it out on my own.

On that page, the Introduction to Ridge Regression part all looks good to me. However, in the example that follows, why do we need the line D = x2fx(X,'interaction');? It seems to map the features (x1, x2, x3) into a space augmented with pairwise interaction terms (x1, x2, x3, x1x2, x1x3, x2x3) and then run the regression on that. If I want to train on the original features, should I just use [x1, x2, x3] instead of D?
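For reference, here is a minimal NumPy sketch of the expansion I think x2fx(X,'interaction') performs; note the real x2fx also prepends a constant column of ones, which I omit here:

```python
import numpy as np

def interaction_expand(X):
    """Append pairwise interaction columns x_i * x_j (i < j) to X.
    Sketch of MATLAB's x2fx(X, 'interaction'), minus the constant
    column of ones that x2fx puts first."""
    n, p = X.shape
    cross = [X[:, i] * X[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack([X] + cross)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
D = interaction_expand(X)  # columns: x1, x2, x3, x1*x2, x1*x3, x2*x3
```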

And what is the right way to interpret the "ridge trace" there? I see that as the ridge parameter k increases, the absolute values of the learned coefficients decrease and converge into two groups. But if I use [x1, x2, x3] instead of D, I do not observe a similar trend.

Finally, to use the learned parameters to predict on new data, should I just call ytest = Xtest * betahat on a centered and normalized matrix Xtest with mean = 0 and stddev = 1?

Thanks in advance!

Best Answer

There's no reason you can't use ridge regression on the linear model. I'm guessing the example shows the interaction model because there is higher collinearity there (compare corr(X) with corr(D)), so the effect of the ridge regression is more pronounced. For the linear model you would have to choose much larger values of the ridge parameter to see substantial shrinkage.
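You can see this effect in a small NumPy sketch (not MATLAB, but the same standardize-then-penalize computation): with two nearly duplicate predictors, the same k shrinks the coefficients dramatically, because the penalty damps the tiny eigenvalue of Z'Z that collinearity creates; with independent predictors the same k barely moves them.

```python
import numpy as np

def ridge_std(X, y, k):
    """Ridge coefficients after standardizing X, roughly what MATLAB's
    ridge does by default. Returns coefficients on the standardized scale."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)               # nearly uncorrelated with x1
x2_coll = x1 + 0.05 * rng.normal(size=n)    # highly collinear with x1

# Same true relationship, y = 3*x1 - 2*x2 + noise, in both designs.
noise = 0.1 * rng.normal(size=n)
y_indep = 3 * x1 - 2 * x2_indep + noise
y_coll = 3 * x1 - 2 * x2_coll + noise

k = 10.0
b_indep = ridge_std(np.column_stack([x1, x2_indep]), y_indep, k)
b_coll = ridge_std(np.column_stack([x1, x2_coll]), y_coll, k)
# b_indep stays close to (3, -2); b_coll collapses toward a common
# small value, which is the "converging groups" pattern in the trace.
```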

To make predictions, you'll need to apply the centering and scaling parameters that were computed from the training data, not standardize the test data separately. If you type "help ridge" you should see instructions for computing a coefficient vector B0 that can be applied directly to the test data without re-scaling.
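Here is a NumPy sketch of that unscaling step, assuming (per "help ridge") that ridge(y,X,k,0) returns coefficients restored to the original data scale plus an intercept; this is an illustration of the idea, not MATLAB's actual implementation:

```python
import numpy as np

def ridge_fit(X, y, k):
    """Fit ridge after standardizing X with TRAINING statistics, then
    convert the coefficients back to the original scale (the idea behind
    MATLAB's ridge(y, X, k, 0)). Returns (intercept, coefs) so that
    predictions are simply b0 + Xnew @ b on raw, un-scaled Xnew."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    Z = (X - mu) / sigma
    p = Z.shape[1]
    beta_std = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ (y - y.mean()))
    b = beta_std / sigma      # undo the scaling
    b0 = y.mean() - mu @ b    # undo the centering
    return b0, b

# Usage: columns with very different scales and offsets, predictions on raw data.
rng = np.random.default_rng(0)
scales = np.array([1.0, 5.0, 0.2])
Xtr = rng.normal(size=(100, 3)) * scales + 3.0
ytr = Xtr @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
b0, b = ridge_fit(Xtr, ytr, k=0.01)

Xte = rng.normal(size=(5, 3)) * scales + 3.0
yhat = b0 + Xte @ b           # no re-standardizing of Xte needed
```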