Solved – Why is least squares performing as well as ridge regression when there is multicollinearity

Tags: least squares, MATLAB, multicollinearity, ridge regression

I am learning about ridge regression, so I am implementing it in MATLAB as practice. However, I am having trouble finding a data set on which ridge regression performs better than ordinary least squares.

Reading up, I've found that collinear data often benefits from regularization. However, when I implemented this in the code below, least squares performs just as well as ridge regression (the best lambda is on the order of 1e-10, i.e. almost no regularization at all!). MATLAB tells me that X is rank deficient (rank = 2) when I use the built-in function for least squares, yet it still performs well.

Does anyone know why it behaves this way? Is my data perhaps not collinear enough to show a real performance difference, or have I misunderstood something?

% Generate data: columns 2 and 3 are exact linear functions of column 1,
% so X is rank deficient (rank = 2).
clear;
Nt = 100;
X(:,1) = randn(Nt,1);
X(:,2) =  2*X(:,1) + 6;
X(:,3) = 12*X(:,2) + 16;
p = [0.74, 3, 4.5];
y = X*p' + randn(Nt,1);

% Least squares: backslash warns about the rank deficiency and returns a
% basic solution; pinv would return the minimum-norm one.
pLS = X\y;
%pLS = pinv(X'*X)*(X'*y);
nmseN = sum((X*pLS-y).^2)/length(y)/var(y);   % normalized MSE on the training data

% Tikhonov/ridge: grid search over lambda; NMSE is computed on the same
% data used for fitting.
lspace     = logspace(-10,-1,1000);
bestNMSE   = inf;
bestLambda = -1;
I = eye(size(X, 2));   % since I is the identity, I'*I is just I
for k = 1:length(lspace)
  prLS = pinv(X'*X + lspace(k)*I)*(X'*y);

  nmse = sum((X*prLS-y).^2)/length(y)/var(y);
  if nmse < bestNMSE
    bestNMSE   = nmse;
    bestLambda = lspace(k);
  end
end
prLS  = pinv(X'*X + bestLambda*I)*(X'*y);
nmseR = sum((X*prLS-y).^2)/length(y)/var(y);   % training NMSE again
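
For reference, the dependence can be checked directly; here the columns satisfy 88*X(:,2) = 32*X(:,1) + 6*X(:,3) exactly:

% Collinearity diagnostics, run after the script above:
rank(X)       % 2, matching the rank-deficiency warning from backslash
cond(X'*X)    % effectively infinite; the hallmark of severe multicollinearity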

Best Answer

To echo Cardinal's comment: OLS will always describe a given data set at least as well as any other linear method, because minimizing the in-sample squared error is, by definition, what OLS does. Your grid search scores each fit on the same data it was trained on, so it can never prefer ridge. The point of regularized regression is to improve prediction accuracy on new data. For an example of how regularization (and other techniques) can improve predictive accuracy, have a look at "An Introduction to Statistical Learning with Applications in R". In Chapter 6, where regularized regression is introduced, they use the Hitters data set to show various models that predict better than ordinary regression; labs 1 and 2 of that chapter work through the methods.
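
To make that concrete in your own setup, here is a quick sketch along those lines: score both estimators on a held-out half of the data, and perturb the collinear columns slightly so the training and test rows do not share one exact 2-D column space. The noise level, split, and lambda grid below are arbitrary choices.

% Sketch: the question's experiment, but scored on held-out data.
clear;
Nt = 100;
X(:,1) = randn(Nt,1);
X(:,2) =  2*X(:,1) + 6 + 0.01*randn(Nt,1);   % nearly, not exactly, collinear
X(:,3) = 12*X(:,2) + 16 + 0.01*randn(Nt,1);
p = [0.74, 3, 4.5];
y = X*p' + randn(Nt,1);

% 50/50 train/test split.
Xtr = X(1:Nt/2,:);      ytr = y(1:Nt/2);
Xte = X(Nt/2+1:end,:);  yte = y(Nt/2+1:end);

% OLS: fit on the training half, score on the test half.
pLS    = Xtr\ytr;
nmseLS = sum((Xte*pLS - yte).^2)/length(yte)/var(yte);

% Ridge over a lambda grid, also scored on the test half.
I        = eye(size(X,2));
lspace   = logspace(-6, 4, 200);   % wider grid: the best lambda need not be tiny
bestNMSE = inf;
for k = 1:length(lspace)
  prLS = (Xtr'*Xtr + lspace(k)*I) \ (Xtr'*ytr);
  nmse = sum((Xte*prLS - yte).^2)/length(yte)/var(yte);
  bestNMSE = min(bestNMSE, nmse);
end
[nmseLS, bestNMSE]   % ridge's best test NMSE is typically the smaller of the two

Picking lambda by its test error is optimistic (in practice you would cross-validate within the training half), but it is enough to show that, out of sample, the ridge path contains estimators that predict better than OLS; on the training data itself, OLS will always win.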