I am comparing the mean squared error (MSE) from a standard OLS regression with the MSE from a ridge regression. I find the OLS MSE to be smaller than the ridge MSE. I doubt that this is correct. Can anyone help me find the mistake?
In order to understand the mechanics, I am not using any of Matlab's built-in functions.
% Generate data. Note the high correlation of the columns of X.
X = [ 3    3
      1.1  1
     -2.1 -2
     -2   -2];
y = [1 1 -1 -1]';
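The near-collinearity of the columns can be checked directly (a quick check, not part of the original script):

R = corrcoef(X); % sample correlation matrix of the columns of X
R(1,2)           % approximately 0.9996, i.e., the columns are nearly collinear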
Here I set lambda = 1, but the problem appears for any value of lambda except lambda = 0. When lambda = 0, the OLS and the ridge estimates coincide, as they should.
lambda1 = 1;
[m,n] = size(X); % m = number of observations, n = number of regressors
OLS estimator and MSE:
b_ols = ((X')*X)^(-1)*((X')*y); % normal equations: (X'X)^(-1) X'y
yhat_ols = X*b_ols;             % fitted values
MSE_ols = mean((y-yhat_ols).^2) % in-sample mean squared error
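As a quick sanity check (not strictly necessary), MATLAB's backslash operator solves the same least-squares problem without forming an explicit inverse and should reproduce the estimate up to rounding error:

b_ols_check = X\y;        % least-squares solution of the overdetermined system
norm(b_ols - b_ols_check) % should be near machine precision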
Ridge estimator and MSE:
b_ridge = ((X')*X+lambda1*eye(n))^(-1)*((X')*y); % (X'X + lambda*I)^(-1) X'y
yhat_ridge = X*b_ridge;                          % fitted values
MSE_ridge = mean((y-yhat_ridge).^2)              % in-sample mean squared error
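To verify the claim above that lambda = 0 reproduces OLS, one can rerun the ridge formula with lambda set to zero (again just a sanity check, not part of the original script):

b_ridge0 = ((X')*X + 0*eye(n))^(-1)*((X')*y); % ridge with lambda = 0
norm(b_ridge0 - b_ols)                        % zero up to rounding error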
For the OLS regression I get MSE = 0.0370, and for the ridge regression MSE = 0.1035.
Best Answer
That is correct: $b_{OLS}$ is by definition the minimizer of the MSE on the training data. Because $X^TX$ is invertible here, the least-squares problem has a unique minimum, so any coefficient vector other than $b_{OLS}$, including the ridge estimate, must produce a higher training MSE. Ridge regression is not supposed to beat OLS in-sample; its payoff, lower variance at the cost of some bias, shows up in out-of-sample prediction error.
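To see this numerically, here is a small sketch (an illustration added here, reusing the variables from the question's script) that evaluates the training MSE along the straight line from $b_{OLS}$ to $b_{ridge}$; since the MSE is a convex quadratic with its unique minimum at $b_{OLS}$, it rises monotonically along this path:

% Training MSE along b(t) = (1-t)*b_ols + t*b_ridge for t in [0,1].
% The minimum is at t = 0 (the OLS solution); the MSE grows toward t = 1.
t = linspace(0, 1, 11);
mse_path = zeros(size(t));
for k = 1:numel(t)
    b = (1 - t(k))*b_ols + t(k)*b_ridge;
    mse_path(k) = mean((y - X*b).^2);
end
disp([t' mse_path']) % column 1: t, column 2: training MSE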