Solved – Why are Lasso and Ridge not giving better results than OLS

elastic net · lasso · least squares · ridge regression

I am trying to find an example in which Lasso and Ridge regression do better than simple OLS.

I am running the Boston example from the MASS package in R. The dependent variable is medv (the median house value).

I'll jump to the end: Ridge, Lasso, and Elastic Net all give results very similar to OLS, and I can't understand why. Moreover, I found an example here:

http://rstudio-pubs-static.s3.amazonaws.com/257535_4218def5fb0945a7a5c09126f385aa59.html

in which the MSE is much smaller than mine! Can you kindly help me find what is wrong with my code?

set.seed(1)
# Note: supplying lambda overrides cv.glmnet's own (automatically generated) lambda sequence
cv.out = cv.glmnet(X.train, y.train, alpha = 0, nfolds = 10, lambda = seq(0, .1, length = 15))
plot(cv.out)
best.lambda = cv.out$lambda.min
best.lambda
log(best.lambda)

ridge.model = glmnet(X.train, y.train, alpha = 0, nlambda = 100)
ridge.pred = predict(ridge.model, s = best.lambda, newx = X.test)
ridge.testmse = mean((y.test - ridge.pred)^2)
predict(ridge.model, type = "coefficients", s = best.lambda)
ridge.testmse

My code for Lasso and elastic net is very similar. Thank you.
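For reference, a sketch of what "very similar" code for the lasso and elastic-net fits would look like under the same setup (only `alpha` changes; `X.train`, `y.train`, `X.test`, and `y.test` are assumed to exist as in the ridge code above, and `alpha = 0.5` for the elastic net is an arbitrary illustrative choice):

```r
library(glmnet)

# Lasso: alpha = 1 (pure L1 penalty)
set.seed(1)
cv.lasso <- cv.glmnet(X.train, y.train, alpha = 1, nfolds = 10)
lasso.pred <- predict(cv.lasso, newx = X.test, s = cv.lasso$lambda.min)
lasso.testmse <- mean((y.test - lasso.pred)^2)

# Elastic net: 0 < alpha < 1 mixes the L1 and L2 penalties
set.seed(1)
cv.enet <- cv.glmnet(X.train, y.train, alpha = 0.5, nfolds = 10)
enet.pred <- predict(cv.enet, newx = X.test, s = cv.enet$lambda.min)
enet.testmse <- mean((y.test - enet.pred)^2)
```

Here cv.glmnet is allowed to build its own lambda sequence rather than being handed `seq(0, .1, length = 15)`, which is generally the safer default.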

Best Answer

I think a quote from the accuracy section in the example you linked can be very informative (emphasis added):

OLS is ideal when the underlying relationship is Linear and we have n>>p. But if n is not much larger than p or p>n (unfeasible for OLS), there can be a lot of variability in the fit which can result in either overfitting and very poor predictive ability.

The Boston dataset has 506 observations of 14 variables, so we are squarely in the n ≫ p case where OLS is said to be ideal. In other words, you don't have the problem that Lasso is designed to solve.
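You can confirm the dimensions directly (assuming MASS is installed):

```r
library(MASS)
dim(Boston)  # 506 14
```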

Furthermore, variable selection (as in Lasso) is useful when some predictors are not significant or when predictors are highly correlated. In the Boston dataset most predictors are highly significant, the correlations between them are moderate, and the variance inflation factors (VIFs) are not extreme. Thus the other circumstances that could make variable selection useful don't hold here either.
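To see the regime where Lasso actually wins, here is a hypothetical simulation (not from the original post): n barely exceeds p and most true coefficients are zero, so OLS overfits while the lasso's shrinkage and selection pay off. The sizes, sparsity pattern, and noise level are all arbitrary choices for illustration:

```r
library(glmnet)

set.seed(42)
n <- 60; p <- 50                      # n barely larger than p
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))   # only 5 of 50 predictors matter
y <- X %*% beta + rnorm(n, sd = 3)

# Large independent test set to estimate out-of-sample MSE
X.new <- matrix(rnorm(1000 * p), 1000, p)
y.new <- X.new %*% beta + rnorm(1000, sd = 3)

# OLS fit with all 50 predictors
ols <- lm(y ~ X)
ols.pred <- cbind(1, X.new) %*% coef(ols)

# Lasso with cross-validated lambda
cv <- cv.glmnet(X, y, alpha = 1)
lasso.pred <- predict(cv, newx = X.new, s = "lambda.min")

mean((y.new - ols.pred)^2)    # OLS test MSE
mean((y.new - lasso.pred)^2)  # lasso test MSE: typically much lower here
```

In this sparse, nearly p = n setting the lasso's test MSE is usually well below the OLS one, which is exactly the variability-in-the-fit problem described in the quote above.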