I am trying to find an example in which Lasso and Ridge regression are doing better than simple OLS.
I am trying to run the Boston example that appears in the MASS library in R. The dependent variable is medv (median house price).
I'll jump to the end. Ridge, Lasso, and Elastic Net are all giving results similar to OLS, and I can't understand why. Moreover, I found an example here:
http://rstudio-pubs-static.s3.amazonaws.com/257535_4218def5fb0945a7a5c09126f385aa59.html
in which the MSE is much smaller than mine! Can you kindly help me find what is wrong with my code?
set.seed(1)
cv.out = cv.glmnet(X.train, y.train, alpha = 0, nfolds = 10, lambda = seq(0, .1, length = 15)) # by default cv.glmnet creates its own lambda sequence; here I supply one
plot(cv.out)
best.lambda = cv.out$lambda.min
best.lambda
log(best.lambda)
ridge.model = glmnet(X.train, y.train, alpha = 0, nlambda = 100)
ridge.pred = predict(ridge.model, s = best.lambda, newx = X.test)
ridge.testmse = mean((y.test - ridge.pred)^2)
predict(ridge.model, type = "coefficients", s = best.lambda)
ridge.testmse
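For reference, the OLS baseline I compare against looks roughly like this (a sketch; it assumes `train` is the same index vector used to build X.train/y.train from the Boston data in MASS):

```r
library(MASS)   # provides the Boston data set

# Fit OLS on the training rows and evaluate on the held-out rows.
# `train` is assumed to be the row-index vector defining the split.
ols.model   <- lm(medv ~ ., data = Boston[train, ])
ols.pred    <- predict(ols.model, newdata = Boston[-train, ])
ols.testmse <- mean((Boston$medv[-train] - ols.pred)^2)
ols.testmse
```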
My code for Lasso and elastic net is very similar. Thank you.
Best Answer
I think a quote from the accuracy section in the example you linked can be very informative (emphasis added):
The Boston dataset has 506 observations of 14 variables. Therefore, we are in the case n >> p, where OLS is said to be ideal. In other words, you don't have the problem that Lasso is intended to solve.
Furthermore, variable selection (as in Lasso) is useful when some predictors are not significant or when predictors are highly correlated. In the Boston dataset most predictors are highly significant, the correlation between them is moderate, and the VIFs aren't exaggerated. So the other circumstances that could make variable selection useful don't hold here either.
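To see the contrast, here is a sketch of the kind of setting Lasso is designed for: simulated data with many predictors and only a few true signals. The dimensions and coefficient values are illustrative assumptions, not taken from the Boston example:

```r
library(glmnet)
set.seed(1)

# High-dimensional setting: p >> n, unlike Boston's n = 506, p = 13.
n <- 50; p <- 200
X    <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))   # only the first 5 predictors matter
y    <- as.vector(X %*% beta + rnorm(n))

X.test <- matrix(rnorm(n * p), n, p)
y.test <- as.vector(X.test %*% beta + rnorm(n))

# Cross-validated Lasso exploits the sparsity and predicts well.
cv.lasso   <- cv.glmnet(X, y, alpha = 1)
lasso.pred <- predict(cv.lasso, s = "lambda.min", newx = X.test)
mean((y.test - lasso.pred)^2)

# OLS cannot even be fit uniquely here (p > n gives NA coefficients in lm);
# with p just below n it fits noise and its test MSE blows up. That is the
# gap Lasso closes, and it simply isn't present in the Boston data.
```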