Solved – Poor predictions from lm.ridge

r, ridge-regression

I don't understand why predictions from lm.ridge() are so far out, when using the "best" lambda, based upon GCV. Can anyone help me to obtain better predictions? Or, at least, does anyone have a good ridge example with a simple explanation of the results?

Below is my R code for wine quality data (from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv):

library(MASS)
wine_all <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";", header = T)
#wine_all <- read.table("winequality-red.csv", sep=";", header = T)
wine_train <- wine_all[1:1400,]
wine_test <- wine_all[-(1:1400),]

train.lm <- lm.ridge(quality~., wine_train, lambda = seq(0, 100, 0.1))
plot(x=train.lm$lambda, y=train.lm$GCV)
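Rather than reading the minimum off the plot, MASS can report the candidate shrinkage parameters directly; a small sketch (reusing `train.lm` from the code above) using `select()` and `which.min()`:

```r
library(MASS)

# select() prints the HKB and L-W estimates and the lambda with smallest GCV
select(train.lm)

# The GCV vector is named by lambda, so the minimiser can be extracted directly
best.lambda <- as.numeric(names(which.min(train.lm$GCV)))
best.lambda
```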

pred.test <- scale(wine_test[,1:11], center = F, scale = train.lm$scales) %*% train.lm$coef[, which.min(train.lm$GCV)] + train.lm$ym

pred.all <-  scale(wine_all[,1:11], center = F, scale = train.lm$scales) %*% train.lm$coef[, which.min(train.lm$GCV)] + train.lm$ym
cor(wine_test[, 12], pred.test)^2
cor(wine_all[, 12], pred.all)^2

Best Answer

The results are comparable to those of a standard regression. I guess this is not a situation in which ridge regression improves on the least-squares estimators: the number of variables is small compared to the number of observations, and there are no collinearity problems.
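One quick way to check the "no collinearity" claim, assuming `wine_train` from the question is in scope, is the condition number of the standardised predictor matrix (a common rule of thumb treats values well below about 30 as unproblematic):

```r
# Standardise the 11 predictors, then compute the exact condition number
X <- scale(wine_train[, 1:11])
kappa(X, exact = TRUE)

# The largest pairwise correlation among predictors is also informative
max(abs(cor(wine_train[, 1:11])[upper.tri(diag(11))]))
```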

pairs(wine_train,gap=0)
d <- cbind(
  wine_test[, 12], pred.test, 
  predict( lm(quality~., wine_train), wine_test )
)
pairs(d, gap=0 )
cor(d)

The scale of your forecasts is wrong, however: lm.ridge centres the predictors before fitting, so when predicting you must centre the new data too, using the training means stored in train.lm$xm, rather than passing center = FALSE.
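Concretely, a corrected version of the prediction line from the question might look like this (train.lm$xm and train.lm$ym hold the training-set predictor means and response mean, respectively):

```r
# Centre with the *training* column means (train.lm$xm) before applying the
# stored scales, then add back the mean response (train.lm$ym)
k <- which.min(train.lm$GCV)
pred.test <- scale(wine_test[, 1:11],
                   center = train.lm$xm,
                   scale  = train.lm$scales) %*% train.lm$coef[, k] + train.lm$ym
cor(wine_test[, 12], pred.test)^2
```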
