Solved – Why Ridge regression increases rather than decreases the model's error

lasso, regression, regularization, ridge regression

I am trying to improve a linear regression with ridge regularization using the glmnet function. The problem is that, instead of decreasing after applying the ridge method, the error increases. This is strange, because I thought ridge regularization was meant to improve the fitted model and therefore reduce the prediction error! How do you explain this, please?

I share my code below so you can see exactly what I do.

First, this is the data set:

> str(DATABASE)
'data.frame':   1667 obs. of  28 variables:
 $ YEAR_SALES            : num  2 1 2 2 1 1 1 1 1 1 ...
 $ MONTH_SALES           : num  9 9 2 9 3 3 11 12 3 6 ...
 $ DAY_SALES             : num  13 3 10 23 12 10 26 4 18 9 ...
 $ HOURS_INS             : num  17 14 18 16 23 18 16 12 17 16 ...
 $ CREATION_YEAR_SALES   : num  2 1 2 2 2 1 1 2 1 1 ...
 $ CREATION_MONTH_SALES  : num  9 9 2 10 12 3 11 2 3 6 ...
 $ CREATION_DAY_SALES    : num  13 11 15 31 5 10 27 7 18 9 ...
 $ VALIDATION_YEAR_SALES : num  2 1 2 2 2 1 1 2 1 1 ...
 $ VALIDATION_MONTH_SALES: num  9 9 2 11 12 3 12 2 3 6 ...
 $ VALIDATION_DAY_SALES  : num  15 14 16 3 6 19 1 8 21 10 ...
 $ AGE_CUSTUMER          : num  32 37 23 32 44 33 29 30 56 48 ...
 $ MEAN_Sales            : num  0 71 50 0 0 83 0 25 23 35 ...
 $ NBR_GIFTS             : num  1 1 1 1 1 1 1 1 4 3 ...
 $ TYPE_PEAU             : num  2 3 4 2 2 3 2 2 2 2 ...
 $ SENSIBILITE           : num  3 3 3 2 1 3 3 2 2 2 ...
 $ IMPERFECTIONS         : num  2 3 2 1 3 2 2 1 2 1 ...
 $ BRILLANCE             : num  3 1 1 3 3 3 3 3 3 3 ...
 $ GRAIN_PEAU            : num  3 3 3 3 1 3 1 1 1 3 ...
 $ RIDES_VISAGE          : num  1 1 1 3 3 3 3 1 3 1 ...
 $ ALLERGIES             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ MAINS                 : num  2 3 3 3 2 2 2 2 2 2 ...
 $ PEAU_CORPS            : num  1 2 2 1 1 1 1 1 1 1 ...
 $ INTERET_ALIM_NATURELLE: num  1 3 3 1 3 1 1 1 3 1 ...
 $ INTERET_ORIGINE_GEO   : num  1 2 1 1 3 1 3 1 1 3 ...
 $ INTERET_VACANCES      : num  2 3 1 2 1 2 1 1 2 3 ...
 $ INTERET_ENVIRONNEMENT : num  1 3 3 3 3 1 1 1 1 1 ...
 $ INTERET_COMPOSITION   : num  1 1 1 3 3 1 1 1 1 1 ...
 $ OUTCOME               : num  3 4 7 3 3 6 3 9 26 17 ...

Then I split the data and fit the linear regression model:

> set.seed(123)
> smp_size <- floor(0.75 * nrow(DATABASE))
> train_ind <- sample(seq_len(nrow(DATABASE)),size =smp_size)
> 
> train <- DATABASE[train_ind, ]
> test <- DATABASE[-train_ind, ]
> reg<-lm(OUTCOME~.-1,data=train)

Finally, I compute the prediction error:

> y.test<-test$OUTCOME
> NBR_Achat=predict(reg,newdata=test)
> round(sqrt(mean(((1-NBR_Achat/y.test)^2))),4)
[1] 0.4523

The above code covers the plain linear regression case. Now let's see what ridge regression gives:

> library(glmnet)
> y <- train$OUTCOME
> x <- as.matrix(train[,1:27])
> lambdas <- 10^seq(3, -2, by = -.1)
> fit <- glmnet(x, y, alpha = 0, lambda = lambdas)

To get the best model I use cross-validation. It seems that the best lambda equals 0.1:

> cv_fit <- cv.glmnet(x,y,alpha = 0,lambda=lambdas)
> plot(cv_fit)
> opt_lambda <- cv_fit$lambda.min
> opt_lambda
[1] 0.1

And finally, the prediction error is computed by the following code:

> x<-as.matrix(test[,1:27])
> y_predicted <- predict(cv_fit,s = opt_lambda,newx=x)
> y.test<-test$OUTCOME
> round(sqrt(mean(((1-y_predicted/y.test)^2))),4)
[1] 0.4605

What do you think about this?

Best Answer

In general, ridge regression won't necessarily improve the prediction error. Recall that the goal of regularization is to make a simpler model in order to avoid overfitting and thus predict better on an independent set. However, if overfitting is not a problem (for example, when there are many more samples than features), a more complex (less regularized) model might predict better. Models often predict better when they are more complex, not less, which is why things like neural networks, random forests and kernel methods exist.
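
To illustrate the point, here is a minimal R sketch on simulated data (not the asker's DATABASE): with many more observations than predictors and a genuinely linear signal, the cross-validated ridge fit and plain least squares predict about equally well, while forcing a large penalty clearly hurts.

library(glmnet)
set.seed(123)

# Purely illustrative data: n >> p and a truly linear signal
n <- 1500; p <- 25
X <- matrix(rnorm(n * p), n, p)
beta <- rnorm(p)
y <- as.vector(X %*% beta + rnorm(n))

train_ind <- sample(n, 0.75 * n)
Xtr <- X[train_ind, ]; ytr <- y[train_ind]
Xte <- X[-train_ind, ]; yte <- y[-train_ind]

# Ordinary least squares
ols <- lm(ytr ~ Xtr)
pred_ols <- cbind(1, Xte) %*% coef(ols)

# Ridge with lambda chosen by cross-validation
cv_ridge <- cv.glmnet(Xtr, ytr, alpha = 0)
pred_ridge <- predict(cv_ridge, newx = Xte, s = "lambda.min")

# Test RMSE: OLS and CV-tuned ridge are essentially tied here,
# while an over-regularized fit (s = 10) predicts worse
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
rmse(yte, pred_ols)
rmse(yte, pred_ridge)
rmse(yte, predict(cv_ridge, newx = Xte, s = 10))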

The traditional way to improve prediction is to look carefully at your data and think about what assumptions your model makes. Linear regression assumes that all your variables have a linear effect and that there are no interactions between variables. So if some variable has a U-shaped effect on the outcome, or if variable A behaves differently for males and females, your model won't predict as well as it could.
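
Both assumptions can be relaxed inside lm() itself. The sketch below uses a few of the asker's columns purely to show the formula syntax; whether these particular terms are sensible for this data is an assumption that would have to be checked.

# Sketch: relax the linearity / no-interaction assumptions in lm()
reg2 <- lm(OUTCOME ~ . - AGE_CUSTUMER +     # drop the purely linear age term
             poly(AGE_CUSTUMER, 2) +        # allow a U-shaped effect of age
             MEAN_Sales:NBR_GIFTS,          # allow an interaction
           data = train)

pred2 <- predict(reg2, newdata = test)
round(sqrt(mean((1 - pred2 / test$OUTCOME)^2)), 4)  # same error metric as above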