Can the concept of ridge regression be applied within a random forest in order to get more accurate predictions?
Random forests use regression trees for prediction. When there is a problem of multicollinearity, ridge regression is typically used. Multicollinearity can certainly affect variable importances in random forest models. Can the concept of ridge regression be used to overcome multicollinearity in a random forest?
Best Answer
For predictive accuracy, I would not expect multicollinearity to be a problem for random forests. For variable importances, it is much more likely to be a problem.
Combining random forests and penalized (e.g., ridge) regression can be done with the R package pre. This package fits prediction rule ensembles by first fitting a tree ensemble (bagged, boosted, and/or random forest) and then selecting the best nodes through penalized regression (lasso, ridge, or elastic net). In the following example, we fit a random forest and a prediction rule ensemble on the airquality data. In this dataset, there is a substantial (negative) correlation between Temp and Wind and a substantial (positive) correlation between Temp and Month:
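The original code was not included in this excerpt; a minimal sketch of the data-preparation step, assuming the standard base-R API (pre cannot handle missing values, so we keep complete cases only), might look like this:

```r
library("pre")
library("randomForest")

## Keep complete cases of the airquality data (pre does not accept NAs)
airq <- airquality[complete.cases(airquality), ]

## Check the correlations described above
cor(airq$Temp, airq$Wind)   # substantial negative correlation
cor(airq$Temp, airq$Month)  # substantial positive correlation
```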
Now we fit a random forest and a prediction rule ensemble (taking a random forest + ridge regression approach through specification of the mtry and alpha arguments, respectively):
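A sketch of the model-fitting step. The mtry value of 2 and ntrees of 500 are illustrative choices, not taken from the original; in pre(), alpha is passed on to glmnet, where alpha = 0 selects the ridge penalty, and setting learnrate = 0 (an assumption about how to disable pre's default boosting) makes the tree ensemble random-forest-like:

```r
## Random forest
set.seed(42)
rf <- randomForest(Ozone ~ ., data = airq)

## Prediction rule ensemble: random-forest-style trees (mtry, learnrate = 0)
## followed by ridge regression (alpha = 0) on the resulting rules
set.seed(42)
pr <- pre(Ozone ~ ., data = airq, ntrees = 500, mtry = 2,
          learnrate = 0, alpha = 0)
```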
Now we request and plot the variable importances:
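A sketch of the importance step. Both packages provide an importance() function, so the calls below are namespace-qualified to avoid masking; pre's importance() plots the importances by default:

```r
## Variable importances for the random forest
randomForest::importance(rf)
randomForest::varImpPlot(rf)

## Variable importances for the prediction rule ensemble (plots by default)
pre::importance(pr)
```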
We see that the variables Temp, Wind, and Solar.R have very similar relative importances in both the RF and the PRE. The relative importances of Day and Month are lower in the PRE than in the RF.