I have recently been doing work with predictive models for a continuous response. I am doing a comparison between Elastic Net (glmnet
) package in R and XGBoost (xgboost
) package in R. Originally, I built the model using Elastic Net for its ability to perform feature selection and also for its ability to shrink the coefficeints of correlated variables.
I am exploring XGBoost because of its predictive capabilities, the summary of feature importance it provides, its ability to capture nonlinear interactions and also because I believe that it might be more robust in the presence of outliers.
My questions are:

Is XGBoost or gradient boosted trees in general better at finding nonlinear interactions than a generalized linear model?

Is my assumption about XGBoost or gradient boosted trees in general being robust to outliers a fair assumption?
Here is my model set up and finding:
For model validation I have a training and testing set. I $log$ transform the response variables before model fitting. I make predictions on the testing set and then exponentiate the results to return to the original scale. I make predicted vs. observed plots for each model.
XGBoost Predicted Vs. Observed Plot
The exponentiated predicted values have some outliers but the fit in general is good.
Elastic Net Predicted Vs. Observed Plot
With the elastic net model when I convert back to original scale there is an extreme predicted value. I am interpreting this as that the GLM NET has a few cases that it is not quite sure how to predict (outliers).
I would love to hear opinions! Thank you in advance for any help or comments!
Best Answer
1 Yes boosted trees would more easily fit unknown nonlinear effects or interactions than regularized linear regression. However, as soon as you are aware of some specific nonlinearity you could simply transform data to linearity and continue to use a linaer learner.
2 That depends on how you train the models. If you're new to boosted trees, check out some tutorials on how to avoid overfitting. I cannot see from your plots what kind of crossvalidation was used. Use a thorough outer crossvalidation and perhaps compare the results to a random forest model. RF models are much easier to handle, and default settings are often near optimal. A crude thumb rule; if your RF performs better than boosted trees(measured by outer cross validation), you either have chosen sub optimal training paramters for your boosted trees model or your data is quite noisy.