Solved – R – xgbTree, xgbDart and gbm in Caret predict Small Range


I am currently facing an issue in my regression model. I have tried the models in the title (xgbTree, xgbDART, and gbm in caret), and they tend to predict a very small range for the output variable.

The y variable in my train data goes from 2 to 6, and my predicted variable tends to go from 3 to 4.5. I have about 500 samples and 4 predictor variables.

I have also tried very large and very small values for the relevant hyperparameters. For example, I have tried:

max_depth : (1, 2, 4, 8, 16, 32)

nrounds: (10, 50, 100, 200, 500, 1000, 2000)

eta: (0.001, 0.01, 0.1, 1)

gamma: (0.0001, 0.001, 0.01, 0.1, 1, 10)

I did the same for the other hyperparameters (and, for gbm, the relevant ones). My point is that I have tried both extremes: a few deep trees and many shallow trees. In addition, the hyperparameter grid includes several levels of column sampling and subsampling, and several values for the minimum number of instances in terminal nodes.

I am doing standard train-test splitting with 10-fold CV for each of the models I tried.
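For reference, a minimal sketch of how a grid like the one above could be laid out in base R. The first four value sets are the ones listed above; the values for `colsample_bytree`, `min_child_weight` and `subsample` are placeholders (assumptions), and `train_df` and the outcome name `y` in the commented-out caret call are hypothetical.

```r
# Candidate values for the four hyperparameters listed in the question
grid <- expand.grid(
  nrounds          = c(10, 50, 100, 200, 500, 1000, 2000),
  max_depth        = c(1, 2, 4, 8, 16, 32),
  eta              = c(0.001, 0.01, 0.1, 1),
  gamma            = c(0.0001, 0.001, 0.01, 0.1, 1, 10),
  # Placeholder values (assumptions), not the ones actually tried
  colsample_bytree = 0.8,
  min_child_weight = 1,
  subsample        = 0.8
)

# Number of candidate settings in the grid
nrow(grid)

# With caret installed, the grid and 10-fold CV could be used roughly like this
# (train_df and the outcome y are hypothetical names):
# ctrl <- caret::trainControl(method = "cv", number = 10)
# fit  <- caret::train(y ~ ., data = train_df, method = "xgbTree",
#                      trControl = ctrl, tuneGrid = grid)
```

A full grid of this size is expensive under 10-fold CV, so in practice a coarser grid or a random search over the same ranges may be the more realistic choice.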

Is my problem just unlearnable?

edit: Forgot to mention. I tried the above with the aforementioned models as standalone models. I have also tried a setup with these models, along with other, simpler models, as the first tier of a stacked ensemble, with a random forest as the second and final layer. Results were the same either way, except that the other models in the first layer, like xgbLinear and quantile random forest, had an easier time covering a bigger range. The overall ensemble still had the same narrow prediction range that xgbTree, xgbDART and gbm had.

Best Answer

In most situations, predictions have smaller variance than the response.

More precisely: for a least-squares fit with an intercept, the in-sample R-squared equals the variance of the predictions divided by the variance of the response. If the R-squared is smaller than 1, your predictions will therefore have less variability than the response. In the worst case, the model always predicts the same value, and the variance of the predictions is 0.
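This can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the data are made up (500 samples, one predictor), and a plain `lm` fit stands in for the boosted models from the question, since the variance identity holds exactly for least squares with an intercept.

```r
# Simulated data (an assumption for illustration; not the asker's data)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 4 + 0.5 * x + rnorm(n, sd = 0.7)

fit  <- lm(y ~ x)      # least-squares fit with an intercept
pred <- fitted(fit)    # in-sample predictions

r2        <- summary(fit)$r.squared
var_ratio <- var(pred) / var(y)

# For this kind of fit the two quantities coincide
all.equal(r2, var_ratio)

# Compare the spread of the response with the spread of the predictions
range(y)
range(pred)
```

Because the R-squared here is below 1, `var(pred)` is smaller than `var(y)`, which is the same narrowing the question describes.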