Solved – How to select a comprehensive set of parameters for Hyper-parameter tuning Extra Trees Regressor / Random Forest Regressor

hyperparameter · machine learning · random forest · regression

I'm trying to use as many parameters as I can when hyper-parameter tuning the Extra Trees Regressor and the Random Forest Regressor, so I can be confident in the model I end up using.

The parameters of the Extra Trees Regressor are very similar to those of the Random Forest. I get errors with both of my approaches. I know some of the parameters conflict with each other, but I cannot find a way around the issue.

Here are the parameters I am using for the Extra Trees Regressor (with GridSearchCV):

from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [10, 50, 100],
    'criterion': ['mse', 'mae'],      # renamed 'squared_error' / 'absolute_error' in scikit-learn >= 1.0
    'max_depth': [2, 8, 16, 32, 50],
    'min_samples_split': [2, 4, 6],   # note the spelling: 'min_samples_split', not 'min_sample_split'
    'min_samples_leaf': [1, 2],       # likewise 'min_samples_leaf', not 'min_sample_leaf'
    #'oob_score': [True, False],      # only valid when bootstrap=True
    'max_features': ['auto', 'sqrt', 'log2'],
    'bootstrap': [True, False],
    'warm_start': [True, False],
}

So, as far as I know, oob_score does not work with bootstrap=False, and that's obvious. But there is also an interaction between max_depth, min_samples_split and min_samples_leaf that I cannot figure out! I guess the tree must be allowed to grow deep enough for certain splits to happen, or the other way around(?).
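One way around mutually incompatible settings is to pass GridSearchCV a list of grids instead of a single dict; parameter combinations are only formed within each dict, never across them. A minimal sketch, assuming oob_score is only legal together with bootstrap=True (the specific value lists are illustrative):

```python
# GridSearchCV accepts a *list* of parameter grids. Incompatible options can
# be split into separate dicts so they are never crossed with each other.
param_grid = [
    {   # bootstrap on: oob_score may be toggled
        'n_estimators': [100, 300],
        'max_depth': [8, 16, None],
        'min_samples_split': [2, 4, 6],
        'min_samples_leaf': [1, 2],
        'bootstrap': [True],
        'oob_score': [True, False],
    },
    {   # bootstrap off: omit oob_score so it stays at its default (False)
        'n_estimators': [100, 300],
        'max_depth': [8, 16, None],
        'min_samples_split': [2, 4, 6],
        'min_samples_leaf': [1, 2],
        'bootstrap': [False],
    },
]
```

The list is passed to GridSearchCV exactly where a single dict would go; the search space is the union of the two grids.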

Also, would 'min_weight_fraction_leaf' or 'min_impurity_decrease' be useful? Especially the latter.

BTW, this is how I create the model:

from sklearn.ensemble import ExtraTreesRegressor

model = ExtraTreesRegressor()

And this is how I run GridSearchCV with cross-validation:

gcv = GridSearchCV(model, param_grid, cv=5, n_jobs=-1).fit(XTrain, yTrain.values.ravel())
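Once fit returns, the winning combination and its cross-validated score can be read off the fitted object. A self-contained sketch on synthetic data (the arrays and the tiny grid are illustrative assumptions, standing in for XTrain/yTrain and the full grid above):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Tiny synthetic regression problem: y is a near-noiseless linear
# function of three features.
rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y = X @ np.array([1.0, 2.0, -1.0]) + 0.01 * rng.randn(60)

small_grid = {'n_estimators': [50, 100], 'min_samples_leaf': [1, 2]}
gcv = GridSearchCV(ExtraTreesRegressor(random_state=0), small_grid,
                   cv=3, n_jobs=-1).fit(X, y)

print(gcv.best_params_)           # dict of the winning parameter values
print(gcv.best_score_)            # mean cross-validated R^2 of that combination
best_model = gcv.best_estimator_  # already refit on all of X, y
```

best_estimator_ is refit on the whole training set by default (refit=True), so it can be used directly for prediction.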

Thanks!

Best Answer

With 'n_estimators': [10,50,100], note that the default number of trees changed to 100 (in scikit-learn 0.22), so it is worth searching well above that. For example: 'n_estimators': [int(x) for x in np.arange(start = 100, stop = 2100, step = 100)]
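Spelled out, that list comprehension needs numpy imported as np; a plain range() produces the same values (100, 200, …, 2000) with no extra dependency:

```python
import numpy as np

# The answer's expanded grid for n_estimators, exactly as written above.
n_estimators = [int(x) for x in np.arange(start=100, stop=2100, step=100)]

# Equivalent, numpy-free version.
n_estimators_alt = list(range(100, 2100, 100))
```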