Solved – Is hyperparameter tuning on the whole data set reasonable?

hyperparameter, machine learning, python, random forest

This may be a weird question, because I don't fully understand hyperparameter tuning yet.

Currently I'm using sklearn's GridSearchCV to tune the parameters of a RandomForestClassifier, like this:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# 'scoring' is a dict of scorers defined elsewhere; refit='Accuracy'
# requires that it contains an entry named 'Accuracy'.
gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)
results = gs.cv_results_

After that I check the gs object for best_params_ and best_score_. Then I use best_params_ to instantiate a RandomForestClassifier and run stratified cross-validation again to record metrics and save a confusion matrix:

# Implied imports/setup not shown in the snippet: 'score' is
# precision_recall_fscore_support, and skf is the stratified 3-fold
# splitter (matching GridSearchCV's internal cv=3 for classifiers).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix,
                             precision_recall_fscore_support as score)
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=3)

rf = RandomForestClassifier(n_estimators=1000, min_samples_leaf=7, max_depth=18, criterion='entropy', random_state=42)
metrics = {'accuracy': [], 'precision': [], 'recall': [], 'fscore': [], 'support': []}
counter = 0

print('################################################### RandomForest ###################################################')
for train_index, test_index in skf.split(X_Distances,Y):
    X_train, X_test = X_Distances[train_index], X_Distances[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)

    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)
    metrics['accuracy'].append(round(accuracy_score(y_test, y_pred), 2))
    metrics['precision'].append(precision)
    metrics['recall'].append(recall)
    metrics['fscore'].append(fscore)
    metrics['support'].append(support)

    print(classification_report(y_test, y_pred))
    matrix = confusion_matrix(y_test, y_pred)
    methods.saveConfusionMatrix(matrix, ('confusion_matrix_randomforest_distances_' + str(counter) +'.png'))
    counter += 1

meanAcc = round(np.mean(metrics['accuracy']), 2) * 100
print('meanAcc: ', meanAcc)
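
As an aside (a sketch, not part of the original code): best_params_ is a plain dict, so the tuned values can be unpacked straight into the constructor instead of being copied over by hand:

# Hypothetical shorthand, assuming gs has already been fitted as above:
rf = RandomForestClassifier(n_estimators=1000, random_state=42, **gs.best_params_)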

Is this a reasonable approach, or have I got something completely wrong?

EDIT:

I just tested the following:

gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  scoring=scoring, cv=3, refit='Accuracy', n_jobs=-1)
gs.fit(X_Distances, Y)

This yields best_score_ = 0.5362903225806451 at best_index_ = 28. When I check the accuracies of the 3 folds at index 28, I get:

  1. split0: 0.5185929648241207
  2. split1: 0.526686807653575
  3. split2: 0.5637651821862348

These give the reported mean test accuracy of 0.5362903225806451. best_params_: {'criterion': 'entropy', 'max_depth': 21, 'min_samples_leaf': 5}
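
These per-fold numbers can be read straight out of cv_results_. A minimal sketch, assuming scoring is a dict containing an 'Accuracy' scorer (which refit='Accuracy' implies), so the result keys are named accordingly:

# Pull the per-fold accuracies of the best candidate out of cv_results_.
i = gs.best_index_
folds = [gs.cv_results_['split%d_test_Accuracy' % k][i] for k in range(3)]
print(folds)                                    # the three fold accuracies above
print(gs.cv_results_['mean_test_Accuracy'][i])  # the reported mean test accuracy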

Now I run the following code, which uses those best_params_ with a stratified 3-fold split (like GridSearchCV):

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5, max_depth=21, criterion='entropy', random_state=42)
metrics = {'accuracy':[], 'precision':[], 'recall':[], 'fscore':[], 'support':[]}
counter = 0
print('################################################### RandomForest ###################################################')
for train_index, test_index in skf.split(X_Distances,Y):
    X_train, X_test = X_Distances[train_index], X_Distances[test_index]
    y_train, y_test = Y[train_index], Y[test_index]
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)

    precision, recall, fscore, support = np.round(score(y_test, y_pred), 2)
    metrics['accuracy'].append(accuracy_score(y_test, y_pred))
    metrics['precision'].append(precision)
    metrics['recall'].append(recall)
    metrics['fscore'].append(fscore)
    metrics['support'].append(support)

    print(classification_report(y_test, y_pred))
    matrix = confusion_matrix(y_test, y_pred)
    methods.saveConfusionMatrix(matrix, ('confusion_matrix_randomforest_distances_' + str(counter) +'.png'))
    counter += 1

meanAcc = np.mean(metrics['accuracy'])
print('meanAcc: ', meanAcc)

The metrics dictionary yields the exact same accuracies (split0: 0.5185929648241207, split1: 0.526686807653575, split2: 0.5637651821862348).

However, the mean calculation is slightly off: 0.5363483182213101. With this approach I get the actual predictions of the best_estimator_ found by GridSearchCV, so I can plot a confusion matrix for each fold and analyse it. The production model would then be trained on my whole data set.
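
The small difference in the mean is most likely fold weighting: older versions of GridSearchCV averaged the fold scores weighted by the size of each test fold (the iid=True default of that era), whereas np.mean weights all folds equally. A minimal sketch of the two averages (the fold sizes n0, n1, n2 are placeholders, since stratified folds can differ by a few samples):

import numpy as np

fold_acc = [0.5185929648241207, 0.526686807653575, 0.5637651821862348]
print(np.mean(fold_acc))  # 0.53634831822131..., the unweighted mean computed above
# Weighted variant, as old GridSearchCV reported it (n0, n1, n2 = test-fold sizes):
# print(np.average(fold_acc, weights=[n0, n1, n2]))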

Best Answer

GridSearchCV uses cross-validation internally, so if you take the best parameters you should be able to reproduce the best result. Just be careful to set your test data aside and use it only at the very end.

Holding out 20-30 % of the data as a test set is the usual practice.
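
A minimal sketch of that workflow, reusing the names from the question (the 25 % split is just one choice in that range, and the custom scoring dict is dropped for brevity):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hold out 25 % of the data up front; the tuning never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X_Distances, Y, test_size=0.25, stratify=Y, random_state=42)

gs = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                  param_grid={'max_depth': range(5, 25, 4),
                              'min_samples_leaf': range(5, 40, 5),
                              'criterion': ['entropy', 'gini']},
                  cv=3, n_jobs=-1)
gs.fit(X_train, y_train)          # cross-validated tuning on the training part only

print(gs.best_params_)
print(gs.score(X_test, y_test))   # held-out test set, used exactly once at the end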
