I am building a RandomForestClassifier that uses biomarker and clinical measurements to predict a disease (binary outcome). The classes are balanced: there are equal numbers of people who do and do not develop the disease. I have been using GridSearchCV with cv=5 to tune the hyperparameters. The best cross-validated AUC score is 0.79, but when I evaluate the tuned model on the held-out test set, the AUC score is 0.66.
What does this mean? Does it mean the model is overfitting? If so, how can I fix it?
Thanks!
Best Answer
You are correct that this suggests over-fitting. (By over-fitting, in this case, we mean that the final learner $f_{\text{final}}$ produced by our model selection procedure fails to generalise out of sample.)
While the question gives no direct information about the sample sizes involved, there are some obvious things to consider:

- How large are the training and test sets? With few patients, both the cross-validated AUC and the test AUC are noisy estimates, and a gap of 0.13 may be partly sampling variability.
- GridSearchCV reports the score of the *best* configuration it found. That score is optimistically biased: the winner of many compared configurations tends to look better than it truly is.
- A single 5-fold CV run has non-trivial variance. Repeating the CV with different random splits shows how stable the 0.79 figure really is.
- Is the test set drawn from the same population and by the same measurement protocol as the training data? A distribution shift would depress test performance independently of over-fitting.
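One concrete way to check the second point is nested cross-validation: an outer loop re-runs the entire tuning procedure on each fold, so the outer score is an honest estimate of how the *selected* model generalises. The sketch below uses synthetic data from `make_classification` and a small illustrative parameter grid; substitute your own biomarker/clinical feature matrix, labels, and grid.

```python
# Nested CV sketch: estimate how optimistic GridSearchCV's internal
# best score is. Data and grid here are placeholders, not the OP's.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=8, random_state=0)

param_grid = {"n_estimators": [50, 100],
              "max_depth": [3, 6, None]}

# Inner loop: hyperparameter tuning, exactly as in the question.
inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     param_grid, scoring="roc_auc", cv=5)

# Outer loop: each fold repeats the whole tuning procedure, so the
# mean outer AUC is not contaminated by the model selection.
outer_scores = cross_val_score(inner, X, y, scoring="roc_auc", cv=5)
print("nested-CV AUC: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))
```

If the nested-CV AUC sits well below the 0.79 reported by `GridSearchCV.best_score_`, much of the gap to the test set is selection optimism rather than a genuinely worse model.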
Obviously this is not an exhaustive list, but we can use it for a quick investigation into the causes of the non-generalisable performance observed.
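As part of that quick investigation, it is also worth asking how precise the test-set AUC of 0.66 actually is. A bootstrap confidence interval over the test predictions gives a rough answer; the sketch below uses hypothetical `y_test` labels and `y_score` probabilities as stand-ins for your real test-set outputs.

```python
# Bootstrap CI sketch for the test-set AUC. y_test and y_score are
# synthetic stand-ins; replace them with your real test labels and
# the model's predicted probabilities for the positive class.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=100)            # hypothetical labels
y_score = y_test * 0.4 + rng.random(100) * 0.6   # hypothetical scores

boots = []
for _ in range(2000):
    # Resample test cases with replacement and recompute the AUC.
    idx = rng.integers(0, len(y_test), len(y_test))
    if len(np.unique(y_test[idx])) < 2:          # AUC needs both classes
        continue
    boots.append(roc_auc_score(y_test[idx], y_score[idx]))

ci_lo, ci_hi = np.percentile(boots, [2.5, 97.5])
print("95%% bootstrap CI for test AUC: (%.3f, %.3f)" % (ci_lo, ci_hi))
```

On a small test set this interval can easily be ±0.05 or wider, in which case part of the 0.79 vs 0.66 discrepancy is noise rather than over-fitting.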