It would actually be better to use the same folds when comparing different models, as you did initially. If you pass the pipeline object into the RandomizedSearchCV object, it will use the same folds. But if you do it the other way around, each run will change the folds, as you said. Even in that case, you can fix the folds by setting the cv argument explicitly.
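A minimal sketch of the idea, assuming scikit-learn (the data and parameter values here are illustrative): passing an explicit splitter with a fixed `random_state` as the `cv` argument makes every search, and every model you compare, use identical folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# An explicit splitter with a fixed random_state yields the same folds
# every time it is iterated, so all compared models see identical splits.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__C": list(np.logspace(-3, 3, 7))},
    n_iter=5,
    cv=cv,          # fixed folds, reused across runs and models
    random_state=0,
)
search.fit(X, y)
```

Passing a plain integer as `cv` also gives deterministic (unshuffled) folds, but an explicit splitter object makes the reuse across models visible in the code.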
A Validation Set is only necessary when we have hyperparameters in our model; otherwise validation is useless.
You are right that when no hyperparameters are tuned, a single split into training and test sets is all you usually do for an internal* generalization error estimate.
Validation is, however, a somewhat ambiguous term here (see here for my take on the historical reasons). Do not confuse not having (or not seeing) the middle data set of the famous train/validation/test split with the need for verification and validation of the model in the engineering (or application-field) sense of the word. That latter need is not touched at all by the way you organize your model training.
* This "internal" refers to the fact that training and test data are produced by splitting one larger data set, i.e. they come from the same lab or data source. This, again, is more the engineering terminology.
Why is it bad to use the Training and Test Set multiple times?
There is nothing inherently bad in evaluating them multiple times. The trouble arises from:
- multiple use of such evaluations: in particular, any test data whose results are used to steer decisions such as model selection becomes part of the training procedure of the selected model, and is thus no longer an independent test result. Hence the need for another, (outer) independent test set.
- however often you evaluate a data set, that does not get around the fact that it contains only so many independent cases.
This again is not wrong in itself, as long as any further conclusions or actions take it into account. But not taking it into account can lead to serious overestimation of the quality of the generalization error estimate.
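The "outer independent test set" idea is what nested cross validation implements. A minimal sketch, assuming scikit-learn (the estimator and grid are illustrative): the inner loop picks hyperparameters, the outer loop scores the resulting selection procedure on data that never steered any decision.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

inner = KFold(n_splits=3, shuffle=True, random_state=1)  # model selection
outer = KFold(n_splits=5, shuffle=True, random_state=2)  # independent test

# Inner loop: the validation/development data used to choose C.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner)

# Outer loop: each outer test fold is only ever used for final evaluation,
# so the mean score is an independent estimate for the whole procedure.
scores = cross_val_score(search, X, y, cv=outer)
print(scores.mean())
```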
What is the Validation Set going to tell us that the Test Set cannot?
- nothing, as long as no model selection is involved.
- as soon as model selection is involved, the test set tells you whether that selection procedure caused overfitting to the validation set.
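To make this concrete, here is a sketch (assuming scikit-learn; the data and grid are illustrative) that puts the two numbers side by side: the cross-validated score that steered the selection, and a held-out test score that did not. A gap between them is exactly the overfitting to the validation data that the test set reveals.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The inner CV score of the winner steered the selection of C,
# so it is an optimistically biased estimate.
search = GridSearchCV(SVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X_tr, y_tr)

print("validation estimate:", search.best_score_)      # used for selection
print("independent test:   ", search.score(X_te, y_te))  # not used for selection
```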
Among the reasons for using k-fold Cross Validation instead of a simple Validation Set, there is the claim that if the Validation Set is not big enough, we risk overfitting the Validation Set. Shouldn't it be the Training Set that we risk overfitting?
No, we're one step further here in our considerations:
We are using the Validation Set only for evaluation, not for training, so why is there a risk of overfitting it rather than the Training Set?
When selecting hyperparameters based on the validation set (aka inner test set, aka development set, aka optimization set) error estimate, the validation set becomes part of the training of the final model.
The risk of overfitting during hyperparameter selection increases, among other factors, with the variance (uncertainty) of the error estimate used to guide the model selection. This is where k-fold is better than a single split: more cases tested means lower uncertainty due to the finite sample tested.
Another important factor is the number of hyperparameter sets you select from (the size of your search space).
From a statistics point of view, selecting the best hyperparameter set is a multiple-comparisons situation: the more comparisons, and the more variance in the performance estimates, the larger the risk of selecting a model that only accidentally seemed better. This is what overfitting to the validation set means.
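This multiple-comparisons effect is easy to simulate. In the sketch below (pure NumPy; all numbers are illustrative), 100 "models" all guess at random on a balanced binary task, so their true accuracy is exactly 0.5. Yet the best validation score among them looks substantially better than 0.5, purely by chance, and the smaller the validation set, the worse the illusion.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_models = 50, 100

# Each model guesses at random: true accuracy is 0.5 for all of them.
# Observed validation accuracy = correct guesses / validation set size.
val_scores = rng.binomial(n_val, 0.5, size=n_models) / n_val

best = val_scores.max()   # what "model selection" would report
print(best)               # optimistically biased above the true 0.5
```

Increasing `n_val` (as k-fold effectively does by testing every case) shrinks the variance of each score and thus the bias of the maximum.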
Best Answer
Yes. As a rule, the test set should never be used to change your model (e.g., its hyperparameters).
However, cross-validation can sometimes be used for purposes other than hyperparameter tuning, e.g., determining to what extent the train/test split impacts the results.
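A sketch of that second use, assuming scikit-learn (the data and estimator are illustrative): repeated k-fold with no tuning at all, just to see how much the score moves when only the split changes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=150, random_state=0)

# 5 folds repeated 10 times with different shuffles = 50 scores;
# their spread shows the sensitivity to the particular split.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())
```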