I am learning $k$-fold cross validation. Since each fold will be used to train the model (in $k$ iterations), won't that cause overfitting?
Cross-validation – Can K-fold Cross Validation Cause Overfitting?
cross-validation, overfitting
Best Answer
K-fold cross validation is a standard technique to detect overfitting. It cannot "cause" overfitting in the sense of causality.
However, there is no guarantee that k-fold cross-validation removes overfitting. People often treat it as a magic cure for overfitting, but it isn't; it may not be enough.
The proper way to apply cross-validation is as a method for detecting overfitting. If you run CV and there is a large gap between the training error and the test error, you know you are overfitting and need to get more diverse data, choose a simpler model, or use stronger regularization. The converse does not hold: a small gap between test and training error does not mean you haven't overfit.
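As a minimal sketch of this diagnostic use (assuming scikit-learn is available), `cross_validate` with `return_train_score=True` reports both scores per fold, so the train/test gap can be inspected directly. An unrestricted decision tree is used here because it is prone to overfitting:

```python
# Sketch: detect overfitting by comparing train and test scores under CV.
# A large gap suggests overfitting; a small gap does NOT prove its absence.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# A deep, unregularized tree typically fits the training folds perfectly.
model = DecisionTreeClassifier(random_state=0)
scores = cross_validate(model, X, y, cv=5, return_train_score=True)

gap = scores["train_score"].mean() - scores["test_score"].mean()
print(f"train: {scores['train_score'].mean():.2f}, "
      f"test: {scores['test_score'].mean():.2f}, gap: {gap:.2f}")
```

If the gap shrinks when you limit the tree depth or add regularization, that is the signal cross-validation is designed to give you.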
It's not a magic cure, but the best method to detect overfitting we have (when used right).
Some examples of where cross-validation can fail:
There are cases where it cannot detect information leakage and overfitting even when applied perfectly. For example, when analyzing time series, people like to standardize the data, split it into past and future parts, and then train a model to predict the future development of the series. The subtle information leak is in the preprocessing: standardizing prior to the temporal split leaks information about the mean and spread of the held-out "future" data into training. Similar leaks can occur in other preprocessing steps. In outlier detection, if you scale the data to [0, 1], a model can learn that values close to 0 and 1 are the most extreme values it will ever observe, and so on.
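To avoid the standardization leak in ordinary (non-temporal) CV, the usual remedy, sketched here with scikit-learn (names like `make_pipeline` are real scikit-learn API, but the dataset and model choices are illustrative), is to put preprocessing inside a `Pipeline`, so the scaler is refit on the training portion of every fold instead of on the full dataset:

```python
# Sketch: scaling inside a Pipeline vs. leaky scaling before the split.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=0)

# Correct: StandardScaler is fit only on each fold's training data.
pipe = make_pipeline(StandardScaler(), Ridge())
safe_scores = cross_val_score(pipe, X, y, cv=5)

# Leaky (for illustration only): scaling the whole dataset first uses
# statistics computed from the held-out folds.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(Ridge(), X_leaky, y, cv=5)

print(safe_scores.mean(), leaky_scores.mean())
```

For time series, the additional step is to use a temporal splitter (such as scikit-learn's `TimeSeriesSplit`) so the "future" is never in a training fold.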
Back to your question:
No. In each fold, a new model is trained from scratch, its accuracy is estimated on the held-out data, and the model is then discarded. You don't keep or reuse any of the models trained during CV.
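A minimal sketch of what k-fold CV actually does (scikit-learn assumed; the dataset and model are arbitrary choices for illustration) makes this point concrete: each iteration constructs a fresh model, scores it, and lets it go out of scope; only the scores are kept.

```python
# Sketch: manual k-fold loop -- fresh model per fold, models discarded.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=120, random_state=0)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                 random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)  # fresh model each fold
    model.fit(X[train_idx], y[train_idx])      # trained on k-1 folds only
    scores.append(model.score(X[test_idx], y[test_idx]))
    # `model` is not reused in later folds; only its score survives.

print(np.mean(scores))  # the CV estimate of generalization performance
```

If you then want a model to deploy, the common practice is to retrain once on all the data; the CV loop itself produces no final model.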
You use validation (such as CV) for two purposes: to estimate how well a model generalizes to unseen data, and to compare models or hyperparameter settings in order to select the best one.
CV is not a way of "training" a model by feeding it 10 batches of data.