cross-validation: what is the standard deviation if the same value is obtained for each fold

Here is a detailed imaginary example: I am using 5-fold cross-validation to estimate the generalization MSE of my predictive model.

When I hold out fold number 1, which contains 10 observations, say I obtain:

   actual_value predictions squared.residual
1      43.73546    45.57342       3.37807764
2      51.83643    53.40071       2.44694877
3      41.64371    41.79284       0.02223975
4      65.95281    61.97410      15.83008068
5      53.29508    54.53473       1.53673583
6      41.79532    41.68306       0.01260174
7      54.87429    54.56270       0.09708896
8      57.38325    54.44174       8.65245030
9      55.75781    54.80151       0.91450990
10     46.94612    47.78200       0.69870059

Therefore, the MSE for the first fold is mean(squared.residual), and the standard error of this value is sd(squared.residual)/sqrt(10).

So for the first fold I obtain:

       MSE   SE_MSE
1 3.358943 1.611509
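As a quick check, the fold-1 row can be reproduced from the squared residuals in the first table. This is a minimal Python sketch. The variable names are illustrative. Python's `statistics.stdev` uses the n - 1 denominator, the same as R's `sd()`, so the result should match the table up to rounding.

```python
import math
import statistics

# Squared residuals for held-out fold 1, copied from the first table.
squared_residual = [
    3.37807764, 2.44694877, 0.02223975, 15.83008068, 1.53673583,
    0.01260174, 0.09708896, 8.65245030, 0.91450990, 0.69870059,
]

n = len(squared_residual)                  # 10 observations in fold 1
mse = statistics.mean(squared_residual)    # equivalent of mean(squared.residual)
# statistics.stdev divides by n - 1, like R's sd()
se_mse = statistics.stdev(squared_residual) / math.sqrt(n)

print(f"MSE = {mse:.6f}, SE_MSE = {se_mse:.6f}")
```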

Now imagine that I obtain roughly (or even exactly) the same MSE for each fold. For example:

       MSE   SE_MSE
1 3.358943 1.611509
2 3.472887 1.680483
3 3.331932 1.614309
4 3.839267 1.537181
5 3.351095 1.630388

The apparent standard error of the MSE is close to zero (or zero in the extreme case where each fold yields the same value). Yet we know the SD of the MSE for each fold, and it is not zero at all.

How accurate is my final estimate of the MSE (obtained by averaging the MSE from each fold)?

Best Answer

Dealing with regression can be confusing because there are two different SDs involved. The whole point of cross-validation is to give you an estimate of the future behavior of the regressor. In this case you have five estimates of the regressor's behavior on future data, one for each fold.
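To make "five estimates, one for each fold" concrete, here is a minimal sketch of how a per-fold MSE/SE table like the one in the question could be produced. It assumes a simple straight-line model fitted by ordinary least squares and hypothetical synthetic data. The question's actual model is unknown. `fit_line` and `kfold_mse` are illustrative helper names and are not part of any library.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least squares for y = a + b*x (closed form)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_mse(xs, ys, k=5, seed=0):
    """Hypothetical helper: return one (MSE, SE_MSE) pair per held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Squared residuals on the held-out fold only
        sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        mse = statistics.mean(sq)
        se = statistics.stdev(sq) / math.sqrt(len(sq))  # sd(sq)/sqrt(n)
        results.append((mse, se))
    return results


# Hypothetical synthetic data: 50 points, so each of the 5 folds holds 10.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [40 + 2 * x + rng.gauss(0, 2) for x in xs]

for fold, (mse, se) in enumerate(kfold_mse(xs, ys), start=1):
    print(f"{fold} {mse:.6f} {se:.6f}")
```

Each row is computed on a different held-out fold. So the five MSEs are five noisy estimates of the same future error, and each fold also has its own within-fold spread.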

What do you want to know about the regressor's behavior on future data?

1) What is the expected error (MSE) on future data? That is the mean of the five CV MSEs.

2) What is the expected SD of the (squared) errors on future data? That is the mean of the five per-fold SDs.

You may also want to evaluate how certain you are of those estimates:

3) What is the variance (or SD) of the estimate of the future MSE? That is the variance (SD) of the five CV MSEs. This is the one that is low in your example, so you know that your estimate of the future MSE is pretty tight. (Well, maybe: there is a paper showing that the SD of a CV measure, in this case the MSE, is not a good estimate of the true SD, but let us leave that for later.)

4) What is the variance (or SD) of the estimate of the future SD? That is the variance (SD) of the five per-fold SDs.
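The four quantities can be computed directly from the question's per-fold table. This is a sketch under one stated assumption: every fold holds 10 observations. The question states this only for fold 1. Under that assumption, the per-fold SD of the squared residuals is recovered as SE_MSE * sqrt(10). The variable names are illustrative.

```python
import math
import statistics

# Per-fold results copied from the question's table (MSE, SE_MSE).
fold_mse = [3.358943, 3.472887, 3.331932, 3.839267, 3.351095]
fold_se = [1.611509, 1.680483, 1.614309, 1.537181, 1.630388]

# Assumption: every fold holds 10 observations (stated only for fold 1), so
# the per-fold SD of the squared residuals is SE_MSE * sqrt(10).
n_per_fold = 10
fold_sd = [se * math.sqrt(n_per_fold) for se in fold_se]

# 1) Expected MSE on future data: mean of the five CV MSEs.
expected_mse = statistics.mean(fold_mse)
# 2) Expected SD of the squared errors on future data: mean of the per-fold SDs.
expected_sd = statistics.mean(fold_sd)
# 3) Uncertainty of the MSE estimate: SD of the five CV MSEs.
sd_of_mse = statistics.stdev(fold_mse)
# 4) Uncertainty of the SD estimate: SD of the five per-fold SDs.
sd_of_sd = statistics.stdev(fold_sd)

# Naive standard error of the averaged MSE. It treats the folds as independent,
# which they are not (their training sets overlap), so read it with caution.
naive_se = sd_of_mse / math.sqrt(len(fold_mse))

print(f"1) expected MSE            = {expected_mse:.4f}")
print(f"2) expected SD of sq. err. = {expected_sd:.4f}")
print(f"3) SD of the fold MSEs     = {sd_of_mse:.4f} (naive SE {naive_se:.4f})")
print(f"4) SD of the fold SDs      = {sd_of_sd:.4f}")
```

With these numbers, quantity 3 comes out at roughly 0.21 (naive SE roughly 0.10), while quantity 2 is roughly 5.1. The small number describes how tightly the future MSE is pinned down. The large one describes how spread out individual squared errors will be.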

So, going back to your question:

The apparent standard error of the MSE is close to zero (or zero in the extreme case where each fold yields the same value). (That is the SD of the estimate of the MSE, not the estimate of the SD for future data.) Yet we know the SD of the MSE for each fold, and it is not zero at all. (That is the estimate of the future SD.)