Cross-Validation – Understanding the RMSE of k-Fold Cross Validation

cross-validation, error, rms

I am testing a neural net that predicts numeric values, using a training, validation, and test split. I ran a manual 4-fold CV, so I get 4 RMSE values, each one being the error of the i-th fold on the test data.

How do I get the global RMSE over all 4 folds? Would it be (rmse_1 + rmse_2 + rmse_3 + rmse_4)/(number of all predictions)?

Best Answer

To be correct, you should calculate the overall RMSE as $\sqrt{\frac{RMSE_1^2 + \dots + RMSE_k^2}{k}}$, that is, the square root of the mean of the squared fold RMSEs.

Edit: Judging from your question, my answer may need a bit more explanation. The $RMSE_j$ of fold $j$ of the cross-validation is calculated as $\sqrt{\frac{\sum_i{(y_{ij} - \hat{y}_{ij})^2}}{N_j}}$, where $\hat{y}_{ij}$ is the prediction for $y_{ij}$ and $N_j$ is the number of observations in fold $j$. The overall RMSE is then $\sqrt{\frac{\sum_j{\frac{\sum_i{(y_{ij} - \hat{y}_{ij})^2}}{N_j}}}{k}}$, and not what you propose, $\frac{\sum_j{\sqrt{\frac{\sum_i{(y_{ij} - \hat{y}_{ij})^2}}{N_j}}}}{\sum_j{N_j}}$. When all folds have the same size $N_j$, this is exactly the RMSE over all predictions pooled together. With unequal fold sizes, the pooled value is obtained by weighting each squared fold RMSE by its size: $\sqrt{\frac{\sum_j{N_j \, RMSE_j^2}}{\sum_j{N_j}}}$.
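To make the difference concrete, here is a minimal Python sketch that compares the root-mean-square combination with the formula proposed in the question. The data are made up for illustration, and the helper names (`rmse`, `combine_rmse`, `combine_rmse_weighted`) are hypothetical. The size-weighted variant is not part of the answer above; it is included on the assumption that you may want the pooled RMSE when folds differ in size.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error of one set of predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def combine_rmse(fold_rmses):
    """Root of the mean of the squared fold RMSEs (the answer's formula)."""
    return math.sqrt(sum(r ** 2 for r in fold_rmses) / len(fold_rmses))

def combine_rmse_weighted(fold_rmses, fold_sizes):
    """Assumed extension: weight each squared fold RMSE by its fold size N_j."""
    total = sum(n * r ** 2 for r, n in zip(fold_rmses, fold_sizes))
    return math.sqrt(total / sum(fold_sizes))

# Made-up (targets, predictions) for 4 folds of 3 observations each.
folds = [
    ([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]),
    ([4.0, 5.0, 6.0], [4.0, 5.5, 6.5]),
    ([7.0, 8.0, 9.0], [6.0, 8.0, 9.0]),
    ([10.0, 11.0, 12.0], [10.0, 13.0, 12.0]),
]

fold_rmses = [rmse(y, y_hat) for y, y_hat in folds]
fold_sizes = [len(y) for y, _ in folds]

# Reference value: RMSE over all predictions pooled together.
all_true = [v for y, _ in folds for v in y]
all_pred = [v for _, y_hat in folds for v in y_hat]
pooled = rmse(all_true, all_pred)

overall = combine_rmse(fold_rmses)
weighted = combine_rmse_weighted(fold_rmses, fold_sizes)

# The formula from the question: sum of fold RMSEs over the total count.
naive = sum(fold_rmses) / sum(fold_sizes)

print("pooled  :", pooled)
print("overall :", overall)
print("weighted:", weighted)
print("naive   :", naive)
```

With equal-sized folds, `overall` and `weighted` both agree with `pooled` up to floating-point rounding, while `naive` comes out far too small because it divides a sum of only $k$ RMSE values by the total number of predictions.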