Solved – K-fold cross validation results interpretation

cross-validation, rms

My linear model has an RMSE of 0.08642 on the training data, and after performing 10-fold cross validation I get an RMSE of 0.091276. I have read in similar questions that the RMSE of the fit and the RMSE of prediction should be very close for the model to be considered a good predictor. Correct me if I am wrong. I can't find official literature to support this, and everything I read about cross validation says nothing about interpreting the results.

Best Answer

K-fold CV is one way of estimating the extra-sample error of a given model, giving us an idea of its prediction quality on data previously unseen by the model.
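To make the procedure concrete, here is a minimal sketch of k-fold CV for a simple linear model, written from scratch with numpy (the data and the `kfold_rmse` helper are illustrative, not from the question):

```python
import numpy as np

def kfold_rmse(X, y, k=10, seed=0):
    """Estimate the extra-sample RMSE of a linear model via k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)      # k roughly equal-sized folds
    rmses = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # fit ordinary least squares (with intercept) on the training folds
        Xtr = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train_idx], rcond=None)
        # evaluate on the held-out fold only
        Xte = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        resid = y[test_idx] - Xte @ beta
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(rmses)

# synthetic data: y = 2x + noise with sd 0.1, so fold RMSEs should sit near 0.1
rng = np.random.default_rng(1)
X = rng.normal(size=200)
y = 2.0 * X + rng.normal(scale=0.1, size=200)
fold_rmses = kfold_rmse(X, y)
print(fold_rmses.mean())
```

Each fold is held out exactly once, so every observation contributes one out-of-sample residual; the average of the fold RMSEs is the CV estimate of the model's extra-sample error.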

My linear model has an RMSE of 0.08642 and after I perform 10-fold cross validation I get an RMSE of 0.091276. I have read in similar questions that the RMSE of the fit and the RMSE of prediction should be very close for the model to be considered a good predictor.

I suppose the 0.091 is the average RMSE over the CV folds?
That this average is close to your in-sample result does not by itself say much about model quality. Have a look at the distribution of the errors across the folds - what does it look like?
If it shows a high standard deviation, it could mean that your model is overfitting the training data. If the fold errors are tightly clustered, your model has probably done a good job of generalizing.
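The check above can be sketched in a few lines. The per-fold RMSEs below are hypothetical numbers (not the asker's actual folds), and the 25% relative-spread threshold is an illustrative rule of thumb, not an established cutoff:

```python
import numpy as np

# per-fold RMSEs from a hypothetical 10-fold CV run
fold_rmses = np.array([0.089, 0.093, 0.088, 0.095, 0.090,
                       0.092, 0.087, 0.094, 0.091, 0.094])

mean_rmse = fold_rmses.mean()
std_rmse = fold_rmses.std(ddof=1)   # sample standard deviation
print(f"mean RMSE: {mean_rmse:.4f}, std: {std_rmse:.4f}")

# rough heuristic: if the folds scatter widely around the mean, the model's
# performance depends strongly on which data it saw, which can indicate
# overfitting / high variance
if std_rmse / mean_rmse > 0.25:     # threshold is an assumption for illustration
    print("fold errors vary a lot -> investigate overfitting")
else:
    print("fold errors are tightly clustered -> stable generalization")
```

Here the folds agree closely with each other and with the in-sample RMSE, which is the pattern you would hope to see.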

So now you have an idea of how well your model generalizes to new, unseen data. However, even if the dispersion of your CV errors is low, the error value itself can be hard to judge without considering the scale and characteristics of the data. Plot your data and your linear model to see how it fits.