The cross-validation method LeaveMOut is a common strategy. When building a classifier, LeaveMOut lets you create training and test sets easily: the procedure is repeated several times with random splits, and you average the performance over the repetitions.
In that sense, LeaveMOut resembles k-fold cross validation, where (k-1) folds are used for training and the remaining fold is used for testing.
There is no big difference. Perhaps the only one is that LeaveMOut does not set aside a separate validation set, since it only leaves M samples out of the training data. When it is said that running LMO in a loop does not guarantee disjoint evaluation sets, it means that the same sample may appear in the test set of several iterations; this can be problematic for some applications (though not in general, in my opinion).
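To make the disjointness point concrete, here is a minimal pure-Python sketch (the function names are mine, not from any library): repeated random leave-M-out draws can put the same sample in several test sets, whereas the k test folds of k-fold CV partition the data.

```python
import random

def leave_m_out_splits(n, m, n_repeats, seed=0):
    """Repeatedly draw a random test set of size m (leave-M-out style).
    Test sets across repeats may overlap."""
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(n_repeats):
        test = set(rng.sample(idx, m))
        train = [i for i in idx if i not in test]
        yield train, sorted(test)

def kfold_splits(n, k):
    """k-fold CV: the k test folds are pairwise disjoint and cover the data."""
    idx = list(range(n))
    fold_size = n // k
    for f in range(k):
        test = idx[f * fold_size:(f + 1) * fold_size]
        train = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        yield train, test

# 4 draws of 3 samples out of 10: 12 test slots over at most 10 distinct
# samples, so some sample must appear in more than one test set.
tests_lmo = [t for _, t in leave_m_out_splits(10, 3, 4)]
# 5-fold CV on the same 10 samples: test folds are disjoint.
tests_kf = [t for _, t in kfold_splits(10, 5)]
```

Whether the overlap matters depends on the application; for a plain performance average it usually does not.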
Your understanding sounds good to me, with the possible exception that what you call a "run" is, in my field, called either a fold (as in 5-fold cross validation) when referring to the test data, or a "surrogate model" when referring to the model.
Yes, the outer folds can return different hyperparameter sets and/or parameters (coefficients).
This is valid in the sense that this is allowed to happen. It is invalid in the sense that this means the optimization (done with the help of the inner folds) is not stable, so you have not actually found "the" [global] optimum.
For the overall model, you're supposed to run the inner cross validation again on the whole data set. I.e., you optimize/auto-tune your hyperparameters on the training set (now the whole data set) just the same as you did during the outer cross validation.
Update: longer explanation
See also Nested cross validation for model selection and How to build the final model and tune probability threshold after nested cross-validation?
Are you saying that to get the "the [global] optimum" you need to run your entire dataset on all the combinations of c's, gamma's, kernels etc?
No. In my experience the problem is not that the search space is not explored in detail (all possible combinations) but rather that our measurement of the resulting model performance is subject to uncertainty.
Many optimization strategies coming from numerical optimization implicitly assume that there is negligible noise on the target functional. I.e. the functional is basically a smooth, continuous function of the hyperparameters. Depending on the figure of merit you optimize and the number of cases you have, this assumption may or may not be met.
If you do have considerable noise on the estimate of the figure of merit but do not take this into account (i.e. the "select the best one" strategy you mention), your observed "optimum" is subject to noise.
In addition, the noise (variance uncertainty) on the performance estimate increases with model complexity. In this situation, naively selecting the best observed performance can also lead to a bias towards overly complex models.
See e.g. Cawley, G. C. & Talbot, N. L. C.: On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation, Journal of Machine Learning Research, 11, 2079-2107 (2010).
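The selection bias can be demonstrated with a small simulation (a toy sketch, not from the paper above): give many "models" the identical true accuracy, score each on a finite test set, and pick the best observed score. The winner looks better than the true accuracy purely by chance.

```python
import random

rng = random.Random(42)
true_acc = 0.75   # every candidate model has this same true accuracy
n_test = 100      # finite test set -> noisy accuracy estimates
n_models = 50     # e.g. a hyperparameter grid of 50 combinations

# Observed accuracy of each model: binomial noise around true_acc.
observed = [sum(rng.random() < true_acc for _ in range(n_test)) / n_test
            for _ in range(n_models)]

# "Select the best one": the winning score is optimistically biased,
# even though no model is actually better than any other.
best = max(observed)
```

The average of `observed` stays near 0.75, but the maximum over 50 noisy draws does not: that gap is exactly the overoptimism nested CV is meant to expose.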
How does this get incorporated into the nested cross validation procedure or the final results of the analysis?
Hastie, T., Tibshirani, R. and Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Verlag, New York, 2009, say in chapter 7.10:

Often a “one-standard error” rule is used with cross-validation, in which we choose the most parsimonious model whose error is no more than one standard error above the error of the best model.
Which I find a good heuristic (I take the additional precaution of estimating both the variance uncertainty due to the limited number of cases and that due to model instability; the Elements of Statistical Learning do not discuss the latter in their cross validation chapter).
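The one-standard-error rule is easy to state in code. A minimal sketch with made-up per-fold errors (the numbers and the function name are purely illustrative), where model complexities are assumed ordered from simplest to most complex:

```python
import statistics

# Hypothetical per-fold CV errors for 4 models of increasing complexity.
fold_errors = {
    1: [0.30, 0.32, 0.28, 0.31, 0.29],   # simplest model
    2: [0.21, 0.23, 0.20, 0.22, 0.21],
    3: [0.20, 0.23, 0.19, 0.22, 0.21],   # best mean error
    4: [0.21, 0.24, 0.18, 0.23, 0.22],   # most complex model
}

def one_se_rule(fold_errors):
    """Pick the simplest model whose mean error is within one standard
    error of the best (lowest) mean error."""
    stats = {c: (statistics.mean(e),
                 statistics.stdev(e) / len(e) ** 0.5)  # standard error
             for c, e in fold_errors.items()}
    best_mean, best_se = min(stats.values(), key=lambda t: t[0])
    threshold = best_mean + best_se
    return min(c for c, (m, _) in stats.items() if m <= threshold)
```

Here model 3 has the lowest mean error, but model 2 is within one standard error of it, so the rule prefers the more parsimonious model 2.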
So your understanding:
I'm confused because my understanding is that you can't just run your analysis hundreds/thousands of times with different parameters/kernels and select the best one
is correct.
However, your understanding
(and nested CV is supposed to mitigate the associated issues).
may or may not be correct:
- nested CV does not make the hyperparameter optimization any more successful, but it can provide an honest estimate of the performance that can be achieved with that particular optimization strategy.
In other words: it guards against overoptimism about the achieved performance, but it does not improve this performance.
The final model:
- The outer split of the nested CV is basically an ordinary CV for validation/verification. It splits the available data set into training and testing subsets, and then builds a so-called surrogate model on the training set.
- During this training, you happen to do another (the inner) CV, whose performance estimates you use to fix/optimize the hyperparameters. But seen from the outer CV, this is just part of the model training.
The model training on the whole data set should do exactly what the model training inside the cross validation did. Otherwise the surrogate models and their performance estimates would not be good surrogates for the model trained on the whole data (and that is precisely the purpose of the surrogate models).
Thus: run the auto-tuning of hyperparameters on the whole data set just as you do during cross validation. Same hyperparameter combinations to consider, same strategy for selecting the optimum. In short: same training algorithm, just slightly different data (1/k additional cases).
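The whole structure can be sketched in a few lines of pure Python (all names here are mine; `tune_score` and `evaluate` are placeholder stand-ins for a real inner-CV tuning score and a real test-set evaluation):

```python
def kfold(idx, k):
    """Split an index list into k (train, test) pairs with disjoint test folds."""
    fold = len(idx) // k
    for f in range(k):
        test = idx[f * fold:(f + 1) * fold]
        train = idx[:f * fold] + idx[(f + 1) * fold:]
        yield train, test

def tune_score(train_idx, h):
    # Placeholder: a real version would run the inner CV on train_idx
    # with hyperparameter h. Here we pretend h=3 is optimal.
    return -abs(h - 3)

def evaluate(train_idx, test_idx, h):
    # Placeholder outer-fold performance estimate.
    return 1.0 - 0.1 * abs(h - 3)

def inner_select(train_idx, candidates):
    """Inner loop: hyperparameter auto-tuning, seen from outside
    as just part of the model training."""
    return max(candidates, key=lambda h: tune_score(train_idx, h))

def nested_cv(all_idx, k, candidates):
    """Outer loop: honest estimate of the whole training procedure,
    tuning included. The final model reruns the identical tuning
    on the full data set."""
    scores = []
    for train_idx, test_idx in kfold(all_idx, k):
        h = inner_select(train_idx, candidates)
        scores.append(evaluate(train_idx, test_idx, h))
    h_final = inner_select(all_idx, candidates)  # same algorithm, all data
    return scores, h_final

scores, h_final = nested_cv(list(range(20)), 5, [1, 2, 3, 4, 5])
```

Note that `inner_select` is called with exactly the same candidate set and selection strategy in both places; only the data differs.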
Best Answer
I think you need to provide more information, as there are many possible causes.
What you should do:
Is anything special about your data/classifier compared to other classification problems you have done?