Solved – Completely different results after each cross validation

accuracy, classification, cross-validation, MATLAB, repeatability

I'm running some classification algorithms in MATLAB and validating them with 10-fold cross-validation. The problem is that every time I run the cross-validation, it gives a very different result. I understand that it is normal for the result to differ a bit each time, since the folds are created randomly, but the result varies a lot.

How can I rely on my results given this behavior? Does this mean that the number of folds is too small or too large? What should I do?

All suggestions are welcome!

Thanks!

Best Answer

I think you need to provide more information, as there are many possible causes:

  • you have a small dataset (as an extreme example, only 10 points)
  • your classifier depends strongly on randomness (a random forest, for example)
  • you are using a poorly suited metric (say, accuracy) with an unbalanced dataset
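
To get a feel for the first cause, here is a rough Python sketch of how noisy an accuracy estimate is for a given number of test points. It assumes each test prediction is an independent draw with the same chance of being correct, which is a simplification for cross-validation; the function name and the 0.8 accuracy are just illustrative choices.

```python
import math


def accuracy_std_error(true_accuracy, n_samples):
    """Binomial standard error of an accuracy estimated on n_samples test points.

    Assumption: predictions are independent and equally likely to be correct.
    """
    return math.sqrt(true_accuracy * (1.0 - true_accuracy) / n_samples)


# Illustrative value only: a classifier that is right 80% of the time.
for n in (10, 100, 1000, 10000):
    se = accuracy_std_error(0.8, n)
    print(f"n = {n:>5}: accuracy 0.80 +/- {se:.3f}")
```

Under these assumptions the uncertainty shrinks with the square root of the number of points, so results on a few dozen samples can swing by several percentage points from the split alone.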

What you should do:

  • make sure the dataset is big enough and that you are not using only a small part of your data (because of a bug, for example)
  • try other classifiers that work in completely different ways (XGBoost, Random Forest, AdaBoost, Gradient Boosting, SVM, etc.), if that is easy to implement
  • use another metric (ROC AUC should generally do a good job)
  • use the same integer every time to seed the pseudo-random number generator
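
As a minimal sketch of the seeding advice, the Python below (standard library only) runs 10-fold cross-validation on made-up data with a toy nearest-centroid classifier. The data, the classifier, and all function names are hypothetical stand-ins, not your setup. In MATLAB, the corresponding step would be calling `rng` with a fixed seed before the folds are created.

```python
import random
import statistics


def make_toy_data(n, rng):
    """Two overlapping 1-D Gaussian classes with balanced labels (made-up data)."""
    data = []
    for i in range(n):
        label = i % 2
        data.append((rng.gauss(float(label), 1.0), label))
    return data


def nearest_centroid_accuracy(train, test):
    """Fit class means on train, then score nearest-mean predictions on test."""
    means = {}
    for c in (0, 1):
        values = [x for x, y in train if y == c]
        means[c] = sum(values) / len(values)
    hits = sum(1 for x, y in test
               if min(means, key=lambda c: abs(x - means[c])) == y)
    return hits / len(test)


def kfold_accuracy(data, k, seed):
    """Shuffle indices with a fixed seed, split into k folds, average the fold accuracies."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        test = [data[i] for i in fold]
        scores.append(nearest_centroid_accuracy(train, test))
    return statistics.mean(scores)


data = make_toy_data(60, random.Random(0))

# Same seed -> same folds -> the same score on every run.
print(kfold_accuracy(data, 10, seed=42) == kfold_accuracy(data, 10, seed=42))

# Different seeds show how much variation comes from the split alone.
repeats = [kfold_accuracy(data, 10, seed=s) for s in range(20)]
print(f"mean = {statistics.mean(repeats):.3f}, std = {statistics.stdev(repeats):.3f}")
```

Fixing the seed makes a run reproducible, but it does not make the estimate more reliable. Repeating the cross-validation over several seeds and reporting the mean and standard deviation shows how far a single run can be trusted.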

Is there anything special about your data or classifier compared to other classification problems you have worked on?
