SVM Models – How to Identify if a Learning Curve from an SVM Model Suffers from Bias or Variance

biasmachine learningsvmtrain

I created this learning curve and I want to know if my SVM model suffers from bias or variance? How can I conclude that from this graph?

learning cuve

Best Answer

Part 1: How to read learning curve

Firstly, we should focus on the right side of the plot, where there are sufficient data for evaluation.

  • If two curves are "close to each other" and both of them but have a low score. The model suffer from an under fitting problem (High Bias)

  • If training curve has a much better score but testing curve has a lower score, i.e., there are large gaps between two curves. Then the model suffer from an over fitting problem (High Variance)

Part 2: My assessment for the plot you provided

From the plot it is hard to say if the model is good or not. It is possible that you have a really "easy problem", a good model can achieve 90%. On the other hand, it is possible you have a really "hard problem" that the best thing we can do is achieving 70%. (Note that, you may not expect you will have a perfect model, say score is 1. How much you can achieve depends on how much noise in your data. Suppose your data has a lot of data points have EXACT feature but different labels, no matter what you do, you cannot achieve 1 on score.)

Another problem in your example is that 350 examples seems to be too small in a real world application.

Part 3: More suggestions

To get a better understanding, you can do following experiments to experience under fitting an over fitting and observe what will happen in learning curve.

  • Select a very complicated data say MNIST data, and fit with a simple model, say linear model with one feature.

  • Select a simple data, say iris data, fit with a complexity model, say, SVM.


Part 4: Other examples

In addition, I will give two examples related to under fitting and over fitting. Note this is not learning curve, but the performance respect to number of iterations in gradient boosting model, where more iterations will have more chance of over fitting. The x axis shows number of iterations, and y axis shows the performance, which is negative Area Under ROC (the lower the better.)

The left subplot does not suffer from over fitting (well also not under fitting since the performance is reasonably good) but right one suffers from over fitting when number of iterations is large.

enter image description here