Solved – Interpretation of low bias and variance for train/test errors

Based on the extensive discussion in this post, I understand that the goal is to achieve low bias and low variance.

Now, in terms of train and test errors, does achieving low bias and low variance amount to having both high training accuracy and high testing accuracy? Is it possible to infer this from a confusion matrix?

Best Answer

Overfitting/High Variance:

Your model fits the training set very well but fits the cross-validation set poorly. If you have no cross-validation set, then it fits the test set poorly.

Underfitting/High Bias:

Your model fits the training set poorly and also fits the test/CV set poorly.

=> In both cases the model fits the test set poorly. However, we want our model to fit the test set well. Testing accuracy is more important than training accuracy, because you want to know how well your model performs on data it has not yet seen.
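To make the two cases concrete, here is a minimal sketch in plain Python (the answer names no language or library, so this is an illustrative choice) using a k-nearest-neighbour classifier on a hypothetical noisy 1-D dataset. With k=1 the model memorises the training set (high variance); with k equal to the training-set size it always predicts the majority class (high bias); a moderate k sits in between. The data generator `make_data`, the 20% label noise and the k values are assumptions made for illustration only.

```python
import random


def make_data(n, noise, rng):
    """Hypothetical toy data: x is uniform on [0, 1), the true label is 1 when
    x > 0.5, and each label is flipped with probability `noise`."""
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > 0.5) for x in xs]
    ys = [1 - y if rng.random() < noise else y for y in ys]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (a tie counts as class 1)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes_for_1 = sum(train_y[i] for i in nearest)
    return int(2 * votes_for_1 >= k)


def accuracy(train_x, train_y, xs, ys, k):
    """Fraction of points in (xs, ys) that a k-NN model built on the training data gets right."""
    preds = [knn_predict(train_x, train_y, x, k) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


rng = random.Random(0)  # fixed seed so the run is reproducible
train_x, train_y = make_data(200, noise=0.2, rng=rng)
test_x, test_y = make_data(200, noise=0.2, rng=rng)

# k=1: memorises the training set (high variance).
# k=15: a moderate choice.
# k=len(train_x): always predicts the training majority class (high bias).
results = {}
for k in (1, 15, len(train_x)):
    train_acc = accuracy(train_x, train_y, train_x, train_y, k)
    test_acc = accuracy(train_x, train_y, test_x, test_y, k)
    results[k] = (train_acc, test_acc)
    print(f"k={k:3d}  train accuracy={train_acc:.2f}  test accuracy={test_acc:.2f}")
```

For k=1 the training accuracy is 1.00, because each training point is its own nearest neighbour, while the test accuracy is noticeably lower. For the largest k both accuracies hover around the share of the majority class. Only by looking at both numbers together can you tell the two failure modes apart.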

Your interpretation is correct: if you have low bias and low variance, then the model has both good training accuracy and good testing accuracy.

You can also deduce this by comparing the confusion matrix on the training set with the confusion matrix on the test set:

Few misclassifications in the training set and many in the test set:

You have high variance. You are overfitting your data.

Many misclassifications in the training set and many in the test set:

You might have high bias. You are probably underfitting your data.

Many misclassifications in the training set and few in the test set:

This should rarely happen and may indicate a mistake. It usually occurs because the data was not split randomly into training and test sets. Shuffle your data before splitting and fit the model again.

Few misclassifications in the training set and few in the test set:

You have low bias and low variance. You reached your goal! Congratulations.
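The four cases above can be sketched in plain Python, again as an illustrative choice of language. The helpers `confusion_matrix`, `error_rate` and `diagnose` are hypothetical names written for this sketch, not a library API, and `threshold` is an assumed cut-off for what counts as "few" misclassifications; a sensible value depends on the problem.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with true labels as rows and predicted labels as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Share of misclassifications: off-diagonal counts divided by all counts."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


def diagnose(train_matrix, test_matrix, threshold=0.1):
    """Map train/test error rates onto the four cases.

    `threshold` is a hypothetical cut-off between "few" and "many" misclassifications.
    """
    few_train = error_rate(train_matrix) <= threshold
    few_test = error_rate(test_matrix) <= threshold
    if few_train and few_test:
        return "low bias, low variance"
    if few_train:
        return "high variance (overfitting)"
    if few_test:
        return "unexpected: check how the data was split"
    return "possibly high bias (underfitting)"


# Usage: perfect on the training set, half wrong on the test set.
train_cm = confusion_matrix([0, 0, 1, 1], [0, 0, 1, 1], labels=[0, 1])
test_cm = confusion_matrix([0, 0, 1, 1], [0, 1, 0, 1], labels=[0, 1])
print(diagnose(train_cm, test_cm))  # high variance (overfitting)
```

If you already use scikit-learn, `sklearn.metrics.confusion_matrix` produces the same rows-are-true, columns-are-predicted layout, so its output could be passed to `error_rate` in place of the hand-written helper.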
