As you correctly stated in (a) and (b), bias and variance are expected values evaluated over repeated sampling of a training set. Even if the bias were 0 for a model, any specific fitted model based on a particular training set will over- and under-estimate the true function at certain values of the domain due to sampling error; however, over the distribution of possible training sets the model is expected to have no systematic errors at any value of the domain (i.e., no bias).
To address (c) and (d), it helps to think of a very simple model:
Let $\mathcal{D} = \{(x_i, y_i)\}, \text{ where } (x_i, y_i) \sim \text{MultivariateNormal}(\mathbf{0}, \mathbf{I})$
This model says that the $y_i$ are just iid standard normal random variables, regardless of the value of $x_i$ -- so the true regression function is simply the constant $f(x) = 0$.
Now, let's pretend we don't know this. We could model this process using two extreme approaches:
- Fit a constant model: $E[y_i] = \mu$
- Fit an interpolating polynomial model $E[y_i] = \sum_{j=1}^{n} a_jx_i^{j-1}$
Using your terminology, the first approach is "low capacity" since it has only one free parameter, while the second approach is "high capacity" since it has $n$ parameters and fits every data point.
The first approach is correctly specified, and so will have zero bias. It will also have low variance, since we are estimating a single parameter from $n$ data points (indeed, $\hat{\mu} = \bar{y}$ has variance $1/n$ here).
Contrast this with the second approach: for any given training set, the fitted polynomial passes exactly through every training point. Averaged over all possible datasets of size $n$, the fit at an arbitrary $x$ not in the training set will be neither systematically high nor low (even in extrapolation beyond the data, the fitted polynomials swing one way or the other with equal frequency), so this model is also unbiased. However, because it fits all of the noise (each $y_i$ is an iid standard Gaussian), the prediction at any given $x$ varies far more from one training set to the next than the constant model's does -- hence it has much higher variance when averaged over all possible training sets.
So you can see how having more degrees of freedom means you end up fitting more of the random component of the data (assuming the true process is actually not that complex).
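The two extremes above are easy to check by simulation. The sketch below (assuming numpy; the seed, training-set size, and query point are arbitrary choices) repeatedly draws training sets from the true "constant zero" process, fits both models, and compares the spread of their predictions at one query point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6            # training-set size; kept small so interpolation stays numerically tame
x_test = 0.5     # arbitrary query point not in the training sets
trials = 2000

const_preds = np.empty(trials)
interp_preds = np.empty(trials)
for t in range(trials):
    x = rng.normal(size=n)
    y = rng.normal(size=n)                 # true process: y ~ N(0,1), independent of x
    const_preds[t] = y.mean()              # constant model: fitted mean
    coeffs = np.polyfit(x, y, deg=n - 1)   # degree-(n-1) polynomial interpolates all n points
    interp_preds[t] = np.polyval(coeffs, x_test)

print(f"constant model: mean {const_preds.mean():+.3f}  variance {const_preds.var():.3f}")
print(f"interpolating:  mean {interp_preds.mean():+.3f}  variance {interp_preds.var():.3f}")
```

Both mean predictions hover around the true value 0 (neither model is biased here), but the constant model's variance sits near the theoretical $1/n$, while the interpolating polynomial's variance is orders of magnitude larger.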
Let's change the true model: if we now assume that the true model is actually some linear model, then the constant model will systematically under-estimate in some parts of the domain and over-estimate in others, no matter how much data we collect. Thus, it will have bias at most values of $x$. However, since we are still using $n$ data points to fit a single parameter, $\hat{f}(x)$ will be less sensitive to the particular data points in the training set and hence will still have relatively low variance.
Compare this to fitting, say, a third-degree polynomial model to the data. Here, you will not have much bias (since every linear model is contained in the class of third-degree polynomials), but the extra parameters mean the fitted polynomial is more sensitive to the particular training data than a constant model would be.
The key here is that in this second case, both models are wrong -- but, the first one trades some bias for less variance while the second has increased variance but less bias. Which one is "better" (in an MSE sense) must be determined using a test dataset or cross-validation.
However, what impact does training data size have on a high bias model? Generally, will more training data lower the bias, will it have no effect, or will it cause a further increase in the bias?
You mean a model with prediction errors due to high bias?
Bias is defined as $\operatorname{Bias}[\hat{f}(x)]=\mathrm{E}[\hat{f}(x)]-f(x)$ and thus would not be affected by increasing the training set size. If your model predicts vastly different values when the training set changes, i.e., if the error is largely driven by the variance of the predictions, then you can improve the overall loss with more training data, because the model will generalize better and the variance term will go down. To decrease the bias term, you probably need to choose a different model.
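A small simulation makes this concrete (assuming numpy; the linear truth $f(x) = 2x$, the query point, and the sample sizes are illustrative assumptions). Fitting the constant model to a linear truth at increasing $n$, the bias stays put while the variance shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: 2.0 * x          # assumed true model: linear
x0, trials = 1.5, 1000         # x0: query point where bias/variance are measured

results = {}
for n in (10, 100, 1000):
    preds = np.empty(trials)
    for t in range(trials):
        x = rng.uniform(-2, 2, size=n)
        y = f(x) + rng.normal(size=n)
        preds[t] = y.mean()    # constant model: misspecified, hence biased at x0
    results[n] = (preds.mean() - f(x0), preds.var())

for n, (bias, var) in results.items():
    print(f"n={n:4d}  bias {bias:+.2f}  variance {var:.4f}")
```

The bias at $x_0$ stays near $-f(x_0) = -3$ regardless of $n$, while the variance falls roughly as $1/n$: more data attacks the variance term only.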
Best Answer
I presume you are interested in the intrinsic quality of an algorithm. This is a non-trivial question and a topic of active research.
Bounds on the bias and variance of an algorithm can be proven via the notion of algorithmic stability - see:
The Arizona paper shows the proof for the k-NN and 1-NN algorithms, which are nearly unbiased (page 4). You will have to read the other papers for other kinds of algorithms. Note that not all algorithms have proofs yet, and that there are many different forms of stability with their corresponding bounds.
A different (but related) approach is to look into VC theory https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_theory