Bias – Does Reducing Training Dataset Size Decrease Bias?

Tags: bias, bias-variance tradeoff, machine learning, variance

I'm a newbie learning ML and I have a doubt. Normally we know that we should increase the size of the training dataset, i.e. add more data, to reduce variance (I fairly understand why). Now, variance has an inverse relationship with bias, so when we're adding more data we're reducing variance and therefore increasing bias. Then why is it not possible to reduce bias by reducing the number of training samples? Could someone please explain this to me?

Best Answer

Now variance has an inverse relationship with bias

Not necessarily. A picture is worth a thousand words, so let me use the image below. (See also the Intuitive explanation of the bias-variance tradeoff? thread.)

[Figure: four dartboards. Low bias and low variance: all hits are in the bull's eye. Low bias and high variance: hits are scattered around the bull's eye. High bias and low variance: hits are tightly clustered away from the bull's eye. High bias and high variance: hits are scattered around a spot away from the bull's eye.]

Imagine your model is an oracle that perfectly predicts the target: it would have no bias and no variance.
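To make this concrete, here is a minimal simulation sketch (my illustration, not part of the original answer): it fits a straight line to quadratic data, so the model is biased by construction, and it estimates bias² and variance at a single test point over many resampled training sets. The names `true_f` and `simulate` and the specific constants are arbitrary illustrative choices. Growing the training set shrinks the variance, while the bias stays essentially where it was.

```python
# Illustrative sketch: estimate bias^2 and variance of a deliberately
# too-simple model (a straight line fit to quadratic data) at one test point,
# for two training-set sizes. More data shrinks the variance; the bias barely moves.
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return x ** 2  # the target function a linear model cannot represent

def simulate(n_train, n_repeats=1000, x_test=0.75):
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        x = rng.uniform(-1.0, 1.0, size=n_train)
        y = true_f(x) + rng.normal(scale=0.1, size=n_train)
        slope, intercept = np.polyfit(x, y, deg=1)  # underfitting model
        preds[i] = slope * x_test + intercept
    bias_sq = (preds.mean() - true_f(x_test)) ** 2
    variance = preds.var()
    return bias_sq, variance

for n in (20, 2000):
    b2, var = simulate(n)
    print(f"n={n:5d}  bias^2={b2:.4f}  variance={var:.5f}")
```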

Then why is it not possible to reduce bias by reducing the number of training samples?

Imagine a model that always predicts the same constant (say, $42$): it will be biased regardless of how much data you use, because its output is independent of the data. The example sounds abstract, but it is less so than you may think. This would be the case, for example, for a Bayesian model with a very strong prior, or for a model that is simply wrong for the job (e.g. doing image classification with a model designed for natural language processing). Such models are doomed to make bad predictions regardless of the data.
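Here is a small sketch of that constant predictor (hypothetical code, just to illustrate the point): the `ConstantModel` class below ignores its training data entirely, so its prediction, and therefore its bias relative to the assumed true mean of $10$, is the same whether it is "trained" on ten samples or a hundred thousand.

```python
# Sketch of the "always predict 42" model: its bias does not depend on the
# training-set size, because it learns nothing from the data.
import numpy as np

rng = np.random.default_rng(1)

class ConstantModel:
    """Ignores the training data and always predicts the same constant."""
    def __init__(self, constant=42.0):
        self.constant = constant

    def fit(self, X, y):
        return self  # nothing is learned from the data

    def predict(self, X):
        return np.full(len(X), self.constant)

true_mean = 10.0  # assumed true value of the target
for n in (10, 1_000, 100_000):
    X_train = rng.normal(size=(n, 3))              # features are irrelevant here
    y_train = rng.normal(loc=true_mean, size=n)    # noisy observations of the target
    model = ConstantModel().fit(X_train, y_train)
    pred = model.predict(X_train[:1])[0]
    print(f"n={n:7d}  prediction={pred}  bias={pred - true_mean}")
```

Running it prints the same prediction and the same bias for every training-set size, which is exactly the point: shrinking (or growing) the dataset cannot fix a bias that comes from the model itself.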