I'm a newbie learning ML and I have a doubt. We are normally told to increase the size of the training dataset, i.e. add more data, to reduce variance (I fairly understand why). Now, variance has an inverse relationship with bias, so when we add more data we reduce variance, or equivalently increase bias. Then why is it not possible to reduce bias by reducing the number of training samples? Could someone please explain this to me?
Bias – Does Reducing Training Dataset Size Decrease Bias?
bias, bias-variance tradeoff, machine learning, variance
Related Solutions
In most situations, more data is usually better. Overfitting is essentially learning spurious correlations that occur in your training data, but not the real world. For example, if you considered only my colleagues, you might learn to associate "named Matt" with "has a beard." It's 100% valid ($n=4$, even!) when considering only the small group of people working on my floor, but it's obviously not true in general. Increasing the size of your data set (e.g., to the entire building or city) should reduce these spurious correlations and improve the performance of your learner.
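To make the variance-reduction point concrete, here is a small simulation sketch (everything in it is made up for illustration: a noisy sine target and a deliberately flexible degree-9 polynomial). The flexible fit overfits badly on small samples, and its average test error falls as the training set grows:

```python
# Minimal sketch (illustrative assumptions: noisy sine target, degree-9 polynomial):
# a flexible model's test error shrinks as the training set grows,
# mostly because its variance shrinks.
import numpy as np

rng = np.random.default_rng(0)

def mean_test_mse(n_train, degree=9, n_test=1000, n_repeats=100):
    """Average test MSE of a polynomial fit, over many resampled training sets."""
    errs = []
    for _ in range(n_repeats):
        x_tr = rng.uniform(0, 1, n_train)
        y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.3, n_train)
        coeffs = np.polyfit(x_tr, y_tr, degree)          # flexible, high-variance fit
        x_te = rng.uniform(0, 1, n_test)
        y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.3, n_test)
        errs.append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
    return np.mean(errs)

for n in (15, 50, 200, 1000):
    print(f"n_train = {n:5d}   mean test MSE = {mean_test_mse(n):.3f}")
```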
That said, one situation where more data does not help (and may even hurt) is if your additional training data is noisy or doesn't match whatever you are trying to predict. I once did an experiment where I plugged different language models[*] into a voice-activated restaurant reservation system. I varied the amount of training data as well as its relevance: at one extreme, I had a small, carefully curated collection of people booking tables, a perfect match for my application. At the other, I had a model estimated from a huge collection of classic literature, a more accurate language model, but a much worse match to the application. To my surprise, the small-but-relevant model vastly outperformed the big-but-less-relevant model.
A surprising situation, called **double-descent**, also occurs when the size of the training set is close to the number of model parameters. In these cases, the test risk first decreases as the size of the training set increases, transiently *increases* when a bit more training data is added, and finally begins decreasing again as the training set continues to grow. This phenomenon was reported 25 years ago in the neural network literature (see Opper, 1995), but it occurs in modern networks too ([Advani and Saxe, 2017][1]). Interestingly, this happens even for linear regression, albeit one fit by SGD ([Nakkiran, 2019][2]). This phenomenon is not yet totally understood and is largely of theoretical interest: I certainly wouldn't use it as a reason not to collect more data (though I might fiddle with the training set size if $n = p$ and the performance were unexpectedly bad).
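If you want to poke at double descent yourself, here is a rough sketch of the linear-regression version (the setup is mine, not from the papers: Gaussian features, a fixed true linear signal, and the minimum-norm least-squares solution via the pseudoinverse standing in for the solution SGD converges to). The test risk typically spikes when $n$ is near $p$ and then falls again as $n$ keeps growing:

```python
# Double-descent sketch under assumed conditions: Gaussian features, a fixed
# linear signal, and the minimum-norm least-squares fit (pseudoinverse).
import numpy as np

rng = np.random.default_rng(1)
p = 50                                    # number of parameters (features)
beta = rng.normal(size=p) / np.sqrt(p)    # true coefficients
noise = 0.5

def test_risk(n_train, n_test=2000, n_repeats=50):
    """Average test MSE of the min-norm least-squares fit at a given training size."""
    risks = []
    for _ in range(n_repeats):
        X = rng.normal(size=(n_train, p))
        y = X @ beta + noise * rng.normal(size=n_train)
        beta_hat = np.linalg.pinv(X) @ y              # min-norm least squares
        X_te = rng.normal(size=(n_test, p))
        y_te = X_te @ beta + noise * rng.normal(size=n_test)
        risks.append(np.mean((X_te @ beta_hat - y_te) ** 2))
    return np.mean(risks)

for n in (10, 25, 45, 50, 55, 75, 150, 500):
    print(f"n = {n:4d} (p = {p})   test risk = {test_risk(n):.2f}")
```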
[*] A language model is just the probability of seeing a given sequence of words, e.g. $P(w_n = \text{'quick'}, w_{n+1} = \text{'brown'}, w_{n+2} = \text{'fox'})$. They're vital to building halfway decent speech/character recognizers.
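As a toy illustration of that footnote only (a made-up fourteen-word corpus and an unsmoothed bigram model, nothing like a real language model), the chain rule turns that joint probability into a product of conditional probabilities estimated from counts:

```python
# Toy bigram language model (illustrative assumptions: tiny made-up corpus,
# no smoothing): P(sequence) = P(w1) * prod of P(w_i | w_{i-1}) from counts.
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog the quick brown fox sleeps".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def sequence_prob(words):
    """Chain-rule probability of a word sequence under the bigram model."""
    prob = unigrams[words[0]] / len(corpus)
    for prev, cur in zip(words, words[1:]):
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

print(sequence_prob(["quick", "brown", "fox"]))   # ~0.14 with this toy corpus
```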
You are referring to what is known as Sample Complexity in the PAC learning framework. There has been a significant amount of research in this area. In summary, in most real-world cases you never know the true sample complexity for a given dataset; however, you can bound it. The bounds are typically very loose and usually convey nothing more than the order of the number of examples required to reach a particular error with a particular probability.
For instance, to reach a prediction error within $\epsilon$ with high probability $(1 - \delta)$, you may need a number of samples proportional to some function of $\epsilon$ and $\delta$. For example, if your sample complexity is $O(1/\epsilon)$, you are better off than if it were $O(1/\epsilon^2)$: to reach a 1% error rate, in the former case you need $O(100)$ examples and in the latter $O(10000)$. But remember, these are still $O(\cdot)$ bounds and not exact numbers.
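To make those two rates concrete, a quick back-of-the-envelope sketch (pretending the hidden constants inside the $O(\cdot)$ are exactly 1, which the bounds never actually tell you):

```python
# Back-of-the-envelope comparison of the two sample-complexity rates above,
# assuming (unrealistically) a constant of 1 inside each O(.).
for eps in (0.1, 0.01, 0.001):
    n_linear = 1 / eps            # O(1/epsilon) regime
    n_quadratic = 1 / eps ** 2    # O(1/epsilon^2) regime
    print(f"epsilon = {eps:6.3f}   ~{n_linear:10.0f} vs ~{n_quadratic:12.0f} samples")
```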
If you look up sample complexity bounds of particular classes of algorithms, you'd get some idea. Some lecture notes here.
Best Answer
Not necessarily. A picture is worth a thousand words, so let me use the image below. (Check also the Intuitive explanation of the bias-variance tradeoff? thread.)
Imagine your model is an oracle that perfectly predicts the target: it will have no bias and no variance.
Imagine a model that always predicts the same constant (say, $42$): it will be biased regardless of how much data you use, because the result is independent of the data. The example is abstract, but not as abstract as you may think; for example, this would be the case for a Bayesian model with a very strong prior, or for using the wrong model for the job (e.g. image classification using a model that was designed for natural language processing). Such models are doomed to make bad predictions regardless of the data.
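Here is a quick sketch of that constant-$42$ example (the data-generating process is invented for illustration: a noisy linear target). The constant model's error is essentially pure bias and does not budge as $n$ grows, while an ordinary fitted model keeps its error low:

```python
# Sketch of the constant-predictor example (illustrative assumption: a noisy
# linear target). Always predicting 42 ignores the data, so its error (pure
# bias) stays put no matter how many samples you add.
import numpy as np

rng = np.random.default_rng(2)

def errors(n_train, n_test=5000):
    x_tr = rng.uniform(0, 10, n_train)
    y_tr = 3 * x_tr + 5 + rng.normal(0, 1, n_train)
    x_te = rng.uniform(0, 10, n_test)
    y_te = 3 * x_te + 5 + rng.normal(0, 1, n_test)
    slope, intercept = np.polyfit(x_tr, y_tr, 1)           # fitted linear model
    fitted_mse = np.mean((slope * x_te + intercept - y_te) ** 2)
    constant_mse = np.mean((42.0 - y_te) ** 2)              # model that ignores the data
    return fitted_mse, constant_mse

for n in (10, 100, 10000):
    fit, const = errors(n)
    print(f"n = {n:6d}   fitted MSE = {fit:6.2f}   constant-42 MSE = {const:8.2f}")
```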