Does more training data help lower the bias of a high-bias model?

bias-variance tradeoff, machine learning

It is clear that more training data will help lower the variance of a high variance model since there will be less overfitting if the learning algorithm is exposed to more data samples.

However, what impact does training data size have on a high bias model? Generally, will more training data lower the bias, will it have no effect, or will it cause a further increase in the bias?

This question is more specific than the following question which is similar:
What impact does increasing the training data have on the overall system accuracy?

One of the answers actually says that "high bias models will not benefit from more training examples". But there does not seem to be any consensus.

Best Answer

However, what impact does training data size have on a high bias model? Generally, will more training data lower the bias, will it have no effect, or will it cause a further increase in the bias?

You mean a model with prediction errors due to high bias?

Bias is defined as $\operatorname{Bias}[\hat{f}(x)]=\mathrm{E}[\hat{f}(x)]-f(x)$, where the expectation is taken over training sets, and thus it is essentially unaffected by increasing the training set size. If your model predicts vastly different values when the training set changes, i.e., if the error is dominated by the variance of the predictions, then you can improve the overall loss with more training data, because the model will learn to generalize better, and hence the variance term will go down. To decrease the bias term, you probably need to choose a different model.
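To make this concrete, here is a minimal simulation sketch, not a definitive experiment. Everything in it is an assumption made for illustration: the ground-truth function `true_f` (a quadratic), the noise level, the test point `x0 = 0.9`, and the helper name `bias_variance_at`. It fits a straight line (a deliberately high-bias model for quadratic data) to many independently drawn training sets and estimates the squared bias and the variance of the prediction at `x0` for several training set sizes.

```python
import numpy as np

def true_f(x):
    # Hypothetical ground truth, chosen so that a straight line underfits it.
    return x ** 2

def bias_variance_at(x0, n_train, n_repeats=1000, noise_sd=0.5, seed=0):
    """Monte Carlo estimate of squared bias and variance of a linear fit at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Draw a fresh training set of size n_train.
        x = rng.uniform(-1.0, 1.0, size=n_train)
        y = true_f(x) + rng.normal(0.0, noise_sd, size=n_train)
        # Degree-1 least-squares fit; polyfit returns [slope, intercept].
        slope, intercept = np.polyfit(x, y, deg=1)
        preds[i] = slope * x0 + intercept
    bias_sq = (preds.mean() - true_f(x0)) ** 2  # (E[f_hat(x0)] - f(x0))^2
    variance = preds.var()                      # spread across training sets
    return bias_sq, variance

# Compare small, medium, and large training sets at the same test point.
results = {n: bias_variance_at(x0=0.9, n_train=n) for n in (20, 200, 2000)}
for n, (bias_sq, variance) in results.items():
    print(f"n={n:5d}  bias^2={bias_sq:.4f}  variance={variance:.5f}")
```

Under these assumptions, the variance estimate should shrink roughly in proportion to 1/n, while the squared bias should stay at about the same level for every n, since no amount of data lets a straight line represent the curvature of `true_f`.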
