Solved – the difference between (bias, variance) and (underfitting, overfitting)


I think bias and variance are metrics for choosing a moderate model complexity (i.e. for choosing the right model from the candidate models), and underfitting and overfitting are metrics for knowing to what extent the model has learnt the data. Am I right?

And what is the relationship between them?

Best Answer

All are metrics to find the best model: you would like an unbiased, minimum-variance estimator that also holds up on validation data - and thus a model that is neither over- nor underfitted. But how to balance these metrics is up to your application / context.

Bias: The systematic error you are stuck with even if you could use an infinite number of cases / records. Usually you try to get unbiased models.

Variance / Significance: Relates to the probability that the true relationship between your variables is trivial (e.g. zero) and your model is merely seeing an accidental, purely randomly generated data pattern. Variance and bias are usually independent of each other.
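A minimal simulation sketch of both quantities, in Python with NumPy (the data-generating process y = 2x + noise and the deliberately shrunken estimator are made-up illustrations): across repeated samples, the spread of the estimates is the variance, and the gap between their average and the true slope is the bias.

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope = 2.0
n, n_sims = 50, 2000

ols_estimates, shrunk_estimates = [], []
for _ in range(n_sims):
    x = rng.uniform(-1, 1, n)
    y = true_slope * x + rng.normal(0, 1, n)
    slope_ols = np.sum(x * y) / np.sum(x * x)  # least-squares slope (no intercept)
    ols_estimates.append(slope_ols)
    shrunk_estimates.append(0.5 * slope_ols)   # deliberately biased "shrunken" estimator

for name, est in [("OLS", ols_estimates), ("shrunken", shrunk_estimates)]:
    est = np.asarray(est)
    print(f"{name}: bias = {est.mean() - true_slope:+.3f}, variance = {est.var():.3f}")
```

The shrunken estimator has the smaller variance but a bias that no amount of extra data removes, which is exactly why the two properties have to be judged separately.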

Overfitting: Is related to the variance, but it's not the same. If you have a large data matrix, you may fit a model with a large number of covariates, and many parameters may have a small variance. Nevertheless, if you split off 20% of the training material and make a prediction with your model on that validation data, the model may predict worse. That's overfitting: your model fitted relationships that do not look random within your full data set, but are not systematic and stable enough for extrapolation outside the training data set.
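A minimal sketch of that 80/20 check, assuming Python with NumPy (the cubic data-generating process and the degree-12 polynomial are arbitrary choices for illustration): the over-flexible model wins on the training 80% but typically loses on the held-out 20%.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 50))
y = x**3 - x + rng.normal(0, 0.2, 50)  # true signal is cubic plus noise

# Split off 20% as validation data, train on the remaining 80%
idx = rng.permutation(50)
train, valid = idx[:40], idx[40:]

for degree in (3, 12):  # modest vs. clearly over-flexible model
    coefs = np.polyfit(x[train], y[train], degree)
    mse_train = np.mean((np.polyval(coefs, x[train]) - y[train])**2)
    mse_valid = np.mean((np.polyval(coefs, x[valid]) - y[valid])**2)
    print(f"degree {degree:2d}: train MSE = {mse_train:.3f}, validation MSE = {mse_valid:.3f}")
```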

Underfitting usually goes with a biased distribution of your residuals: your model is insufficiently specified (e.g. the relationship between target and covariate is quadratic, but you fit a linear model). Underfitting can result in bias, but does not have to. In any case it results in a non-minimal variance.
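And a small sketch of such biased residuals (again Python with NumPy, with a made-up quadratic relationship): a straight line fitted to quadratic data leaves residuals that depend systematically on x instead of being centred noise.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 200)
y = x**2 + rng.normal(0, 0.3, 200)      # true relationship is quadratic

slope, intercept = np.polyfit(x, y, 1)  # underfitted: linear model
residuals = y - (slope * x + intercept)

# Residuals are not centred noise: their sign depends systematically on x
print("mean residual, |x| < 0.5:", residuals[np.abs(x) < 0.5].mean())  # clearly negative
print("mean residual, |x| > 1.5:", residuals[np.abs(x) > 1.5].mean())  # clearly positive
```

Hope that helps a little bit.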