Solved – Why does the training error usually underestimate the test error

machine-learning, train

I understand that most algorithms are optimized to minimize the training error, but why is the test error usually larger than the training error? Is there a statistical reason why?

Best Answer

Training and testing data are not identical.

As you yourself point out, most training procedures optimize the model's performance on the training set, so the model will tend to perform worse on any other set of data.
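
To illustrate that point, here is a minimal simulation sketch (the data-generating process, the polynomial model, and its degree are my own choices for illustration, not part of the question): fit a flexible model on one sample, then score it on a fresh sample from the same population.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n=50):
    """Draw one sample from the same (hypothetical) population."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = sample()
x_test, y_test = sample()

# Least-squares fit of a flexible polynomial on the training set only
coefs = np.polyfit(x_train, y_train, deg=10)

def mse(x, y):
    return np.mean((np.polyval(coefs, x) - y) ** 2)

print(f"train MSE: {mse(x_train, y_train):.3f}")  # the error the fit was optimized for
print(f"test  MSE: {mse(x_test, y_test):.3f}")    # typically larger on fresh data
```

Rerunning with different seeds, the test MSE exceeds the training MSE in the vast majority of draws, precisely because the coefficients were tuned to the first sample's noise.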

Consider a really simple case: two samples (a training sample and a test sample) drawn from one population, where the "model" is just a single constant. The sample mean of the training set is the constant closest (in the mean-squared-error sense) to the training set, while its mean squared error on the test set includes an additional term related to the square of the difference between the two sample means.
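
To make that additional term explicit (the notation here is mine): let the training sample be $x_1,\dots,x_n$ with mean $\bar{x}$ and the test sample be $y_1,\dots,y_m$ with mean $\bar{y}$. Expanding $(y_j-\bar{x}) = (y_j-\bar{y})+(\bar{y}-\bar{x})$ and noting that the cross term sums to zero gives

$$\frac{1}{m}\sum_{j=1}^{m}\bigl(y_j-\bar{x}\bigr)^2 \;=\; \frac{1}{m}\sum_{j=1}^{m}\bigl(y_j-\bar{y}\bigr)^2 \;+\; \bigl(\bar{y}-\bar{x}\bigr)^2.$$

The first term is the best any constant could achieve on the test set; the second, nonnegative term is the penalty for having tuned to the training sample, and it vanishes only when the two sample means happen to coincide.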