Solved – How to reduce overfitting in linear regression

algorithms, bayesian, overfitting, regression, regularization

I am working with linear regression methods. A weakness of these methods is their tendency to overfit, so some papers add a regularization term to reduce it. Are there other ways to reduce overfitting? Can we use a prior term instead?

Given $D=\{(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)\}$, the linear regression model for the data $D$ is:

$$h(x)=w^\top x+b$$

To reduce overfitting we add some regularization term. So the loss function is:

$$J=\sum_{i=1}^{n}\bigl(h(x_i)-y_i\bigr)^2+\lambda_1\sum_{j}w_j^2$$

But finding a good $\lambda_1$ is hard. Can we avoid it by using other terms that give more effective results? Thanks.
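
For concreteness, here is a minimal sketch of the objective above in code; the data, weights, and the $\lambda_1$ value are placeholders I made up for illustration.

```python
import numpy as np

def ridge_loss(w, b, X, y, lam):
    """Sum of squared errors plus an L2 penalty on the weights (the J above)."""
    residuals = X @ w + b - y          # h(x_i) - y_i for every sample
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

print(ridge_loss(w=np.zeros(3), b=0.0, X=X, y=y, lam=0.1))
```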

Best Answer

You can estimate an optimal lambda as the one that minimizes the testing error during cross-validation. The testing error (i.e. the mean squared prediction error on a held-out testing set) should decrease as lambda increases from zero, because the training data are overfit less and less; beyond a certain point it will rise again as the model no longer captures the data adequately. A conservative choice is the lambda whose testing error is one standard error above the minimum testing error, on the side of the larger lambda value (the "one-standard-error rule").
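
As a sketch of that procedure (using scikit-learn's Ridge, which minimizes the same sum-of-squares plus $\lambda\|w\|^2$ objective; the toy data, grid of lambda values, and variable names below are illustrative, not prescribed):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Toy data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

lambdas = np.logspace(-3, 3, 30)                 # candidate regularization strengths
kf = KFold(n_splits=5, shuffle=True, random_state=0)

mean_mse, se_mse = [], []
for lam in lambdas:
    fold_mse = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])
        resid = model.predict(X[test_idx]) - y[test_idx]
        fold_mse.append(np.mean(resid ** 2))     # hold-out MSE for this fold
    mean_mse.append(np.mean(fold_mse))
    se_mse.append(np.std(fold_mse, ddof=1) / np.sqrt(len(fold_mse)))

mean_mse, se_mse = np.array(mean_mse), np.array(se_mse)
best = np.argmin(mean_mse)                       # lambda with minimum CV error
threshold = mean_mse[best] + se_mse[best]        # one standard error above the minimum
lam_1se = lambdas[mean_mse <= threshold].max()   # largest lambda still within one SE

print(f"lambda at minimum CV error: {lambdas[best]:.4g}")
print(f"lambda by one-standard-error rule: {lam_1se:.4g}")
```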
