Why Logistic Regression Performs Better on Validation Data


Recently I've been building a model using logistic regression. To my surprise, the lift chart looks better on the validation data than on the training data, and the same is true of the ROC curve. All variables in the model are statistically significant.

The question is: is this really a serious problem? If so, what are the methods to detect what causes it? Does this problem have a name?

I've tried to isolate the cause by refitting the model with each single variable left out in turn, but the problem still occurred in every model.

Additional Info
The sample size is about 20,000, split 70% training / 30% validation.
I don't have the results in front of me, but as I recall the lift in the second percentile was about 8 on the training data and about 9 on the validation data; the a priori (overall response rate) is 4.2%.
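
For concreteness, lift at the top k% is the response rate among the k% highest-scored cases divided by the overall response rate, so a lift of 8 against an a priori of 4.2% corresponds to roughly a 33.6% response rate in that bucket. A minimal sketch of the computation (the arrays y_true and scores are hypothetical stand-ins for labels and model scores):

```python
import numpy as np

def lift_at_percentile(y_true, scores, pct=0.02):
    """Lift at the top pct of cases: response rate among the highest-scored
    cases divided by the overall response rate."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_top = max(1, int(len(scores) * pct))
    top = np.argsort(scores)[::-1][:n_top]  # indices of the highest scores
    return y_true[top].mean() / y_true.mean()
```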

Best Answer

No, this isn't necessarily a problem, especially if the sample size is small. It could easily be that, purely by chance, more of the "easy" patterns ended up in the validation set and more of the "difficult" ones in the training set. If you were to repeatedly re-sample the data into randomly partitioned training and validation sets, you would expect the average error on the training set to be lower than on the validation set, but that does not mean it will be lower on every run of the experiment.
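
You can see this directly by re-splitting the same data many times and looking at the distribution of the training/validation gap. Here is a minimal sketch, assuming scikit-learn and a synthetic dataset whose 4.2% positive rate mirrors the a priori in the question; with real data you would substitute your own X and y:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with ~4.2% positives, mirroring the a priori in the question.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.958],
                           random_state=0)

gaps = []
for seed in range(100):
    # A fresh random 70/30 split each time.
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc_tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
    auc_va = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
    gaps.append(auc_va - auc_tr)

gaps = np.array(gaps)
print(f"mean gap (validation - training AUC): {gaps.mean():+.4f}")
print(f"splits where validation beat training: {(gaps > 0).mean():.0%}")
```

On average the training AUC comes out ahead, but on a non-trivial fraction of individual splits the validation set wins, which is exactly the situation you describe.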

If your sample size is small, this variability means that the validation set performance estimate has high variance and isn't a reliable indicator of performance, so you should probably use some sort of (repeated) cross-validation or perhaps bootstrapping instead.
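
A sketch of the repeated cross-validation idea, again assuming scikit-learn and the same kind of synthetic data as above; each of the 50 folds gives one performance estimate, and the spread across them tells you how much to trust any single split:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.958],
                           random_state=0)

# 5-fold cross-validation repeated 10 times = 50 different partitions.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f} "
      f"over {len(scores)} folds")
```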

I have seen this sort of thing before while working on the problems in model selection caused by the variance of the model selection criterion. It doesn't necessarily indicate a problem with the model, but it does suggest that the sample of data is too small.

If the relative class frequencies are very disparate, it may be that the validation set happens to have fewer minority-class examples than the training set, which can also distort the performance estimate. In that case, use a stratified bootstrap or stratified cross-validation, which maintains the same proportion of positive and negative patterns in the training and validation sets.
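
A sketch of a stratified split, under the same assumptions as the earlier snippets; passing stratify=y forces the training and validation sets to share the same class proportions, which matters with a minority class as rare as 4.2%:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.958],
                           random_state=0)

# stratify=y preserves the class proportions in both partitions.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
print(f"positive rate, training:   {y_tr.mean():.3%}")
print(f"positive rate, validation: {y_va.mean():.3%}")
```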
