Solved – Validation loss going down, but validation accuracy worsening

accuracy classification cross-validation logistic regression

I'm training a simple logistic regression classifier on top of a rich feature set of 512 features for a binary classification problem. The training set is 200 observations and the validation set is 50 observations. These sizes cannot be changed. Early stopping is used based on the validation set to prevent overfitting (which is of course highly likely on this small training dataset). There's a separate test set for evaluation afterwards. The datasets are fairly unbalanced with ~15% of observations being one class.

I'm experiencing a strange phenomenon: the validation loss continuously goes down during optimization, but the validation accuracy worsens at the same time (i.e. it also goes down). Any suggestions for why this might happen?

The loss function is cross-entropy.

Best Answer

Cross-entropy, like almost every loss we use for classification, is a surrogate for accuracy: it is small when the model puts high probability on the correct class and grows without bound as the model becomes confidently wrong. We typically choose such functions (hinge loss is another example) because they are convex, differentiable, or both. Why not just optimize accuracy? Because accuracy is piecewise constant in the parameters: its gradient is zero almost everywhere and it is not convex, so solving directly for parameters that maximize it is not something we can do efficiently or intelligently.
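A minimal sketch of why accuracy gives an optimizer nothing to follow while cross-entropy does. The 1-D data, the single-weight model `sigmoid(w * x)`, and the finite-difference step are all made up for illustration and are not from the question.

```python
import math

# Hypothetical 1-D toy data: the class is 1 when x is positive.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w):
    """Mean binary cross-entropy of the model p(y=1|x) = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def accuracy(w):
    """Fraction of points classified correctly at a 0.5 threshold."""
    hits = sum((sigmoid(w * x) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

# Finite-difference slopes of both quantities around w = 1.
w, eps = 1.0, 1e-3
loss_slope = (cross_entropy(w + eps) - cross_entropy(w)) / eps
acc_slope = (accuracy(w + eps) - accuracy(w)) / eps

print(f"d(loss)/dw ~ {loss_slope:.4f}")  # nonzero: points toward a better w
print(f"d(acc)/dw  ~ {acc_slope:.4f}")   # zero: accuracy is flat around this w
```

A small change in `w` moves every predicted probability a little, so the loss changes, but no prediction crosses the threshold, so accuracy does not move at all.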

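To put it simply, you're optimizing cross-entropy in the hope that it gives you parameters that make your model accurate. But lower cross-entropy is neither a necessary nor a sufficient condition for higher accuracy. The loss depends on how confident each prediction is, while accuracy depends only on which side of the decision threshold each prediction falls, so the average loss can keep falling as most predictions become more confident even while a few borderline ones cross to the wrong side. Common practice is to ignore your validation loss and focus on your target metric: accuracy.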
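As a concrete illustration, here is a small sketch in which the mean cross-entropy goes down between two checkpoints while accuracy also goes down. The four labels and the predicted probabilities are invented for the example; they are not the asker's data.

```python
import math

def cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Hypothetical validation labels and predictions at two points in training.
y_val = [1, 1, 0, 0]
p_early = [0.55, 0.55, 0.45, 0.45]  # all correct, but barely
p_late = [0.95, 0.45, 0.05, 0.05]   # three very confident, one slipped below 0.5

loss_early, acc_early = cross_entropy(y_val, p_early), accuracy(y_val, p_early)
loss_late, acc_late = cross_entropy(y_val, p_late), accuracy(y_val, p_late)

print(f"early: loss={loss_early:.3f}  acc={acc_early:.2f}")
print(f"late:  loss={loss_late:.3f}  acc={acc_late:.2f}")
```

Here the loss falls from roughly 0.60 to roughly 0.24 while accuracy drops from 1.00 to 0.75: three predictions became much more confident, which outweighs the one that moved just across the threshold. With only 50 validation observations, a handful of such borderline flips is enough to move accuracy visibly.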