Solved – What is going wrong with my neural network?

interpretation, loss-functions, neural-networks, scoring-rules, validation

I am building a machine learning model to attempt to predict the winner of a sports match based on historical statistics of the two teams.

My model (a neural network) gets about 70% accuracy on test data, which is better than I expected. However, there are some odd things going on in the accuracy and loss-over-time charts.

[Plot: accuracy and loss over training iterations; blue is training data, red is test data]

As you can see, the accuracy starts off flat, then at about 1000 training iterations it jumps almost straight to values close to its final level. The network appears to be stuck predicting the same winner for every match during those first 1000 iterations, even though the loss drops significantly over that period.

The other thing I'm not sure about is how closely the loss curves for training and test data match. It looks like they are the same, just offset.

What could be going wrong here? I'm not sure what direction to look.

More info:

My loss function is cross-entropy, the activation is ReLU, and the regularization is dropout. Weights are initialized from a truncated normal distribution. The network itself is a 5-layer feed-forward ANN trained with Adam.
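
For reference, a minimal sketch of this kind of setup using the Keras API is below; the layer sizes, dropout rate, initializer standard deviation, and learning rate are assumptions and are not taken from the question.

```python
# Sketch of the described setup: 5-layer feed-forward net, ReLU, dropout,
# truncated-normal init, cross-entropy loss, Adam optimizer.
# Layer sizes, dropout rate, stddev, and learning rate are assumed values.
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    init = tf.keras.initializers.TruncatedNormal(stddev=0.1)
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for units in (64, 64, 32, 16):  # four hidden layers + output layer = 5
        model.add(tf.keras.layers.Dense(units, activation="relu",
                                        kernel_initializer=init))
        model.add(tf.keras.layers.Dropout(0.5))  # dropout regularization
    # Output: probability that the first team wins.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid",
                                    kernel_initializer=init))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",  # cross-entropy loss
                  metrics=["accuracy"])
    return model
```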

Best Answer

This is an interesting question. I suspect you have observed the effect of using a non-proper scoring rule. Accuracy is a discontinuous (and non-proper) scoring rule: it only changes when the parameters being learned have changed enough to actually flip a decision. What you call "Loss" in the plot, cross-entropy, is akin to the loss used in logistic regression; it is a continuous function of the parameters and a proper scoring rule. So the difference you see in the plots is kind of expected, though the magnitude is interesting!
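
To make the distinction concrete, here is a small illustrative script (my own example, not from the answer) showing that cross-entropy keeps improving as a predicted probability moves toward the true label, while accuracy only changes when the prediction crosses the 0.5 decision threshold:

```python
# Accuracy vs. cross-entropy for a single positive example (true winner = 1).
# The loss responds to every change in the predicted probability; the hard
# decision (and therefore accuracy) only changes at the 0.5 threshold.
import numpy as np

def cross_entropy(p: float) -> float:
    return -np.log(p)          # log loss when the true label is 1

def correct(p: float) -> int:
    return int(p >= 0.5)       # 1 if the hard decision is correct

for p in (0.30, 0.40, 0.49, 0.51, 0.70):
    print(f"p={p:.2f}  loss={cross_entropy(p):.3f}  correct={correct(p)}")

# p=0.30  loss=1.204  correct=0
# p=0.40  loss=0.916  correct=0
# p=0.49  loss=0.713  correct=0   <- loss keeps improving, accuracy is stuck
# p=0.51  loss=0.673  correct=1   <- accuracy jumps only at the threshold
# p=0.70  loss=0.357  correct=1
```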

Some other posts with more information are Why isn't Logistic Regression called Logistic Classification? and Alternative notions to that of proper scoring rules, and using scoring rules to evaluate models.

Here is another post that seems to run into the same problem: Probability Calibration messes Reliability.
