Neural Network – Correct Loss Function and Metric for Regression of Count Data

count-data, keras, loss-functions, regression

I am using a convolutional neural network to predict the number of occurrences of a certain pattern in time series data. Since a time series may contain any number of such patterns, I am dealing with regression rather than classification.
Normally, for regression, mean squared error (MSE) is used as the loss, and MSE and mean absolute error (MAE) are used as evaluation metrics. However, if I use them in my network, I get float numbers for both the predictions and the evaluation metric values. This works, but I would ideally like to have integers for both, since the occurrence of patterns is count data.

From my statistics classes I remember that the Poisson distribution is often used for modelling count data. However, my teacher always called the Poisson distribution "the distribution of rare events", meaning that it assumes a low number of occurrences of the event is more likely than a large number. This, however, is not necessarily true for my problem: the patterns I am looking for might occur never, but also 5, 10, 20 or 30 times within a time series.

For this problem, which loss and evaluation metric are suitable for obtaining integer predictions on test data?

As an example, this is the part of my Keras model where the issue is rooted:

from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001), loss='poisson', metrics=['mae', 'mse'])

Note: Using the Poisson loss in combination with MAE and MSE as metrics yielded very bad results. I could not find any metric that worked well in combination with the Poisson loss. But maybe you know one, or you can recommend a better loss-metric combination for this problem.

Best Answer

The Poisson distribution is only one integer-valued distribution among many, so you can experiment with losses derived from alternative count distributions.
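For instance, one common alternative for counts that are more spread out than a Poisson allows is the negative binomial. The sketch below (my illustration, not something the answer prescribes) implements its negative log-likelihood as a custom Keras loss; the dispersion value and the clipping bounds are placeholder choices you would need to tune and adjust.

import tensorflow as tf

def negative_binomial_loss(dispersion=1.0):
    # Negative log-likelihood of a negative binomial with mean y_pred and a
    # fixed dispersion, so that the variance is mu + dispersion * mu**2.
    r = 1.0 / dispersion
    def loss(y_true, y_pred):
        mu = tf.clip_by_value(y_pred, 1e-6, 1e6)  # keep predicted means strictly positive
        log_prob = (
            tf.math.lgamma(y_true + r)
            - tf.math.lgamma(r)
            - tf.math.lgamma(y_true + 1.0)
            + r * tf.math.log(r / (r + mu))
            + y_true * tf.math.log(mu / (r + mu))
        )
        return -tf.reduce_mean(log_prob)
    return loss

# Usage mirrors the compile call from the question:
# model.compile(optimizer=Adam(learning_rate=0.001),
#               loss=negative_binomial_loss(dispersion=0.5),
#               metrics=['mae', 'mse'])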

The model's prediction under the Poisson loss is the conditional expectation, so there's no reason for it to be restricted to integers in general. To simplify, consider that the average of $(1,2,3,4)$ is $2.5$, which is not an integer even though each of the elements is an integer. Naturally, you can do things like rounding to obtain integers; whether or not that is the best choice depends on your goals and how you define "best."
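As a minimal sketch of that rounding step, assuming hypothetical held-out arrays X_test and y_test alongside the fitted model from the question:

import numpy as np

raw_preds = model.predict(X_test).ravel()               # conditional means, generally floats
integer_preds = np.rint(raw_preds).astype(int)          # round to the nearest integer count
rounded_mae = np.mean(np.abs(integer_preds - y_test))   # MAE evaluated on the rounded counts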

The metric isn't the reason you have a poor fit; it just measures how good the fit is. A better model will improve the fit (tautologically).
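That said, if you want an evaluation metric that sits on the same likelihood scale as the Poisson loss, mean Poisson deviance is one standard option. A sketch, assuming scikit-learn is available and reusing the hypothetical raw_preds and y_test arrays from the previous snippet:

import numpy as np
from sklearn.metrics import mean_poisson_deviance

# Poisson deviance compares observed counts with predicted means on the
# Poisson likelihood scale; predictions must be strictly positive.
deviance = mean_poisson_deviance(y_test, np.clip(raw_preds, 1e-6, None))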
