Solved – Best loss function for very sparse real-valued data

loss-functions, mathematical-statistics, predictive-models

Suppose the target output of my prediction model is an $M\times N$ matrix in which $95\%$ of the values are $0.0$ and the remaining values lie anywhere between $0.0$ and $1.0$. What would be a good loss function to use for this kind of data?

As long as my model outputs a lot of $0$'s, the MSE is really small even at the start (about $10^{-3}$), and the model has a hard time learning the values that are greater than $0$.

Any ideas? Thanks!

Best Answer

Can you do something with an asymmetric loss, i.e. one where the cost of predicting zero when the target is non-zero differs from the cost of predicting non-zero when the target is zero?
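One way to read this suggestion is as a weighted squared error, where errors on non-zero targets count more than errors on zero targets. The sketch below is only an illustration under that assumption; the function name `asymmetric_mse` and the parameters `miss_weight` and `false_alarm_weight` are hypothetical and not taken from any library.

```python
def asymmetric_mse(targets, predictions, miss_weight=10.0, false_alarm_weight=1.0):
    """Mean squared error with a different weight for each kind of target.

    miss_weight scales errors where the target is non-zero;
    false_alarm_weight scales errors where the target is zero.
    Both names and default values are hypothetical choices.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    total = 0.0
    for t, p in zip(targets, predictions):
        weight = miss_weight if t != 0.0 else false_alarm_weight
        total += weight * (t - p) ** 2
    return total / len(targets)


# Toy data: 95% zeros and a single non-zero target.
targets = [0.0] * 19 + [0.8]
all_zero = [0.0] * 20  # a model that always predicts zero

plain = asymmetric_mse(targets, all_zero, miss_weight=1.0)      # about 0.032
weighted = asymmetric_mse(targets, all_zero, miss_weight=19.0)  # about 0.608
```

With equal weights the all-zero prediction already looks good, which matches the problem described in the question. Setting `miss_weight` near the ratio of zero to non-zero entries (19 here) makes the all-zero prediction much more costly; the right value is something to tune, not a fixed rule.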

Related Solutions

Solved – Cross entropy-equivalent loss suitable for real-valued labels

Cross entropy is defined on probability distributions, not single values. The reason it works for classification is that classifier output is (often) a probability distribution over class labels. For example, the outputs of logistic/softmax functions are interpreted as probabilities. The observed class label is also treated as a probability distribution: the empirical distribution (where the probability is 1 for the observed class and 0 for the others).

The concept of cross entropy applies equally well to continuous distributions. However, it can't be used with regression models that output only a point estimate (e.g. the conditional mean) rather than a full probability distribution. If you had a model that gave the full conditional distribution (probability of output given input), you could use cross entropy as a loss function.

For continuous distributions $p$ and $q$, the cross entropy is defined as: $$H(p, q) = -\int_{Y} p(y) \log q(y) dy$$

Just considering a single observed input/output pair $(x, y)$, $p$ would be the empirical conditional distribution (a delta function over the observed output value), and $q$ would be the modeled conditional distribution (probability of output given input). In this case, the cross entropy reduces to $-\log q(y \mid x)$. Summing over data points, this is just the negative log likelihood!
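As a concrete case, suppose the model outputs the mean and standard deviation of a Gaussian for each input. That modelling choice is an assumption made here for illustration, and the function names `gaussian_nll` and `dataset_nll` are hypothetical. The per-example loss $-\log q(y \mid x)$ can then be sketched as:

```python
import math


def gaussian_nll(y, mean, std):
    """Negative log density of y under a Gaussian N(mean, std**2).

    This is -log q(y | x) when the model's conditional distribution
    is assumed to be Gaussian with the predicted mean and std.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return 0.5 * math.log(2 * math.pi * std ** 2) + (y - mean) ** 2 / (2 * std ** 2)


def dataset_nll(observations, predicted_params):
    """Sum of per-example losses: the negative log likelihood of the data."""
    return sum(
        gaussian_nll(y, mean, std)
        for y, (mean, std) in zip(observations, predicted_params)
    )


# Two observed outputs and the (mean, std) the model predicted for each.
observations = [0.0, 0.7]
predicted_params = [(0.1, 0.2), (0.5, 0.3)]
total_loss = dataset_nll(observations, predicted_params)
```

If the standard deviation is held fixed, the first term is a constant and minimizing this loss is the same as minimizing squared error, which is why a point-estimate regression with MSE can be viewed as a special case.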

Solved – How to penalize a regression loss function to account for correctness on the sign of the prediction

Have you thought about simply adding a cross-entropy term? For example:

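import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1.losses in 2.x)

# Regression term: squared error on the real-valued predictions.
mean_square_loss = tf.losses.mean_squared_error(labels=labels,
                                                predictions=predictions)
# Classification term: cross entropy on the predicted sign (two classes).
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=class_labels,
                                                       logits=logits)
loss = mean_square_loss + cross_entropy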

You would have to add logits of shape [?, 2] and class_labels that encode the sign of the real label as an integer class index, 0 or 1.
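To see what the combined loss computes, here is a small dependency-free sketch. It swaps in a single sign logit with a sigmoid (binary cross entropy) in place of the two-class softmax above, which is an equivalent parameterization for two classes. The names `combined_loss`, `softplus`, and `sign_weight`, and the convention that class 1 means a non-negative label, are assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def combined_loss(y_true, y_pred, sign_logits, sign_weight=1.0):
    """Mean squared error plus binary cross entropy on the sign.

    sign_logits[i] > 0 means the model believes y_true[i] is non-negative.
    sign_weight (hypothetical) balances the two terms.
    """
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    bce = 0.0
    for t, z in zip(y_true, sign_logits):
        # Cross entropy for a sigmoid output: softplus(-z) if the label
        # is 1 (non-negative target), softplus(z) if the label is 0.
        bce += softplus(-z) if t >= 0 else softplus(z)
    return mse + sign_weight * bce / n


y_true = [1.0, -2.0]
y_pred = [0.5, -1.5]
right_sign = combined_loss(y_true, y_pred, sign_logits=[3.0, -3.0])
wrong_sign = combined_loss(y_true, y_pred, sign_logits=[-3.0, 3.0])
```

Both calls share the same squared-error term, so the difference between `right_sign` and `wrong_sign` comes entirely from the sign penalty.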
