Solved – How to penalize a regression loss function to account for correctness on the sign of the prediction

gradient-descent, loss-functions, machine-learning, optimization, regression

I am dealing with a regression problem (my targets could potentially take any value from -inf to +inf).

To optimise my model, I have two objectives:
1) Predictions should be close to the targets.
2) The sign of my prediction should match the sign of the target.

For 1) I can simply use the squared (L2) loss as my loss function. However, I am unsure which extra term I should add to the loss function to account for 2).

To illustrate this: if my target is y = 1.0, my loss should be larger for a prediction y_hat = -1.0 than for a prediction y_hat = 3.0.
I am solving the optimisation problem using gradient descent. In some sense, my problem is a classification-regression hybrid; I had in mind adding something similar to a hinge loss, max(0, -y * y_hat). However, since the targets are unbounded (they can lie anywhere between -inf and +inf), this term penalises sign mistakes on large absolute values far more strongly than on small ones, which yields very poor results (see the sketch below).
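To make the scale sensitivity concrete, here is a minimal plain-Python sketch of the combined loss described above, (y - y_hat)^2 + max(0, -y * y_hat), evaluated on two purely illustrative cases where the sign is equally wrong but the magnitude differs:

def combined_loss(y, y_hat):
    # Squared error plus the hinge-style sign penalty described above.
    return (y - y_hat) ** 2 + max(0.0, -y * y_hat)

# The same "wrong sign" mistake at two different scales:
print(combined_loss(1.0, -1.0))    # 4.0 + 1.0 = 5.0       (hinge term: 1.0)
print(combined_loss(10.0, -10.0))  # 400.0 + 100.0 = 500.0 (hinge term: 100.0)

The hinge term contributes 100 times more at the larger scale, even though the sign error is the same in both cases.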

Best Answer

Have you thought about simply adding a cross-entropy term? For example:

# Combine the regression loss with a cross-entropy loss on the sign class.
mean_square_loss = tf.losses.mean_squared_error(labels=labels,
                                                predictions=predictions)
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=class_labels,
                                                       logits=logits)
loss = tf.add(mean_square_loss, cross_entropy)

You would have to add logits of shape [?, 2] and class_labels representing the sign of the true label as an element of {0, 1}.
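Under those assumptions, a minimal end-to-end sketch (TensorFlow 1.x) might look like the following; the placeholder shapes, the hidden-layer size, and the two-head architecture are illustrative choices, not prescribed by the answer:

import tensorflow as tf

# Hypothetical inputs -- the feature dimension (10) is an assumption.
features = tf.placeholder(tf.float32, shape=[None, 10])
labels = tf.placeholder(tf.float32, shape=[None])

# Shared hidden representation (size chosen arbitrarily for illustration).
hidden = tf.layers.dense(features, 32, activation=tf.nn.relu)

# Regression head: one scalar prediction per example.
predictions = tf.squeeze(tf.layers.dense(hidden, 1), axis=-1)

# Classification head: two logits per example for the sign classes.
logits = tf.layers.dense(hidden, 2)

# Class label 1 if the target is positive, 0 otherwise.
class_labels = tf.cast(labels > 0, tf.int32)

mean_square_loss = tf.losses.mean_squared_error(labels=labels,
                                                predictions=predictions)
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=class_labels,
                                                       logits=logits)
loss = tf.add(mean_square_loss, cross_entropy)

The cross-entropy term only depends on whether the sign is right, so it does not grow with the magnitude of the target the way the hinge term does; you could also weight the two terms if one dominates the other during training.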
