I am dealing with a regression problem (my targets can take any value between -inf and +inf).

To optimise my model, I have two objectives:
1) Predictions should be close to the targets.
2) The sign of my prediction should match the sign of the target.

For 1) I can simply use the squared (L2) loss. However, I am unsure which extra term I should add to my loss function to account for 2).

To illustrate: if my target is y = 1.0, my loss should be larger for a prediction y_hat = -1.0 than for a prediction y_hat = 3.0.

I am solving the optimisation problem with gradient descent. In some sense, my problem is a classification-regression hybrid, so I had in mind something similar to a hinge loss: max(0, -y * y_hat). However, since the target values are unbounded (they could be anywhere between -inf and +inf), this term grows with the magnitudes of y and y_hat, so sign errors on large absolute values are penalised much more strongly than sign errors on small absolute values, which yields very poor results.
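
As a minimal sketch, the squared loss plus the hinge-like term described above could look like this in plain Python. The function names and the weight `lam` are illustrative assumptions, not part of any library.

```python
def squared_error(y, y_hat):
    # Objective 1: prediction should be close to the target.
    return (y - y_hat) ** 2

def sign_hinge(y, y_hat):
    # Objective 2: max(0, -y * y_hat) is zero when the signs agree
    # and positive when they disagree.
    return max(0.0, -y * y_hat)

def combined(y, y_hat, lam=1.0):
    # lam is an assumed weight balancing the two objectives.
    return squared_error(y, y_hat) + lam * sign_hinge(y, y_hat)

print(combined(1.0, -1.0))        # 4.0 + 1.0 = 5.0 (wrong sign)
print(combined(1.0, 3.0))         # 4.0 + 0.0 = 4.0 (right sign)
print(sign_hinge(100.0, -100.0))  # 10000.0: the penalty scales with |y| * |y_hat|
```

The last line shows the scaling issue: the sign penalty is proportional to the product of the magnitudes, so a few large targets can dominate the term.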

Have you thought about simply adding a cross-entropy term? For example:

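import tensorflow as tf  # TensorFlow 1.x API (tf.losses)
# `predictions` is an assumed name for the regression output; `logits` is the extra sign head
mean_square_loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=class_labels, logits=logits)
loss = tf.add(mean_square_loss, cross_entropy)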

You would have to add a logits output of shape [?, 2] and class_labels encoding the sign of the real label as a class index in {0, 1}.
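
To show what this combined loss computes, here is a framework-free sketch in plain Python. It assumes the model produces a regression output plus two sign logits per example. The names `sign_class`, `combined_loss`, and `weight`, and the choice of class 1 for non-negative targets, are assumptions made for illustration.

```python
import math

def sign_class(y):
    # Assumption: class 1 for non-negative targets, class 0 for negative ones.
    return 1 if y >= 0 else 0

def softmax_cross_entropy(logits, class_index):
    # Numerically stable -log(softmax(logits)[class_index]).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[class_index]

def combined_loss(targets, predictions, logits, weight=1.0):
    # Mean squared error on the values plus mean cross entropy on the signs.
    n = len(targets)
    mse = sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n
    ce = sum(softmax_cross_entropy(l, sign_class(t))
             for t, l in zip(targets, logits)) / n
    return mse + weight * ce

targets = [1.0, -2.0]
predictions = [0.5, -1.5]
good_logits = [[-3.0, 3.0], [3.0, -3.0]]  # sign head agrees with the targets
bad_logits = [[3.0, -3.0], [-3.0, 3.0]]   # sign head disagrees
print(combined_loss(targets, predictions, good_logits))
print(combined_loss(targets, predictions, bad_logits))
```

Because the cross-entropy term depends only on the sign class and not on the magnitude of the target, its size does not grow with unbounded targets the way max(0, -y * y_hat) does. At prediction time the two heads can still disagree, so how to reconcile them is a separate design decision.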

