Solved – Binary cross-entropy vs MSE loss function with asymmetric payoffs

cross-entropy, keras, loss-functions, mse, neural-networks

I'm building a binary classifier that has an unequal payoff in the following cases:

  • $Y_{pred} = Y_{actual} = \text{True}$: payoff is $+x \cdot 100$
  • $Y_{pred} \ne Y_{actual} = \text{True}$: payoff is $-x \cdot 100$
  • $Y_{pred} \ne Y_{actual} = \text{False}$: payoff is $-1$
  • $Y_{pred} = Y_{actual} = \text{False}$: payoff is $+1$

In other words, there are two possible courses of action: True for action 1 and False for action 2.

I'm looking at two approaches to implement it:

  1. I could use MSE as the loss function with two output neurons, assign the payoffs directly to $Y_{actual}$, and have the neural network predict the payoff of each action (see the sketch after this list).

    • I assume I would then just pick whichever of the two neurons predicts the higher payoff, and use a linear activation in the output layer. Correct?
  2. I could create a custom loss function that calculates the respective payoffs as described above and then applies binary cross-entropy to that?
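As a rough illustration of approach 1, here is a minimal Keras sketch (the feature dimension, array names, and layer sizes are hypothetical, not from the question):

```python
# Sketch of approach 1: regress the payoff of each action with MSE.
# Assumes X_train has shape (n_samples, n_features) and payoff_targets has
# shape (n_samples, 2): column 0 = payoff of action True, column 1 = action False.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # hypothetical feature dimension

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="linear"),  # one payoff estimate per action
])
model.compile(optimizer="adam", loss="mse")

# model.fit(X_train, payoff_targets, epochs=10)

# At prediction time, pick the action whose estimated payoff is larger:
# actions = np.argmax(model.predict(X_test), axis=1)  # 0 = True, 1 = False
```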

Which of the two approaches would be preferred? Should they lead to the same result, i.e. a per-sample recommendation of which action is preferable?

Best Answer

You cannot use your "payoff" function as a loss for your network because it is not differentiable. Instead, you can use a differentiable function whose output tracks your evaluation metric (the payoff).

As you are predicting a binary variable, the way to go is the binomial cross-entropy. The loss function would look like:

$$\mathcal{L}(\mathbf y, \mathbf c, \mathbf t) = -\frac{1}{N}\sum_n \left[\, 100\, c_n t_n \log y_n + (1 - t_n)\log (1 - y_n) \,\right],$$

where $\mathbf y$ are the predictions, $\mathbf t$ are the targets and $\mathbf c$ are the individual costs for each sample. You can see it gives false-negative predictions a $100c_n$-times higher penalty than false positives.

You will probably have to implement this loss yourself, but it should be quite easy (see the sketch below).
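A minimal sketch of such a custom loss in Keras, assuming the per-sample cost $c_n$ is packed into `y_true` as a second column (this packing trick, the layer sizes, and the array names are assumptions for illustration, not part of the answer):

```python
# Cost-weighted binary cross-entropy: -mean[100 * c * t * log(y) + (1 - t) * log(1 - y)]
# Assumes y_true[:, 0] = t_n (the 0/1 target) and y_true[:, 1] = c_n (the cost).
import tensorflow as tf
from tensorflow import keras

def weighted_bce(y_true, y_pred):
    t = y_true[:, 0:1]                                  # binary target t_n
    c = y_true[:, 1:2]                                  # per-sample cost c_n
    y = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)      # avoid log(0)
    loss = -(100.0 * c * t * tf.math.log(y) + (1.0 - t) * tf.math.log(1.0 - y))
    return tf.reduce_mean(loss)

# Hypothetical usage: a single sigmoid output neuron predicting P(True).
# model = keras.Sequential([
#     keras.layers.Input(shape=(n_features,)),
#     keras.layers.Dense(32, activation="relu"),
#     keras.layers.Dense(1, activation="sigmoid"),
# ])
# model.compile(optimizer="adam", loss=weighted_bce)
# y_true_packed = np.stack([t_train, c_train], axis=1)
# model.fit(X_train, y_true_packed, epochs=10)
```

Equivalently, you could keep the built-in binary cross-entropy and pass a `sample_weight` of $100 c_n$ for positive samples and $1$ for negative ones to `fit`, which yields the same weighting.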

Using MSE for classification does not make much sense.