Solved – A good loss function for object localisation and classification using a CNN

Tags: conv-neural-network, loss-functions, machine learning, neural networks, object detection

Context: I am using a CNN to localise an object in an image. Two kinds of objects can be present, represented by classes C1 and C2. The CNN has 6 output nodes: C1, C2, x, y, w, h. Here [C1, C2] = [1, 0] if the class is C1 and [0, 1] if the class is C2; x, y are the centre of the bounding box surrounding the object, and w, h are its width and height.

Problem: I have been using a softmax cross-entropy loss for classification (i.e. on the C1, C2 nodes) and an L2 loss on the x, y, w and h nodes. The issue I am facing is that one loss dominates the other, and weighting the two terms to balance them has not worked very well. Can anyone suggest a good loss function that takes both classification and localisation into account?
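For concreteness, the weighted-sum setup described above can be sketched for a single sample as follows. This is a minimal pure-Python illustration, not the asker's actual code: the function names, the weights `w_cls` and `w_loc`, and the example numbers are all assumptions made for illustration.

```python
import math

def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum_exp) for z, t in zip(logits, one_hot))

def l2_box_loss(pred_box, true_box):
    """Sum of squared errors over (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))

def weighted_sum_loss(output, target, w_cls=1.0, w_loc=1.0):
    """output and target hold 6 values laid out as [C1, C2, x, y, w, h]."""
    cls = softmax_cross_entropy(output[:2], target[:2])
    loc = l2_box_loss(output[2:], target[2:])
    return w_cls * cls + w_loc * loc, cls, loc

# Hypothetical example: class C1, box given in pixel units.
output = [2.0, 0.0, 110.0, 95.0, 48.0, 52.0]   # raw network outputs
target = [1.0, 0.0, 100.0, 100.0, 50.0, 50.0]  # ground truth
total, cls, loc = weighted_sum_loss(output, target)
print(f"classification={cls:.4f} localisation={loc:.1f} total={total:.4f}")
```

In this made-up example the box error is measured in pixels, so the L2 term is roughly a thousand times larger than the cross-entropy term. That is the kind of imbalance the question describes.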

Note:
1. There is an object present at all times in the image.
2. I have tried the YOLO loss (and it is not good enough) and am looking for other loss functions that people have found useful for this kind of application.

Best Answer

Well, YOLO is a rather successful approach to this problem, so I would suggest looking a bit more into why it doesn't work. Perhaps there is a bug in the implementation or something of the sort.

Anyway, one thing you could also consider in order to help balance the loss terms is to take their maximum. For instance, if $L_c(D)$ is the classification loss and $L_\ell(D)$ is the localization loss on some set of data $D$, consider using: $$ L(D) = \max\{ \alpha L_c(D), \beta L_\ell(D) \} $$ for some positive weights $\alpha,\beta\in\mathbb{R}$.

This means that if the network focuses too much on minimizing one loss, the other loss will grow bigger and become the focus of the algorithm instead. At any given step only the larger of the two weighted terms is being minimized, so the network cannot simply ignore one of the loss terms.
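A minimal sketch of this max-based combination, assuming the two loss values have already been computed for a batch; the function name `max_balanced_loss` and the example numbers are hypothetical:

```python
def max_balanced_loss(cls_loss, loc_loss, alpha=1.0, beta=1.0):
    """Return max(alpha * L_c, beta * L_l) and which term is active.

    Only the active (larger) term would receive gradient in a
    backward pass; the other contributes nothing at that step.
    """
    weighted_cls = alpha * cls_loss
    weighted_loc = beta * loc_loss
    if weighted_cls >= weighted_loc:
        return weighted_cls, "classification"
    return weighted_loc, "localisation"

# Hypothetical values: the active term switches as one loss shrinks.
print(max_balanced_loss(0.5, 2.0, alpha=1.0, beta=0.1))
print(max_balanced_loss(0.1, 2.0, alpha=1.0, beta=0.1))
```

In an autodiff framework the same thing would typically be written with the framework's element-wise maximum over the two scalar losses. The weights $\alpha$ and $\beta$ still matter, since they decide where the switch between the two terms happens.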

This sort of approach is reasonably common (e.g. it is used in FoldingNet's autoencoder to balance the two halves of the Chamfer distance).