Solved – Class Balancing in Deep Neural Network

deep learning, image processing, machine learning

I was trying to do class balancing on an image semantic segmentation problem, since some classes in the images are in the minority. The weight for each class is calculated as described in this paper: http://arxiv.org/pdf/1511.00561v2.pdf

we weight each pixel by $ \alpha_c = \text{median\_freq} / \text{freq}(c) $, where $ \text{freq}(c) $ is the number of pixels of class $ c $ divided by the total number of pixels in images where $ c $ is present, and $ \text{median\_freq} $ is the median of these frequencies.
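A minimal NumPy sketch of this weight computation might look like the following (the helper name median_freq_weights is hypothetical; it assumes integer label maps with one class index per pixel, and that every class appears in at least one image):

import numpy as np

def median_freq_weights(label_maps, num_class):
    # label_maps: iterable of 2-D integer arrays, one per image,
    # where each pixel holds its class index in [0, num_class).
    class_pixels = np.zeros(num_class)  # pixels of class c, summed over images containing c
    image_pixels = np.zeros(num_class)  # total pixels of the images containing c
    for lm in label_maps:
        counts = np.bincount(lm.ravel(), minlength=num_class)
        present = counts > 0
        class_pixels[present] += counts[present]
        image_pixels[present] += lm.size
    freq = class_pixels / image_pixels  # freq(c)
    return np.median(freq) / freq      # alpha_c = median_freq / freq(c)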

Then I weighted the cross-entropy loss as follows; the labels are one-hot encoded, so each label has shape (img_col, img_row, num_class):

import theano.tensor as T

def weighted_cce(coding_dist, true_dist, weights):
    # Weighted categorical cross-entropy.
    # true_dist: one-hot ground truth; coding_dist: predicted probabilities.
    # Clip predictions away from 0 and 1 so the log stays finite.
    coding_dist = T.clip(coding_dist, 10e-8, 1.0 - 10e-8)
    # Sum over the class axis (the last one).
    return -T.sum(weights * true_dist * T.log(coding_dist), axis=coding_dist.ndim - 1)

What's strange is that instead of producing a more balanced output, the result is even more biased than without class balancing: the network now only recognizes the most dominant classes in the images.

Could anyone share some thoughts on this? Thanks in advance!

Best Answer

Yes, you need to compute the weights once, but you should not assign a single weight to the whole loss. Instead, each pixel in the loss (before summing in both spatial directions) should get its own weight, so overall 'weights' is a tensor just like the others.

Let's say there are only two classes, and the frequency of $ C_1 $ is twice that of $ C_2 $. One pixel is correctly predicted as $ C_2 $ with confidence $ [0.3, 0.7] $. The unweighted loss for that pixel is $ -\sum([0, 1] .* \log[0.3, 0.7]) = -\log 0.7 $. With the weight included, the loss is $ -\sum([0, 1] .* \log[0.3, 0.7]) \times 2 $, because $ C_2 $ should count twice to restore the balance. So for each pixel the weight is either 1 or 2, depending on which class it belongs to; this constructs a weight matrix. However, it is convenient to think of it as a tensor, because the weight value corresponding to the other class gets multiplied by 0 in 'true_dist' anyway. In that case the loss for a single pixel can be written as $ -\sum([0, 1] .* \log[0.3, 0.7] .* [1, 2]) $, which gives the same result. This way you can do a point-wise multiplication.
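To check the arithmetic, a tiny NumPy sketch of that hypothetical two-class pixel:

import numpy as np

pred = np.array([0.3, 0.7])    # predicted probabilities for [C1, C2]
onehot = np.array([0.0, 1.0])  # ground truth: the pixel belongs to C2
w = np.array([1.0, 2.0])       # per-class weights: C2 counts twice

unweighted = -np.sum(onehot * np.log(pred))    # -log(0.7) ~= 0.357
weighted = -np.sum(onehot * np.log(pred) * w)  # -2*log(0.7) ~= 0.713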

PS: It didn't fit in the comment section.

EDIT: I can't edit your code, because the weight calculation section is not included. If you calculated weights for N classes, then it is a $ 1 \times N $ vector. You will construct a 3D array, $ W_{ijk} $, from these weights. The first and second dimensions of this array correspond to 'img_col' and 'img_row' respectively. The third dimension will be a function of 'true_dist', $ T_{ijk} $, at the corresponding pixel. I guess this is where you are confused, so I will try to be more explicit. Let's say N is 4 and the weight vector you calculated is $ w = [w_1, w_2, w_3, w_4] $. The weight values are inversely correlated with the frequency of each class. If a pixel $ (a,b) $ belongs to class $ C_3 $, then $ T_{ab\cdot} = [0, 0, 1, 0] $ and $ W_{ab\cdot} = T_{ab\cdot} .* [w_1, w_2, w_3, w_4] = [0, 0, w_3, 0] $, where $ .* $ is point-wise multiplication. So only the 3rd class's weight value has an effect for that individual pixel $ (a,b) $. As you see, $ W $ is a function of $ T $. What you need to do is evaluate $ W $ before passing it to the loss. You can write a function which takes $ T $ as input.

The 'weights' in the code is denoted $ W $ (capital) and is a 3D array. The vector $ w $ holds the inverse-frequency value for each class.
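A short sketch of this construction, assuming NumPy-style arrays with hypothetical shapes and weight values:

import numpy as np

# Hypothetical shapes and values for illustration.
img_col, img_row, num_class = 4, 4, 4
true_dist = np.eye(num_class)[np.random.randint(num_class, size=(img_col, img_row))]
w = np.array([0.5, 1.0, 2.0, 4.0])  # hypothetical inverse-frequency weights

# Broadcasting picks out w_c at each pixel's true class:
# W[a, b, :] = T[a, b, :] .* [w1, w2, w3, w4]
W = true_dist * w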

EDIT2: Sorry for the mess I created here. You don't need the point-wise multiplication to create $ W_{ijk} $, because it is already done inside the loss function. So just replicate $ w $ at each pixel of $ W $:

$ \forall (a,b): W_{ab\cdot} = [w_1, w_2, w_3, w_4] $
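In code, this replication is a plain broadcast (again a sketch with hypothetical shapes and weight values; note that in Theano the 1×N weight vector would broadcast against 'true_dist' inside the loss anyway, so explicit replication is optional):

import numpy as np

img_col, img_row = 360, 480         # hypothetical image size
w = np.array([0.5, 1.0, 2.0, 4.0])  # hypothetical inverse-frequency weights

# Replicate w at every pixel: W has shape (img_col, img_row, num_class).
W = np.broadcast_to(w, (img_col, img_row, w.size))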