Solved – Class Balancing in Deep Neural Network

deep learning, image processing, machine learning

I was trying to do class balancing on an image semantic segmentation problem, since some classes in the images are in the minority. The weight for each class is calculated as described in this paper: http://arxiv.org/pdf/1511.00561v2.pdf

we weight each pixel by $ \alpha_c = \text{median\_freq} / \text{freq}(c) $, where $ \text{freq}(c) $ is the number of pixels of class $ c $ divided by the total number of pixels in images where $ c $ is present, and $ \text{median\_freq} $ is the median of these frequencies.
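A minimal NumPy sketch of this weight computation might look like the following (the helper name median_freq_weights is hypothetical; it assumes integer label maps with one class index per pixel, and that every class appears in at least one image):

import numpy as np

def median_freq_weights(label_maps, num_class):
    # label_maps: iterable of 2-D integer arrays, one per image,
    # where each pixel holds its class index in [0, num_class).
    class_pixels = np.zeros(num_class)  # pixels of class c, summed over images containing c
    image_pixels = np.zeros(num_class)  # total pixels of the images containing c
    for lm in label_maps:
        counts = np.bincount(lm.ravel(), minlength=num_class)
        present = counts > 0
        class_pixels[present] += counts[present]
        image_pixels[present] += lm.size
    freq = class_pixels / image_pixels  # freq(c)
    return np.median(freq) / freq      # alpha_c = median_freq / freq(c)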

Then I weighted the cross-entropy loss as follows; the labels are one-hot encoded, so each label has shape (img_col, img_row, num_class):

import theano.tensor as T

def weighted_cce(coding_dist, true_dist, weights):
    # Weighted categorical cross-entropy.
    # true_dist: one-hot ground truth; coding_dist: predicted probabilities.
    # Clip predictions away from 0 and 1 so the log stays finite.
    coding_dist = T.clip(coding_dist, 10e-8, 1.0 - 10e-8)
    # Sum over the class axis (the last one).
    return -T.sum(weights * true_dist * T.log(coding_dist), axis=coding_dist.ndim - 1)

What's strange is that instead of producing a more balanced output, the result is even more biased than without class balancing: the network now only recognizes the most dominant classes in the images.

Could anyone share some thoughts on this? Thanks in advance!

Best Answer

Yes, you need to compute the weights once, but you should not assign a single weight to the whole loss. Instead, each pixel in the loss (before summing in both spatial directions) should get its own weight, so overall 'weights' is a tensor just like the others.

Let's say there are only two classes, and the frequency of $ C_1 $ is twice that of $ C_2 $. One pixel is correctly predicted as $ C_2 $ with confidence $ [0.3, 0.7] $. The unweighted loss for that pixel is $ -\sum([0, 1] .* \log[0.3, 0.7]) = -\log 0.7 $. With the weight included, the loss is $ -\sum([0, 1] .* \log[0.3, 0.7]) \times 2 $, because $ C_2 $ should count twice to restore the balance. So for each pixel the weight is either 1 or 2, depending on which class it belongs to; this constructs a weight matrix. However, it is convenient to think of it as a tensor, because the weight value corresponding to the other class gets multiplied by 0 in 'true_dist' anyway. In that case the loss for a single pixel can be written as $ -\sum([0, 1] .* \log[0.3, 0.7] .* [1, 2]) $, which gives the same result. This way you can do a point-wise multiplication.
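To check the arithmetic, a tiny NumPy sketch of that hypothetical two-class pixel:

import numpy as np

pred = np.array([0.3, 0.7])    # predicted probabilities for [C1, C2]
onehot = np.array([0.0, 1.0])  # ground truth: the pixel belongs to C2
w = np.array([1.0, 2.0])       # per-class weights: C2 counts twice

unweighted = -np.sum(onehot * np.log(pred))    # -log(0.7) ~= 0.357
weighted = -np.sum(onehot * np.log(pred) * w)  # -2*log(0.7) ~= 0.713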

PS: It didn't fit in the comment section.

EDIT: I can't edit your code, because the weight calculation section is not included. If you calculated weights for N classes, then it is a $ 1 \times N $ vector. You will construct a 3D array, $ W_{ijk} $, from these weights. The first and second dimensions of this array correspond to 'img_col' and 'img_row' respectively. The third dimension will be a function of 'true_dist', $ T_{ijk} $, at the corresponding pixel. I guess this is where you are confused, so I will try to be more explicit. Let's say N is 4 and the weight vector you calculated is $ w = [w_1, w_2, w_3, w_4] $. The weight values are inversely correlated with the frequency of each class. If a pixel $ (a,b) $ belongs to class $ C_3 $, then $ T_{ab\cdot} = [0, 0, 1, 0] $ and $ W_{ab\cdot} = T_{ab\cdot} .* [w_1, w_2, w_3, w_4] = [0, 0, w_3, 0] $, where $ .* $ is point-wise multiplication. So only the 3rd class's weight value has an effect for that individual pixel $ (a,b) $. As you see, $ W $ is a function of $ T $. What you need to do is evaluate $ W $ before passing it to the loss. You can write a function which takes $ T $ as input.

The 'weights' in the code is denoted $ W $ (capital) and is a 3D array. The vector $ w $ holds the inverse-frequency value for each class.
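A short sketch of this construction, assuming NumPy-style arrays with hypothetical shapes and weight values:

import numpy as np

# Hypothetical shapes and values for illustration.
img_col, img_row, num_class = 4, 4, 4
true_dist = np.eye(num_class)[np.random.randint(num_class, size=(img_col, img_row))]
w = np.array([0.5, 1.0, 2.0, 4.0])  # hypothetical inverse-frequency weights

# Broadcasting picks out w_c at each pixel's true class:
# W[a, b, :] = T[a, b, :] .* [w1, w2, w3, w4]
W = true_dist * w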

EDIT2: Sorry for the mess I created here. You don't need the point-wise multiplication to create $ W_{ijk} $, because it is already done inside the loss function. So just replicate $ w $ at each pixel of $ W $:

$ \forall (a,b): W_{ab\cdot} = [w_1, w_2, w_3, w_4] $
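In code, this replication is a plain broadcast (again a sketch with hypothetical shapes and weight values; note that in Theano the 1×N weight vector would broadcast against 'true_dist' inside the loss anyway, so explicit replication is optional):

import numpy as np

img_col, img_row = 360, 480         # hypothetical image size
w = np.array([0.5, 1.0, 2.0, 4.0])  # hypothetical inverse-frequency weights

# Replicate w at every pixel: W has shape (img_col, img_row, num_class).
W = np.broadcast_to(w, (img_col, img_row, w.size))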