Solved – Understanding Median Frequency Balancing

computer vision, deep learning, machine learning, mathematical-statistics

This question is with reference to semantic segmentation.

According to the paper Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture:

we weight each pixel by α_c = median_freq / freq(c), where freq(c) is the number of pixels of class c divided by the total number of pixels in images where c is present, and median_freq is the median of these frequencies

However, I have some difficulty understanding what the author meant by:

  1. "number of pixels of class c". Do they mean the number of pixels of class c in one image or in all images?

  2. "The total number of pixels in images where c is present" – Do they mean the total number of pixels for each image is divided by the number of pixels of class c in that same image?

  3. "median frequency is the median of these frequencies"

After reading the above, my impression of this concept takes the form of this implementation:

  1. For each image, count the pixels of class c and divide by the total number of pixels in that image. This gives a per-image frequency f_i.

  2. Collect f_i over all images, sort the values in ascending order, and take the median. This gives median_freq.

  3. To compute freq(c), count the pixels of class c in all images and divide by the total number of pixels in all images.

  4. Finally, compute each pixel's weight according to the formula.

In other words, the implementation takes the median of class c's per-image frequencies and divides it by the overall frequency of class c across all images.

However, I don't think this implementation causes dominant labels to be weighted less: if a dominant label is present in roughly the same proportion in every image, its mean and median frequencies will be close, and the weight will be roughly equal to 1. So how does this help with class balancing? Could someone confirm whether my implementation is correct, or clarify the concept?

Thank you.

Best Answer

My interpretation is as follows:

  1. "number of pixels of class c": The total number of pixels of class c across all images of the dataset.
  2. "The total number of pixels in images where c is present": The total number of pixels across all images of the dataset that contain at least one pixel of class c.
  3. "median frequency is the median of these frequencies": Sort the per-class frequencies calculated above and pick the median.

Possible technique for calculation of frequencies of each class:

classPixelCount = [array of class.size() zeros]
classTotalCount = [array of class.size() zeros]

for each image in dataset:
    perImageFrequencies = bincount(image)
    classPixelCount = element_wise_sum(classPixelCount, perImageFrequencies)
    nPixelsInImage = image.total_pixel_count()
    for each class c with count in perImageFrequencies:
        if count > 0:
            classTotalCount[c] = classTotalCount[c] + nPixelsInImage

return elementwiseDivision(classPixelCount, classTotalCount)
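
For anyone who wants something runnable, here is a minimal NumPy sketch of the frequency computation described above. It is my own rendering, not code from the paper: the function name `class_frequencies`, the `num_classes` argument, and the assumption that each label map is an integer array with values in `[0, num_classes)` are all illustrative choices. Classes that never appear are given frequency 0 to avoid a division by zero, which is also an assumption.

```python
import numpy as np

def class_frequencies(label_images, num_classes):
    """freq(c) = pixels of class c / total pixels of the images that contain c."""
    class_pixel_count = np.zeros(num_classes, dtype=np.int64)
    class_total_count = np.zeros(num_classes, dtype=np.int64)
    for labels in label_images:
        labels = np.asarray(labels)
        # Per-class pixel counts for this image (assumes labels in [0, num_classes)).
        per_image_counts = np.bincount(labels.ravel(), minlength=num_classes)
        class_pixel_count += per_image_counts
        # Only the classes present in this image receive its pixel total.
        class_total_count[per_image_counts > 0] += labels.size
    # Assumption: classes absent from the whole dataset get frequency 0.
    return np.divide(class_pixel_count, class_total_count,
                     out=np.zeros(num_classes, dtype=np.float64),
                     where=class_total_count > 0)

# Two tiny 2x2 label maps with three classes.
img_a = np.array([[0, 0], [0, 1]])
img_b = np.array([[0, 0], [2, 2]])
freqs = class_frequencies([img_a, img_b], num_classes=3)
```

In this toy dataset class 0 appears in both images (5 of 8 pixels), while classes 1 and 2 each appear in only one image, so their denominators are 4 rather than 8.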

If you assume that every image contains every class and that all images are the same size, this simplifies to:

classPixelCount = [array of class.size() zeros]

for each image in dataset:
    perImageFrequencies = bincount(image)
    classPixelCount = element_wise_sum(classPixelCount, perImageFrequencies)

totalPixels = sumElementsOf(classPixelCount)
return elementwiseDivision(classPixelCount, totalPixels)
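
Under the same simplifying assumption, the computation can be sketched in NumPy as a single `bincount` over all labels. The name `class_frequencies_simple` and the `num_classes` argument are illustrative, and the label maps are assumed to be integer arrays with values in `[0, num_classes)`.

```python
import numpy as np

def class_frequencies_simple(label_images, num_classes):
    """Global class frequencies: pixels of class c / all pixels in the dataset."""
    # Flatten every label map and pool the labels into one long vector.
    all_labels = np.concatenate([np.asarray(l).ravel() for l in label_images])
    counts = np.bincount(all_labels, minlength=num_classes)
    return counts / counts.sum()

img_a = np.array([[0, 0], [0, 1]])
img_b = np.array([[0, 0], [2, 2]])
simple_freqs = class_frequencies_simple([img_a, img_b], num_classes=3)
```

These frequencies always sum to 1, which is not true of the general definition when some classes are missing from some images.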

Finally, to calculate the class weights:

medianFreq = median(frequencies)
return elementwiseDivision(medianFreq, frequencies)
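
A small NumPy sketch of the weighting step, under my reading of the formula: the median is taken over the per-class frequencies, and each class weight is that median divided by the class's own frequency, kept in class order. The function name `median_frequency_weights` is illustrative, and excluding classes with zero frequency from the median (giving them weight 0) is my assumption rather than something the paper states.

```python
import numpy as np

def median_frequency_weights(frequencies):
    """alpha_c = median_freq / freq(c), with the median taken across classes."""
    frequencies = np.asarray(frequencies, dtype=np.float64)
    present = frequencies > 0
    # Assumption: classes that never occur are left out of the median.
    median_freq = np.median(frequencies[present])
    weights = np.zeros_like(frequencies)
    weights[present] = median_freq / frequencies[present]
    return weights

# Example per-class frequencies for three classes.
weights = median_frequency_weights([0.625, 0.25, 0.5])
```

Here the median frequency is 0.5, so the most frequent class gets a weight below 1 (0.8), the rarest class gets a weight above 1 (2.0), and the median class gets exactly 1. That is how the scheme down-weights dominant labels.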