[Math] How to find median from a histogram

median, statistics

I am taking a course on machine learning, and as part of it I am also learning statistics.

I came across a question in which I have to find the median based on a histogram.

The median is the (n+1)/2-th element.

But the hint that comes with the histogram confuses me. What does the following mean?
43 is the median of the frequencies, but it's not the median of the values.

For the median of the values, if you sum up all the frequencies below the median, and all the frequencies above the median, you should get the same number.

[Image: the histogram from the question]

Please help.

Best Answer

Add up all the frequencies to find the total number of observations ($n$). Then compute $\dfrac{n+1}{2}$: that is the position, in sorted order, of the element whose value you need.

Now iterate over the histogram in increasing order of value, keeping a running total of the frequencies. When the total first reaches or passes $\dfrac{n+1}{2}$, the value whose frequency you just added is the median.
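As a worked example, assume the frequencies read off the question's histogram are 0, 36, 54, 69, 82, 55, 43, 25, 22, 17, 0 for the values 5, 10, ..., 55, and that each value stands for the observations themselves rather than a bin edge. Then

$$n = 36 + 54 + 69 + 82 + 55 + 43 + 25 + 22 + 17 = 403, \qquad \frac{n+1}{2} = 202.$$

The running totals for the values 5, 10, 15, 20, 25 are 0, 36, 90, 159, 241. Since $159 < 202 \le 241$, the 202nd element falls in the bar for 25, so the median is 25.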

In Python, if you have the histogram as a dictionary mapping each value to its frequency (in your example, {5: 0, 10: 36, 15: 54, 20: 69, 25: 82, 30: 55, 35: 43, 40: 25, 45: 22, 50: 17, 55: 0}), this can be written as:

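def median(histogram):
    # 1-based position of the median in the sorted data.
    median_index = (sum(histogram.values()) + 1) / 2
    total = 0  # running total of frequencies seen so far
    for value in sorted(histogram):
        total += histogram[value]
        # Use >= so a running total that lands exactly on the position counts.
        if total >= median_index:
            return value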
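
One way to sanity-check the running-total approach is to expand the histogram back into a flat list of observations and compare against Python's built-in `statistics.median`. The following is a minimal sketch, not a definitive implementation: `median_from_histogram` and `expand` are names chosen here for illustration, and it assumes each dictionary key is the data value itself rather than a bin edge.

```python
import statistics


def median_from_histogram(histogram):
    """Return the value whose bar contains the ((n + 1) / 2)-th observation."""
    n = sum(histogram.values())
    if n == 0:
        raise ValueError("histogram is empty")
    median_index = (n + 1) / 2
    total = 0
    for value in sorted(histogram):
        total += histogram[value]
        if total >= median_index:
            return value


def expand(histogram):
    """Rebuild the flat, sorted list of observations from the counts."""
    return [value for value in sorted(histogram)
            for _ in range(histogram[value])]


histogram = {5: 0, 10: 36, 15: 54, 20: 69, 25: 82, 30: 55,
             35: 43, 40: 25, 45: 22, 50: 17, 55: 0}

data = expand(histogram)
print(len(data))                         # 403
print(median_from_histogram(histogram))  # 25
print(statistics.median(data))           # 25
```

The two results agree here because $n$ is odd. When $n$ is even and the two middle observations fall in different bars, `statistics.median` averages them, whereas the running-total version returns the larger of the two values.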