Derivation of the formula for finding the median of grouped data

median, statistics

I know the formula for finding the median of grouped data, which is $$\mathrm{Median} = L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \times c$$
and I know what all the letters stand for. Can anyone provide a derivation of it? I am very curious about where it comes from.

Best Answer

This formula is the result of a linear interpolation, which identifies the median under the assumption that data are uniformly distributed within the median class.

To derive the formula, note that $N/2$ is the number of observations below the median, so $N/2 - F_{m-1}$ is the number of observations that lie within the median class and below the median. Here $F_{m-1}$ is the cumulative frequency up to the class just before the median class, i.e. the total frequency of all classes below the median class.

As a result, the fraction $\displaystyle\frac {N/2 - F_{m-1}}{f_m}$ (where $f_m$ is the frequency of the median class) represents the proportion of data values in the median class that are below the median.
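As a quick illustration with made-up numbers (the classes and frequencies here are hypothetical, not from the question): take classes $0$-$10$, $10$-$20$, $20$-$30$, $30$-$40$, $40$-$50$ with frequencies $5, 8, 12, 10, 5$, so $N = 40$ and $N/2 = 20$. The cumulative frequencies are $5, 13, 25, 35, 40$, so the median class is $20$-$30$, with $F_{m-1} = 13$ and $f_m = 12$. The fraction is then

$$\frac{N/2 - F_{m-1}}{f_m} = \frac{20 - 13}{12} = \frac{7}{12} \approx 0.583,$$

that is, about $58\%$ of the observations in the median class lie below the median.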

Now if we assume that the data are uniformly distributed (i.e., equally spaced) within the median class, multiplying this fraction by $c$ (the total width of the median class) gives the portion of the class width that lies below the median, i.e. the distance from the lower limit of the class to the median. Adding the result to $L_m$ (the lower limit of the median class), we get the final formula $\displaystyle L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \cdot c$, which identifies the median.
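For anyone who wants to check the formula numerically, here is a minimal Python sketch of the interpolation. The function name `grouped_median`, its argument layout, and the sample data are my own assumptions for illustration; it takes the median class to be the first class whose cumulative frequency reaches $N/2$.

```python
def grouped_median(lower_limits, widths, freqs):
    """Median of grouped data by linear interpolation (hypothetical helper).

    lower_limits[i], widths[i], freqs[i] describe class i, with classes
    listed in increasing order.
    """
    n = sum(freqs)            # N, the total number of observations
    half = n / 2              # N/2
    cum_before = 0            # F_{m-1}: cumulative frequency before the class
    for lower, width, f in zip(lower_limits, widths, freqs):
        # The median class is the first non-empty class whose cumulative
        # frequency reaches N/2.
        if f > 0 and cum_before + f >= half:
            fraction = (half - cum_before) / f   # (N/2 - F_{m-1}) / f_m
            return lower + fraction * width      # L_m + fraction * c
        cum_before += f
    raise ValueError("no observations")


# Made-up example: classes 0-10, 10-20, 20-30, 30-40, 40-50
median = grouped_median([0, 10, 20, 30, 40], [10] * 5, [5, 8, 12, 10, 5])
print(round(median, 2))  # approximately 25.83, i.e. 20 + (7/12) * 10
```

With these numbers, $N/2 = 20$, $F_{m-1} = 13$, $f_m = 12$, $L_m = 20$ and $c = 10$, so the result matches evaluating the formula by hand.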