Quantiles – Calculating the Weighted Median

median, quantiles, weighted mean

I'd like to compute a median of measurements taken from a population with 3 subgroups, A, B, and C. I'd like the median to be "weighted", in the sense that each of the groups should have equal impact, regardless of the relative number of samples.

Apologies for not expressing this more formally, but hopefully an example will illuminate.
Let the measurements of A, B, and C be:
A={5,6,7}
B={8,9}
C={1,2,3,4}

The population median would simply combine all measurements:
MEDIAN({A,B,C})
=MEDIAN({1,2,3,4,[5],6,7,8,9})
= 5.

But what I want to do is assume that A, B, and C are equally represented in the population, but were not "fairly" sampled. So the median I want here would be obtained by repeating the measurements of A 4 times, B 6 times, and C 3 times (scaling each set to 12 elements, the LCM of their cardinalities).

WEIGHTED_MEDIAN ({A,B,C})

= MEDIAN({A,A,A,A,B,B,B,B,B,B,C,C,C})

= MEDIAN({{5,5,5,5,6,6,6,6,7,7,7,7},{8,8,8,8,8,8,9,9,9,9,9,9},{1,1,1,2,2,2,3,3,3,4,4,4}})

=MEDIAN({1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,[6,6],6,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9})
= (6+6)/2 = 6
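
For reference, the replication scheme above can be written as a brute-force sketch in Python. The function name `replicated_median` is made up for illustration, and `math.lcm` assumes Python 3.9 or later. It only confirms the small example; it is exactly the expensive approach the question wants to avoid.

```python
from math import lcm
from statistics import median


def replicated_median(groups):
    """Median after repeating each group up to the LCM of the group sizes."""
    target = lcm(*(len(g) for g in groups))
    expanded = []
    for g in groups:
        # Repeat the whole group so that it contributes `target` values.
        expanded.extend(list(g) * (target // len(g)))
    return median(expanded)


A, B, C = [5, 6, 7], [8, 9], [1, 2, 3, 4]
print(replicated_median([A, B, C]))  # 6.0
```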

Naturally, this is easy with very small sets. My question: is there a less expensive approach to compute this? If A, B, and C have even moderately large cardinalities that happen to be prime (e.g. 1009, 1013, and 4919), the desired median would entail expanding A, B, and C into replicated sets A', B', and C', each with cardinality 5,027,793,523, which is computationally absurd.

If there's not a direct simplification, is there an approximation that is reliably close? Would an average of the medians of the sub-groups reliably give good approximations, or are there conditions under which it would skew heavily away from the "true" weighted median? For the example:
(MEDIAN(A)+MEDIAN(B)+MEDIAN(C))/3
= (6+8.5+2.5)/3
= 17/3
= 5.6667
~= 6

Two variations of this:
Variation 1: How should this be handled if I know that A, B, and C represent 20%, 20%, and 60% of the population? For my example, this would be equal to the median obtained by repeating the measurements of A 4 times, B 6 times, and C 9 times (resulting in a set of 60 values with median 4).
Variation 2: How can a weighted percentile other than the median be computed, e.g. the 25th or 75th percentile of the weighted results?

Best Answer

A weighted median, as defined in https://en.wikipedia.org/wiki/Weighted_median, is easy to calculate. Let your observations be $Y_1, Y_2, \dots, Y_n$ with corresponding weights $w_1, w_2, \dots, w_n$. Order the observations, and let the order statistics be $Y_{(1)} \le Y_{(2)} \le \dots \le Y_{(n)}$ with corresponding weights $w_{(1)}, w_{(2)}, \dots, w_{(n)}$, so that $w_{(i)}$ is the weight attached to $Y_{(i)}$. Then form the cumulative sums of the weights, with $W(0) = 0$: $$ W(r) = \sum_{i=1}^r w_{(i)} $$ Now find the index $\hat{r}$ such that $$ W(\hat{r}-1)/W(n) \lt 0.5 \le W(\hat{r})/W(n) $$ and take $Y_{(\hat{r})}$ as the weighted median, or possibly interpolate between $Y_{(\hat{r}-1)}$ and $Y_{(\hat{r})}$.
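
The rule above can be sketched in Python as follows. Several things here are assumptions rather than part of the answer: the function names `weighted_quantile` and `group_weights` are made up for illustration, the threshold 0.5 is generalised to a parameter `q` so the same rule covers other percentiles, and each observation is given the weight (group share) / (group size) to match the replication scheme in the question. The sketch returns $Y_{(\hat{r})}$ without interpolating, and it uses `fractions.Fraction` so the cumulative sums are exact.

```python
from fractions import Fraction


def weighted_quantile(values, weights, q=0.5):
    """Smallest ordered value whose cumulative weight share reaches q."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))   # order statistics with their weights
    total = sum(w for _, w in pairs)       # W(n)
    running = 0                            # W(r), built up one term at a time
    for y, w in pairs:
        running += w
        if running >= q * total:           # first r with W(r)/W(n) >= q
            return y
    return pairs[-1][0]


def group_weights(groups, shares=None):
    """Flatten groups; each observation gets weight share / group size (assumption)."""
    if shares is None:
        shares = [Fraction(1, len(groups))] * len(groups)
    values, weights = [], []
    for group, share in zip(groups, shares):
        for y in group:
            values.append(y)
            weights.append(Fraction(share) / len(group))
    return values, weights


A, B, C = [5, 6, 7], [8, 9], [1, 2, 3, 4]

values, weights = group_weights([A, B, C])
print(weighted_quantile(values, weights))          # 6  (equal group shares)
print(weighted_quantile(values, weights, q=0.25))  # 3  (25th percentile)

shares = [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)]
values, weights = group_weights([A, B, C], shares)
print(weighted_quantile(values, weights))          # 4  (20% / 20% / 60%)
```

The cost is one sort plus one pass over the data, so group sizes such as 1009, 1013, and 4919 need no expansion at all.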

As noted in comments, this is what you do when computing the median from a histogram.
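
To connect this with the replication scheme in the question, here is a short derivation. The notation is not from the original post: $n_g$ is the number of measurements in group $g$, $p_g$ is the population share assumed for that group, $g(i)$ is the group of observation $i$, and $N$ is the size of the expanded set. Repeating group $g$ until it fills a share $p_g$ of the expanded set means each of its measurements appears $p_g N / n_g$ times, so the expanded set is a histogram with those counts. Only the ratio $W(r)/W(n)$ enters the rule, so the common factor $N$ cancels and each observation can carry the weight $$ w_i = \frac{p_{g(i)}}{n_{g(i)}} $$ For the example with equal shares $p_g = 1/3$, this gives weights $1/9$ for A, $1/6$ for B, and $1/12$ for C. The ordered values are $1, 2, \dots, 9$ with $W(9) = 1$, and $$ W(5) = \frac{4}{12} + \frac{1}{9} = \frac{4}{9} \lt \frac{1}{2} \le \frac{5}{9} = W(6) $$ so $\hat{r} = 6$ and the weighted median is $Y_{(6)} = 6$, the same value the question obtains by expanding to 36 elements.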
