How to find the percentile p (or quantile q) from a weighted dataset

Tags: continuous-data, quantiles, r, weighted-data

Say I have height measurements of Earth's population, recorded to the nearest whole centimetre.

Height (cm)   Number of people
136           154934
137           158059
138           170372
139           220385
...

For this example, which uses frequency weights, is there a numerical method to calculate q-quantiles from a dataset in which each entry is weighted by its count?

More generally, how do I find the percentile p (or quantile q) from a dataset that uses weights of any kind, such as frequency, sampling, IPTW or IPSW weights?

Is there a formula, ideally something I could implement in R, that avoids expanding this into a pseudo-dataset with ten billion values?

Best Answer

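To estimate quantile $q$ from a weighted set of observations sorted in ascending order, $x_1 \leq x_2 \leq \ldots$:

Let $W$ be the sum of the weights. The weights need not be integers, so sampling or IPTW weights are handled the same way as frequency weights.

Let $w_1, w_2, \ldots$ be the corresponding observation weights, listed in the same order as the sorted observations.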

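Find the smallest $k$ such that $w_1+w_2+\ldots+w_k \geq Wq$.

Then $x_k$ is your estimate of the $q$ quantile.

Notice that the same $x_k$ is returned for a whole range of quantiles, namely every $q$ with $(w_1+\ldots+w_{k-1})/W < q \leq (w_1+\ldots+w_k)/W$. This is exactly what you would see if you expanded the dataset by replicating each observation as many times as its weight: there, too, a single value covers a block of consecutive quantiles. (Taking the smallest $k$ with cumulative weight at least $Wq$, rather than the largest $k$ with cumulative weight at most $Wq$, also guarantees that a $k$ exists when $Wq < w_1$.)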
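The question asks for R, but as a language-neutral illustration here is a minimal Python sketch of the cumulative-weight approach. It assumes the inverse-CDF convention: return the first sorted observation whose running weight reaches at least $qW$. This convention reproduces the expanded (replicated) dataset exactly and always returns a value. The function name `weighted_quantile` is hypothetical, and weights are assumed to be positive.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, weights, q):
    """Hypothetical helper: inverse of the weighted empirical CDF.

    Returns the first x_k (in ascending order) whose cumulative weight
    w_1 + ... + w_k is at least q * W, where W is the total weight.
    Assumes positive weights; they may be non-integer (e.g. sampling weights).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if len(values) == 0 or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    # Sort the observations and carry each weight along with its value.
    pairs = sorted(zip(values, weights))
    xs = [x for x, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))  # w_1, w_1 + w_2, ...
    total = cum[-1]  # W, taken from the running sum so q = 1 maps to the last x

    # bisect_left returns the smallest index i with cum[i] >= q * W.
    k = bisect_left(cum, q * total)
    return xs[min(k, len(xs) - 1)]  # guard against floating-point overshoot


# Only the four rows shown in the question's table, used as a toy example.
heights = [136, 137, 138, 139]
counts = [154934, 158059, 170372, 220385]
print(weighted_quantile(heights, counts, 0.5))  # prints 138
```

Here $W = 703750$ and $Wq = 351875$. The cumulative counts are 154934, 312993, 483365 and 703750, so the first height whose cumulative count reaches $Wq$ is 138. The cost is one sort plus a binary search, with no expanded dataset. On an expanded dataset, the same convention corresponds to R's `quantile(x, type = 1)`.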
