Say I have height measurements of Earth's population expressed with a precision of integer centimetres.
Height-in-cm Number-of-people
136 154934
137 158059
138 170372
139 220385
...
For this example, which uses frequency weights, is there a numerical method to calculate quantiles from a dataset in which each value is weighted by its count?
More generally, how do I find the percentile p (or quantile q) from a weighted dataset that uses weights of any kind, e.g. frequency, sampling, IPTW, IPSW, etc.?
Is there a formula, such as something that I could implement in R, other than expanding this into a pseudo-dataset with ten billion values?
Best Answer
To solve for quantile $q$ in a weighted set of ordered observations $x_1, x_2, \ldots$:
Let $W$ be the sum of the weights.
Let $w_1, w_2, \ldots$ be the corresponding weights, so that $w_i$ is the weight of $x_i$ once the observations are sorted.
Find the smallest $k$ such that $w_1+w_2+\ldots+w_k \geq Wq$.
Then $x_k$ is your estimate for the $q$th quantile.
Notice that the same $x_k$ is returned for a whole range of $q$ values, just as you would see if you created an expanded dataset in which each observation is replicated according to its weight.
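The cumulative-weight procedure can be sketched in a few lines. This is a minimal illustration in Python rather than R (the same logic maps onto `order` and `cumsum` in R), not a definitive implementation. The function name `weighted_quantile` is hypothetical. As an assumption of this sketch, it returns the first sorted observation whose cumulative weight reaches $Wq$, which is the inverse of the weighted empirical CDF; that convention guarantees an index exists even when $w_1 > Wq$. The usage lines use only the four rows visible in the question, not the full table.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, weights, q):
    """Return the q-th quantile of `values` under non-negative `weights`.

    Convention (an assumption of this sketch): the inverse of the weighted
    empirical CDF, i.e. the first sorted value whose cumulative weight
    reaches q * W, where W is the total weight.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Sort observations, carrying each weight along; zero-weight entries
    # contribute nothing and are dropped.
    pairs = sorted((x, w) for x, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")

    xs = [x for x, _ in pairs]
    cum = list(accumulate(w for _, w in pairs))  # w_1, w_1 + w_2, ...
    target = q * cum[-1]                         # W * q

    # First index k with cum[k] >= target; the min() is only a guard
    # against floating-point overshoot of the total weight.
    k = bisect_left(cum, target)
    return xs[min(k, len(xs) - 1)]


# Toy usage with the four rows shown in the question (not the full data).
heights = [136, 137, 138, 139]
counts = [154934, 158059, 170372, 220385]
print(weighted_quantile(heights, counts, 0.5))  # prints 138
```

Because only sorting and a running sum are needed, the cost depends on the number of distinct values, not on the total weight, so no expanded pseudo-dataset is required. The weights do not have to be integers; sampling or inverse-probability weights go through the same code, although whether this particular quantile convention suits a given weighting scheme is a separate modelling question.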