Density function from percentiles (P10, P25, P75, P90, mean and median)

Tags: density function, probability, quantiles

I have percentile data (P10, P25, P75 and P90) for a variable.
I also have the mean and median for each group:

group    mean   median  P10     P25     P75     P90
1        30100  26200   19900   22500   32800   44200
2        38700  36600   28000   31500   44000   52100

How do I:

  1. Create a probability density function based on these variables?
  2. Use that function to give me the % in specific step intervals (i.e. answering the question: how many out of 100 are in the 30000-31000 interval for group 2)?

Thanks.

Best Answer

The distributions are clearly positively skewed (in both groups the mean exceeds the median), so a normal distribution wouldn't be appropriate. Economists often assume that income has a log-normal distribution, so that would probably be a good choice if it fits OK. To check, take logs of the percentiles and construct a normal probability plot for each group: plot the logged percentiles (ignore the mean, but include the median as the 50th percentile) against the corresponding percentiles of a standard normal distribution. If the points lie roughly on a straight line, the log-normal distribution is a reasonable fit. You could then estimate its parameters by fitting a straight line by least squares: the intercept estimates the mean of the logged variable and the slope its standard deviation. That's not the optimal method, but it's simple and probably good enough.
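The procedure described above could be sketched in Python roughly as follows. This is only an illustration under the log-normal assumption, using the standard library; the names `fit_lognormal`, `density` and `share_between` are made up for this sketch and are not from any package. Once the two parameters are estimated, the density follows directly, and the share in an interval is the difference of the fitted cumulative distribution function at its endpoints.

```python
from math import log
from statistics import NormalDist, mean

# Percentile levels and values from the question (median used as P50).
LEVELS = [0.10, 0.25, 0.50, 0.75, 0.90]
GROUPS = {
    1: [19900, 22500, 26200, 32800, 44200],
    2: [28000, 31500, 36600, 44000, 52100],
}


def fit_lognormal(levels, values):
    """Least-squares line through (z_p, log x_p); returns (mu, sigma).

    mu is the intercept and sigma the slope, i.e. the mean and standard
    deviation of the logged variable under the log-normal assumption.
    """
    z = [NormalDist().inv_cdf(p) for p in levels]  # standard normal quantiles
    y = [log(v) for v in values]                   # logged percentiles
    z_bar, y_bar = mean(z), mean(y)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sigma = sxy / sxx
    mu = y_bar - sigma * z_bar
    return mu, sigma


def density(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    return NormalDist(mu, sigma).pdf(log(x)) / x


def share_between(mu, sigma, lo, hi):
    """Fitted proportion of the population with lo < X <= hi."""
    d = NormalDist(mu, sigma)
    return d.cdf(log(hi)) - d.cdf(log(lo))


mu2, sigma2 = fit_lognormal(LEVELS, GROUPS[2])
pct = 100 * share_between(mu2, sigma2, 30000, 31000)
print(f"group 2: mu={mu2:.4f}, sigma={sigma2:.4f}")
print(f"fitted % in 30000-31000: {pct:.1f}")
```

The printed percentage is only as trustworthy as the log-normal fit itself, so it is worth looking at how far the five points fall from the fitted line before relying on it.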

Update: I just tried that myself: [image]

Log-normal seems a reasonable fit in group 2, but not so good in group 1. I don't know whether it might still be good enough for your purposes. If not, you might need to move to some three-parameter distribution, but that could get a fair bit more complicated.
