I am not a stats person.
The World Bank has data giving PPP (purchasing power parity) figures by quintile (actually the first 10%, second 10%, second, third and fourth 20%, ninth 10% and tenth 10%) of a country's population.
I've been reading a good deal about the Gini index and about modeling income distribution curves for populations, and I would like to turn this quintile data into a lognormal distribution with the same properties. For example, if China has 1,300 USD PPP for the first 10%, then after modeling, the weighted average PPP over the poorest 10% of the lognormal distribution (integration under the curve?) should come out to 1,300.
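If income $X$ is log-normal, meaning $\ln X$ is normal with mean $\mu$ and standard deviation $\sigma$, and $\Phi$ is the standard normal CDF, this condition has a closed form. The average income of the population bracket between cumulative shares $p_1$ and $p_2$ is

$$
\operatorname{E}\left[X \,\middle|\, q_{p_1} < X \le q_{p_2}\right]
= \frac{e^{\mu+\sigma^2/2}\left[\Phi\left(\Phi^{-1}(p_2)-\sigma\right)-\Phi\left(\Phi^{-1}(p_1)-\sigma\right)\right]}{p_2-p_1},
\qquad q_p = e^{\mu+\sigma\,\Phi^{-1}(p)},
$$

which follows from $\operatorname{E}\left[X\,\mathbf{1}\{X \le q_p\}\right] = e^{\mu+\sigma^2/2}\,\Phi\left(\Phi^{-1}(p)-\sigma\right)$. For the poorest 10% ($p_1 = 0$, so the second $\Phi$ term is $0$, and $p_2 = 0.1$) the example condition reads $e^{\mu+\sigma^2/2}\,\Phi\left(\Phi^{-1}(0.1)-\sigma\right)/0.1 = 1300$. That is one equation in two unknowns, so several brackets are needed to pin down both parameters.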
Any thoughts on how to do this, tactically? I cannot think my way through this, but I am a reasonable programmer of simple scripts and would use Python's scipy and numpy to fit curves, given some help on how to proceed.
@mpiktas, you are right: they do not give the maximum income of each bracket. The quintile/decile figures must be averages. I do not think you can straightforwardly fit this data to a distribution as if it were raw data; if that were the case, I would not have asked the question!
@Michael, given that the quintile/decile data points are average values for the entire quintile/decile bracket, is it still possible to come up with a best fit? Is this what you meant by least squares fitting?
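A minimal sketch of such a least-squares fit in Python (numpy/scipy), assuming the seven brackets described in the question and using the closed-form mean of a log-normal within a bracket: it chooses μ and σ so that the model's bracket averages match the reported ones. The names `EDGES`, `bracket_means` and `fit_lognormal_to_bracket_means` are hypothetical, and taking residuals on the log scale (so poor and rich brackets count roughly equally) is a design choice, not the only option.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Bracket edges as cumulative population shares (assumed from the question):
# 0-10%, 10-20%, 20-40%, 40-60%, 60-80%, 80-90%, 90-100%.
EDGES = np.array([0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0])


def bracket_means(mu, sigma, edges=EDGES):
    """Mean of a lognormal(mu, sigma) variable within each population bracket."""
    z = norm.ppf(edges)            # -inf at 0 and +inf at 1
    partial = norm.cdf(z - sigma)  # share of total income below each edge
    return np.exp(mu + sigma**2 / 2) * np.diff(partial) / np.diff(edges)


def fit_lognormal_to_bracket_means(observed, edges=EDGES):
    """Least-squares fit of (mu, sigma) to reported bracket averages."""
    observed = np.asarray(observed, dtype=float)
    log_obs = np.log(observed)

    def loss(params):
        mu, log_sigma = params  # optimise log(sigma) so sigma stays positive
        fitted = bracket_means(mu, np.exp(log_sigma), edges)
        return np.sum((np.log(fitted) - log_obs) ** 2)

    # Start from the log of the population-weighted average and sigma = 1.
    overall_mean = np.average(observed, weights=np.diff(edges))
    start = np.array([np.log(overall_mean), 0.0])
    result = minimize(loss, start, method="Nelder-Mead",
                      options={"xatol": 1e-8, "fatol": 1e-12})
    mu, log_sigma = result.x
    return mu, np.exp(log_sigma)
```

Pass the seven reported averages in order from poorest to richest, e.g. `mu, sigma = fit_lognormal_to_bracket_means(values)`.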
Best Answer
Here is a quick and dirty example in R to illustrate what Michael suggested:
Define quantiles available:
Create artificial data and add some noise:
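A minimal Python (numpy/scipy) sketch of these two steps, assuming the available quantile levels are 10%, 20%, 40%, 60%, 80% and 90%, and using hypothetical true parameters μ = 8, σ = 0.7 with about 5% multiplicative noise:

```python
import numpy as np
from scipy.stats import lognorm

# Quantile levels available (assumed: 10%, 20%, 40%, 60%, 80%, 90%).
probs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])

# Hypothetical "true" log-normal parameters for the artificial data.
true_mu, true_sigma = 8.0, 0.7

# Exact log-normal quantiles; scipy parameterises the log-normal with
# s = sigma and scale = exp(mu).
exact = lognorm.ppf(probs, s=true_sigma, scale=np.exp(true_mu))

# Multiplicative noise (about 5%) so the data are not a perfect fit.
rng = np.random.default_rng(42)
observed = exact * np.exp(rng.normal(0.0, 0.05, size=probs.size))
```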
Create the function to minimise:
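One way to write the function to minimise in Python, assuming the (μ, σ) parameterisation and plain squared differences between the log-normal quantiles and the data; `sum_sq_quantile_error` is a hypothetical name. `scipy.stats.lognorm` takes `s = σ` and `scale = exp(μ)`.

```python
import numpy as np
from scipy.stats import lognorm


def sum_sq_quantile_error(params, probs, observed):
    """Sum of squared differences between log-normal quantiles and the data.

    params = (mu, sigma) of the normal distribution of log income.
    """
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimiser away from invalid sigma values
    fitted = lognorm.ppf(probs, s=sigma, scale=np.exp(mu))
    return np.sum((fitted - observed) ** 2)
```

Squaring raw differences lets the top brackets dominate the fit; comparing logarithms instead is a common alternative.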
Run the optimiser with an initial guess for the parameters of the log-normal distribution:
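A self-contained Python sketch of the optimisation step, using `scipy.optimize.minimize` with the Nelder-Mead method (an assumption; any derivative-free optimiser would do) on the same kind of hypothetical artificial data. The initial guess takes μ from the log of the median observation and sets σ = 1; the fitted parameters are printed at the end.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

probs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])
# Hypothetical data: quantiles of lognormal(mu=8, sigma=0.7) with ~5% noise.
rng = np.random.default_rng(42)
observed = lognorm.ppf(probs, s=0.7, scale=np.exp(8.0)) * np.exp(
    rng.normal(0.0, 0.05, size=probs.size))


def objective(params):
    """Squared distance between model quantiles and the observed values."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    fitted = lognorm.ppf(probs, s=sigma, scale=np.exp(mu))
    return np.sum((fitted - observed) ** 2)


# Initial guess: mu from the log of the median observation, sigma = 1.
initial = np.array([np.log(np.median(observed)), 1.0])
result = minimize(objective, initial, method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```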
The fitted parameters:
Display the fit visually:
Note that I intentionally plotted only up to the 95% quantile, since the log-normal distribution is unbounded, i.e. the 100% quantile is infinite.
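The same point in Python, as a small sketch with hypothetical parameters: `scipy.stats.lognorm` returns an infinite quantile at p = 1, so a plotting grid for the fitted quantile function has to stop short of it, here at 0.95.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 8.0, 0.7  # hypothetical fitted parameters
dist = lognorm(s=sigma, scale=np.exp(mu))

# The 100% quantile of a log-normal is infinite.
print(dist.ppf(1.0))

# A grid that stops at the 95% quantile gives finite values to plot.
grid = np.linspace(0.01, 0.95, 95)
curve = dist.ppf(grid)
```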
The usual caveats apply: a real-life example might look much uglier than this one, i.e. the fit might be much worse. Also try the Singh-Maddala distribution instead of the log-normal; it works better for income distributions.
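In scipy, the Singh-Maddala distribution is available as `scipy.stats.burr12` (Burr Type XII): with CDF $1 - [1 + (x/b)^a]^{-q}$, the shape parameters `c` and `d` play the roles of a and q, and `scale` is b. A minimal sketch fitting its three parameters to the same quantile levels by least squares, assuming log-scale residuals and a starting point of a = 2, q = 1 and b equal to the median observation; `fit_singh_maddala` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import burr12

probs = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])


def fit_singh_maddala(probs, observed):
    """Least-squares fit of Singh-Maddala (Burr XII) quantiles to the data.

    Parameters are optimised on the log scale to keep a, q and b positive;
    residuals are taken on log incomes so all brackets weigh similarly.
    """
    log_obs = np.log(observed)

    def objective(log_params):
        a, q, b = np.exp(log_params)
        fitted = burr12.ppf(probs, a, q, scale=b)
        return np.sum((np.log(fitted) - log_obs) ** 2)

    start = np.log([2.0, 1.0, np.median(observed)])
    result = minimize(objective, start, method="Nelder-Mead",
                      options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
    return np.exp(result.x)  # (a, q, b)
```

With three parameters instead of two, the Singh-Maddala curve has more freedom to follow the upper tail, which is one reason it is often preferred for income data.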