Solved – Lognormal distribution from world bank quintiles PPP data

I am not a stats person.

The World Bank has data giving PPP (purchasing power parity) figures for quintiles of a country's population (actually seven brackets: the first 10%, the second 10%, the 2nd, 3rd and 4th 20%, the 9th 10% and the 10th 10%).

I've been reading a good deal about the Gini index and about modeling income distribution curves for populations, and I would like to turn this quintile data into a lognormal distribution with the same properties. For example, if China has 1,300 USD PPP for the first 10%, then the weighted average PPP (integration under the curve?) for the poorest 10% of the fitted lognormal distribution should also come out to 1,300.

Any thoughts on how to do this, tactically? I cannot think my way through this, but I am a reasonable programmer of simple scripts and, given some help on how to proceed, would use Python's scipy and numpy to fit curves.

@mpiktas, you are right, they do not give maximum income. The quintile/decile figures must be averages. I do not think you can straightforwardly fit this data to a distribution as if it were raw data; if that were the case, I would not have asked the question!

@Michael, given that the quintile/decile data points are average values for the entire quintile/decile bracket, is it still possible to come up with a best fit? Is this what you meant by least squares fitting?

Best Answer

Here is an example of quick and dirty R code to illustrate what Michael suggested:

Define quantiles available:

q<-c(0.1,0.2,0.4,0.6,0.8,0.9)

Create artificial data (quantiles of the standard log-normal distribution) and add some noise:

data <-jitter(qlnorm(q))

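Create the function to minimise (the sum of absolute deviations between the data and the log-normal quantiles for parameters p):

fitfun <- function(p) sum(abs(data - qlnorm(q, p[1], p[2])))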

Run the optimiser with the initial guess of parameters of log-normal distribution:

opt <- optim(c(0.1,1.1), fitfun)

The parameters fitted:

opt$par

Display the fit visually:

aa<-seq(0,0.95,by=0.01)
plot(aa,qlnorm(aa,opt$par[1],opt$par[2]),type="l")
points(q,data)
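
Since the question mentions Python's scipy and numpy, here is a rough Python sketch of the same quantile fit. It is a translation under assumptions, not part of the original answer: the noise is a seeded 1% multiplicative perturbation instead of R's jitter, the names lognormal_quantiles, mu_hat and sigma_hat are made up for illustration, and Nelder-Mead is requested explicitly because that is optim's default method.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

# Cumulative population shares at which quantiles are assumed to be known.
q = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])


def lognormal_quantiles(p, mu, sigma):
    """Quantiles of a log-normal with log-mean mu and log-sd sigma."""
    # scipy parameterises lognorm with s = sigma and scale = exp(mu).
    return lognorm.ppf(p, s=sigma, scale=np.exp(mu))


# Artificial data: standard log-normal quantiles with small multiplicative
# noise (an assumption standing in for R's jitter).
rng = np.random.default_rng(0)
data = lognormal_quantiles(q, 0.0, 1.0) * (1 + rng.normal(0, 0.01, q.size))


def fitfun(params):
    """Sum of absolute deviations between data and model quantiles."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimiser away from invalid sigma
    return np.sum(np.abs(data - lognormal_quantiles(q, mu, sigma)))


# Same initial guess as in the R code.
opt = minimize(fitfun, x0=[0.1, 1.1], method="Nelder-Mead")
mu_hat, sigma_hat = opt.x
print(mu_hat, sigma_hat)
```

The fitted values should land close to 0 and 1, the parameters used to generate the artificial data.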


Note that I intentionally plotted only up to the 95% quantile, since the log-normal distribution is unbounded above, i.e. its 100% quantile is infinity.

The usual caveats apply: a real-life example might look much uglier than this one, i.e. the fit might be much worse. Also try the Singh-Maddala distribution instead of the log-normal; it works better for income distributions.
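
The question points out that the World Bank figures are averages within each bracket rather than quantile cut points. One possible variant, sketched here as an assumption rather than as what the answer above does, is to match bracket means instead. For a log-normal with parameters mu and sigma, the partial expectation up to the p-quantile is exp(mu + sigma^2/2) * Phi(z_p - sigma), where z_p is the standard normal p-quantile, so the mean within a bracket is the difference of two such terms divided by the bracket's population share. The function names, the bracket layout, and the test values (mu = 8, sigma = 0.7) are made up for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

# Bracket edges as cumulative population shares: two bottom deciles,
# three middle quintiles, two top deciles (the layout described in the question).
edges = np.array([0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0])


def bracket_means(mu, sigma, edges=edges):
    """Mean of a log-normal(mu, sigma) within each population bracket."""
    z = norm.ppf(edges)  # -inf at share 0, +inf at share 1
    # Partial expectation up to quantile p: exp(mu + sigma^2/2) * Phi(z_p - sigma).
    cum = np.exp(mu + 0.5 * sigma**2) * norm.cdf(z - sigma)
    return np.diff(cum) / np.diff(edges)


def fit_bracket_means(observed, start=(0.0, 1.0)):
    """Least-squares fit of (mu, sigma), comparing bracket means on the log scale."""
    def residuals(params):
        mu, log_sigma = params  # optimise log(sigma) so sigma stays positive
        return np.log(bracket_means(mu, np.exp(log_sigma))) - np.log(observed)

    res = least_squares(residuals, x0=[start[0], np.log(start[1])])
    return res.x[0], np.exp(res.x[1])


# Made-up check: generate bracket means from known parameters and recover them.
true_mu, true_sigma = 8.0, 0.7
observed = bracket_means(true_mu, true_sigma)
mu_hat, sigma_hat = fit_bracket_means(observed, start=(np.log(observed.mean()), 1.0))
print(mu_hat, sigma_hat)
```

With real data, observed would be the seven published bracket averages, and the residuals after fitting give a rough sense of how well a log-normal describes that country.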
