Solved – Estimating a distribution based on three percentiles

quantiles, r, regression

What methods can I use to infer a distribution if I know only three percentiles?

For example, I know that in a certain data set, the fifth percentile is 8,135, the 50th percentile is 11,259, and the 95th percentile is 23,611. I want to be able to go from any other number to its percentile.

It's not my data, and those are all the statistics I have. It's clear that the distribution isn't normal. The only other information I have is that this data represents government per-capita funding for different school districts.
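A quick check makes the non-normality concrete. A normal distribution is symmetric, so its median must sit halfway between the 5th and 95th percentiles. Comparing the gaps on the log scale likewise shows whether a lognormal could match all three quantiles exactly. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
import math

# Quantiles quoted in the question (per-capita funding)
q05, q50, q95 = 8135.0, 11259.0, 23611.0

# Normal check: a symmetric distribution puts the median at this midpoint
midpoint = (q05 + q95) / 2
print(f"median {q50:.0f} vs. midpoint of 5th/95th {midpoint:.0f}")

# Lognormal check: a lognormal makes these two log-scale gaps equal
lower_gap = math.log(q50) - math.log(q05)
upper_gap = math.log(q95) - math.log(q50)
print(f"log-scale gaps: lower {lower_gap:.3f}, upper {upper_gap:.3f}")
```

Here the median falls well below the midpoint, which points to right skew. The upper log gap is also larger than the lower one, so some skew remains even after taking logs.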

I know enough about statistics to know that this problem has no definite solution, but not enough to know how to go about finding good guesses.

Would a lognormal distribution be appropriate? What tools can I use to perform the regression (or do I need to do it myself)?

Best Answer

Using a purely statistical method to do this work will provide absolutely no additional information about the distribution of school spending: the result will merely reflect an arbitrary choice of algorithm.
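To illustrate how arbitrary the choice is, the sketch below builds two different distributions that both reproduce all three quoted quantiles exactly. The first is a two-piece ("split") normal on the raw funding scale, and the second is a split normal on the log scale. Both are hypothetical recipes chosen for illustration, not methods proposed in this answer. They agree at the 5th, 50th, and 95th percentiles but disagree in the tail.

```python
import math
from statistics import NormalDist

Q05, Q50, Q95 = 8135.0, 11259.0, 23611.0
Z95 = NormalDist().inv_cdf(0.95)  # standard normal 95th percentile, about 1.645

def split_quantile(p, center, low_scale, high_scale):
    """Quantile of a two-piece normal: one spread below the median, another above it."""
    z = NormalDist().inv_cdf(p)
    return center + z * (low_scale if z < 0 else high_scale)

def split_normal_raw(p):
    # Recipe A: split normal on the raw funding scale
    return split_quantile(p, Q50, (Q50 - Q05) / Z95, (Q95 - Q50) / Z95)

def split_normal_log(p):
    # Recipe B: split normal on the log scale (a two-piece lognormal)
    lo, mid, hi = math.log(Q05), math.log(Q50), math.log(Q95)
    return math.exp(split_quantile(p, mid, (mid - lo) / Z95, (hi - mid) / Z95))

for p in (0.05, 0.50, 0.95, 0.99):
    print(f"p={p:.2f}  recipe A: {split_normal_raw(p):8.0f}  recipe B: {split_normal_log(p):8.0f}")
```

Nothing in the three quantiles favors one recipe over the other, yet they give 99th percentiles that differ by thousands.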

You need more data.

This is easy to come by: use data from previous years, from comparable districts, whatever. For example, federal spending on 14,866 school districts in 2008 is available from the Census site. It shows that across the country, total federal revenue per enrolled student was approximately lognormally distributed, but breaking it down by state reveals substantial variation (e.g., log spending in Alaska has negative skew, while log spending in Colorado has strong positive skew). Use those data to characterize the likely form of the distribution, and then fit your quantiles to that form.
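As a sketch of that last step, assume the lognormal form suggested by the national Census data. State-level data could overturn this assumption. Under it, every quantile satisfies ln q_p = μ + σ·z_p, where z_p is the standard normal quantile, so fitting reduces to a straight-line least-squares fit of the log quantiles against z_p. The fitted distribution then maps any funding amount to a percentile, which is what the question asks for. The helper names `fit_lognormal`, `value_at`, and `percentile_of` are illustrative. Only the standard library's `statistics.NormalDist` is used, so no extra tools are needed.

```python
import math
from statistics import NormalDist

# Quantiles from the question: probability -> per-capita funding
QUANTILES = {0.05: 8135.0, 0.50: 11259.0, 0.95: 23611.0}

def fit_lognormal(quantiles):
    """Ordinary least squares for ln(q_p) = mu + sigma * z_p (assumes a lognormal form)."""
    zs = [NormalDist().inv_cdf(p) for p in quantiles]
    ys = [math.log(q) for q in quantiles.values()]
    z_bar = sum(zs) / len(zs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sigma = sxy / sxx
    mu = y_bar - sigma * z_bar
    return mu, sigma

mu, sigma = fit_lognormal(QUANTILES)
log_dist = NormalDist(mu, sigma)  # distribution of ln(funding)

def value_at(p):
    """Funding amount at probability p under the fitted lognormal."""
    return math.exp(log_dist.inv_cdf(p))

def percentile_of(x):
    """Percentile (0-100) of a funding amount x under the fitted lognormal."""
    return 100 * log_dist.cdf(math.log(x))

for p, q in QUANTILES.items():
    print(f"p={p:.2f}: target {q:8.0f}, fitted {value_at(p):8.0f}")
print(f"15,000 sits near the {percentile_of(15000):.0f}th percentile")
```

The log-scale gaps in these quantiles are unequal, so a two-parameter lognormal cannot reproduce all three exactly. How much the mismatch matters depends on what the results will be used for.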

If you're even close to the right distributional form, you should be able to reproduce the quantiles accurately by fitting one or at most two parameters. The best technique for finding the fit will depend on which distributional form you use, but, far more importantly, it will depend on what you intend to use the results for. Do you need to estimate an average spending amount? Upper and lower limits on spending? Whatever it is, adopt a measure of goodness of fit that gives you the best chance of making good decisions with your results. For example, if your interest is focused on the upper 10% of all spending, you will want to fit the 95th percentile accurately and might care little about fitting the 5th percentile. No sophisticated fitting technique will make these considerations for you.
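One way to encode such priorities is a weighted least-squares fit on the log scale, where each quantile's weight expresses how much its error matters. This sketch assumes a lognormal form. The weights `[0.1, 1, 10]` are a hypothetical choice for someone focused on the upper tail, not a recommendation, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

# Quantiles from the question: (probability, per-capita funding)
TARGETS = [(0.05, 8135.0), (0.50, 11259.0), (0.95, 23611.0)]

def weighted_lognormal_fit(targets, weights):
    """Weighted least squares for ln(q_p) = mu + sigma * z_p.

    A larger weight makes the fit reproduce that quantile more closely.
    """
    zs = [NormalDist().inv_cdf(p) for p, _ in targets]
    ys = [math.log(q) for _, q in targets]
    w_sum = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxy = sum(w * (z - z_bar) * (y - y_bar) for w, z, y in zip(weights, zs, ys))
    sxx = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    sigma = sxy / sxx
    mu = y_bar - sigma * z_bar
    return mu, sigma

def relative_errors(targets, mu, sigma):
    """Relative error of each fitted quantile against its target value."""
    dist = NormalDist(mu, sigma)
    return [math.exp(dist.inv_cdf(p)) / q - 1 for p, q in targets]

# Equal weights versus a hypothetical upper-tail emphasis
for label, weights in [("equal", [1, 1, 1]), ("upper-tail", [0.1, 1, 10])]:
    mu, sigma = weighted_lognormal_fit(TARGETS, weights)
    errs = ", ".join(f"{e:+.1%}" for e in relative_errors(TARGETS, mu, sigma))
    print(f"{label:>10}: errors at 5th/50th/95th = {errs}")
```

With the upper-tail weights, the 95th percentile is matched much more closely, and the 5th percentile error grows in exchange. The weights themselves have to come from the decision at hand, not from the fitting routine.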

Of course, no one can legitimately guarantee that this data-informed, decision-oriented method will perform any better (or any worse) than some statistical recipe. But, unlike a purely statistical approach, this method has a basis grounded in reality and a focus on your needs, which gives it some credibility and a defense against criticism.
