Common Continuous Distributions with [0,1] support

continuous-data, distributions

Question

I would like to know which common continuous distributions have support [0,1].

Background

In my work I often come across data which are bounded between 0 and 1 (both inclusive) and likely skewed to the right.

These data mainly consist of sales figures converted into proportions between 0 and 1, either as a share of total sales or as a conversion rate (sales divided by page views).

As I am not very proficient in statistics, I always struggle to find the best distribution to explain this data.

Best Answer

Wikipedia has a list of distributions supported on an interval.

Leaving aside mixtures and the 0-inflated and 0-1 inflated cases (though you should definitely be aware of all of those if you model data on the unit interval), it is hard to establish which distributions are common, since that varies across application areas. The beta family, the triangular, and the truncated normal would probably be the main candidates, as they are used in a variety of situations.

Each of them can be defined on (0,1) and can be skewed in either direction.
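As a rough illustration of the three families on (0,1), here is a minimal Python sketch using only the standard library. The parameter values (Beta(2, 5), a triangular with mode 0.2, and a normal with mu = 0.2, sigma = 0.3 truncated to the unit interval) are arbitrary choices made for this example, not recommendations, and the helper names are made up here. For each density it compares the mean with the mode; a mean above the mode is one simple sign of right skew.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1)."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def triangular_pdf(x, mode):
    """Triangular density on (0, 1); assumes 0 < mode < 1."""
    if x <= mode:
        return 2 * x / mode
    return 2 * (1 - x) / (1 - mode)

def truncnorm_pdf(x, mu, sigma):
    """Normal(mu, sigma) density truncated to (0, 1)."""
    def phi(z):  # standard normal density
        return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    def Phi(z):  # standard normal cdf
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    mass = Phi((1 - mu) / sigma) - Phi((0 - mu) / sigma)
    return phi((x - mu) / sigma) / (sigma * mass)

def mean_and_mode(pdf, n=10_000):
    """Midpoint-rule mean and grid argmax of a density on (0, 1)."""
    xs = [(i + 0.5) / n for i in range(n)]
    ys = [pdf(x) for x in xs]
    mean = sum(x * y for x, y in zip(xs, ys)) / n
    mode = xs[ys.index(max(ys))]
    return mean, mode

# Illustrative (assumed) parameter choices, all skewed to the right.
densities = {
    "beta(2, 5)": lambda x: beta_pdf(x, 2, 5),
    "triangular(mode=0.2)": lambda x: triangular_pdf(x, 0.2),
    "truncated normal(0.2, 0.3)": lambda x: truncnorm_pdf(x, 0.2, 0.3),
}

for name, pdf in densities.items():
    mean, mode = mean_and_mode(pdf)
    print(f"{name}: mean={mean:.3f}, mode={mode:.3f}")
```

Swapping the parameters around (for example Beta(5, 2), or a mode of 0.8) gives the mirror-image left-skewed shapes.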

One example of each is shown here:

[Figure: density functions for one member of each of the distributions mentioned, each mildly right-skewed]

That they're often used doesn't imply they'll be suitable for whatever situation you're in, though. Model choice should rest on a number of considerations; where possible, theoretical understanding and practical subject-area knowledge should both inform it.

"I always struggle to find the best distribution to explain this data."

You should get away from worrying about "best", and focus on "sufficient/adequate for the present purpose". No simple distribution such as the ones I mentioned will really be a perfect description of real data ("all models are wrong..."), and what might be fine for one purpose ("... some are useful") may be inadequate for some other purpose.


Edit to address information in comments:

If you have exact zeros (or exact ones, or both), then you will need to model the probability of those values and use a mixture distribution (a 0-inflated distribution if you can have exact 0's); you shouldn't use a purely continuous distribution.

It's not really all that hard to deal with simple mixtures. You'll no longer have a density but the cdf is not much more effort to write down or evaluate than it would be in the continuous case; similarly quantiles are not much more effort either; means and variances are almost as readily calculated as before; and they're easy to simulate from.
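To make that concrete, here is a hedged sketch of a 0-1 inflated distribution in plain Python. It assumes a triangular distribution on (0,1) with mode `c` as the continuous part, chosen only because its cdf and quantile function have closed forms; `p0` and `p1` are the point masses at 0 and 1, with `p0 + p1 < 1`. All function names are invented for this example.

```python
import math
import random

def tri_cdf(x, c):
    """CDF of the triangular distribution on (0, 1) with mode c (0 < c < 1)."""
    return x * x / c if x <= c else 1 - (1 - x) ** 2 / (1 - c)

def tri_quantile(u, c):
    """Inverse of tri_cdf; note that tri_cdf(c, c) == c."""
    return math.sqrt(u * c) if u <= c else 1 - math.sqrt((1 - u) * (1 - c))

def mix_cdf(x, p0, p1, c):
    """CDF of the mixture: mass p0 at 0, mass p1 at 1, triangular otherwise."""
    if x < 0:
        return 0.0
    if x >= 1:
        return 1.0
    return p0 + (1 - p0 - p1) * tri_cdf(x, c)

def mix_quantile(u, p0, p1, c):
    """Quantile function of the mixture for 0 < u < 1."""
    w = 1 - p0 - p1  # weight of the continuous part
    if u <= p0:
        return 0.0
    if u > p0 + w:
        return 1.0
    # Clamp to guard against floating-point overshoot past 1.
    return tri_quantile(min(1.0, (u - p0) / w), c)

def mix_mean_var(p0, p1, c):
    """Mean and variance via the first two moments of each component."""
    w = 1 - p0 - p1
    m_c = (1 + c) / 3            # triangular mean on (0, 1)
    v_c = (1 - c + c * c) / 18   # triangular variance on (0, 1)
    mean = p1 + w * m_c
    second = p1 + w * (v_c + m_c ** 2)
    return mean, second - mean ** 2

def mix_sample(p0, p1, c, rng=random):
    """Draw one value: pick the component first, then sample from it."""
    u = rng.random()
    if u < p0:
        return 0.0
    if u < p0 + p1:
        return 1.0
    return rng.triangular(0.0, 1.0, c)
```

A quick sanity check is to compare `mix_mean_var(0.1, 0.05, 0.2)` with the sample mean and variance of a large number of `mix_sample(0.1, 0.05, 0.2)` draws; they should agree closely. The same structure carries over to a beta or truncated normal continuous part, given a cdf and quantile function for it.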

Taking an existing continuous distribution on the unit interval and adding a proportion of zeros (and/or ones) is on the whole a pretty convenient way to model proportions that are mostly continuous but can be 0 or 1.
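One possible way to fit such a model, sketched below under stated assumptions: estimate the proportions of exact zeros and ones directly from the data, then fit a beta distribution to the values strictly inside (0,1). The beta fit here uses the method of moments purely because it needs no optimizer; that is a choice made for this example (maximum likelihood is the more usual route), and the function name is hypothetical.

```python
import statistics

def fit_zero_one_inflated_beta(values):
    """Estimate p0, p1 and beta parameters for data on [0, 1].

    Assumes at least two observations lie strictly inside (0, 1).
    """
    n = len(values)
    p0 = sum(v == 0 for v in values) / n  # share of exact zeros
    p1 = sum(v == 1 for v in values) / n  # share of exact ones
    interior = [v for v in values if 0 < v < 1]

    m = statistics.fmean(interior)
    v = statistics.variance(interior)
    # Method-of-moments estimates for the beta component.
    k = m * (1 - m) / v - 1
    if k <= 0:
        raise ValueError("interior variance too large for a beta fit")
    return {"p0": p0, "p1": p1, "alpha": m * k, "beta": (1 - m) * k}
```

For example, `fit_zero_one_inflated_beta([0, 0, 0.02, 0.05, 0.11, 0.3])` returns the zero share alongside rough beta parameters for the four interior values. With conversion rates, a small page-view count behind each rate would make this kind of fit unreliable, so treat it as a starting point rather than a finished model.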
