Sample from unknown distribution

probability distributions

[Image: probability density curve]

Let's assume the picture shows the probability distribution of alien heights on Mars. We know for sure that the area under the curve is 1.

But I have a different question. How do we approach sampling from an unusual distribution like this one? Computing its mean, variance, standard deviation, etc. would not really help, I guess, or would it? Basically, I am interested in how to sample from distributions that are not standard named distributions such as the normal distribution.

Best Answer

You apply the mathematical inverse of the cumulative distribution function (CDF) to numbers randomly sampled from a uniform distribution on the interval $[0,1]$. This technique is known as inverse transform sampling.
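For a curve like the one in the question, there is usually no formula for the inverse CDF, but the same idea can be carried out numerically. The following is a minimal sketch, assuming the density is available as values tabulated on a grid. The function name `make_sampler`, the grid, and the two-bump density used in the usage lines are all hypothetical and only serve as an illustration. It builds the CDF with the trapezoid rule and inverts it by linear interpolation.

```python
import bisect
import math
import random


def make_sampler(xs, pdf_values, rng):
    """Build a sampler from a density tabulated on an increasing grid xs."""
    # Cumulative trapezoid rule: cdf[i] approximates the area under the curve up to xs[i].
    cdf = [0.0]
    for i in range(1, len(xs)):
        area = 0.5 * (pdf_values[i] + pdf_values[i - 1]) * (xs[i] - xs[i - 1])
        cdf.append(cdf[-1] + area)
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalise so the CDF ends at 1

    def sample():
        u = rng.random()  # uniform on [0, 1)
        # Locate the grid cell whose CDF range contains u.
        i = bisect.bisect_right(cdf, u)
        i = min(max(i, 1), len(xs) - 1)
        lo, hi = cdf[i - 1], cdf[i]
        # Linear interpolation of the inverse CDF inside that cell.
        t = 0.0 if hi == lo else (u - lo) / (hi - lo)
        return xs[i - 1] + t * (xs[i] - xs[i - 1])

    return sample


# Hypothetical two-bump density on [0, 8]; it does not need to be normalised.
xs = [i * 0.01 for i in range(801)]
pdf_values = [
    math.exp(-((x - 2.0) ** 2) / 0.5) + 0.5 * math.exp(-((x - 5.0) ** 2))
    for x in xs
]
sampler = make_sampler(xs, pdf_values, random.Random(0))
print([sampler() for _ in range(5)])
```

The accuracy of this approach depends on how fine the grid is. A finer grid gives a better approximation of the CDF and therefore of the samples.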


Suppose, for example, that you want to sample numbers from the exponential distribution, which has the probability density function

$$ f_X(x) = \frac{1}{\tau} e^{-x/\tau}\qquad (0\leq x ),$$

the cumulative distribution function is defined as,

$$ F(z) = P( X \leq z)$$ $$= \int_0^z f_X(x)\, dx $$ $$= \frac{1}{\tau} \int_0^z e^{-x/\tau}\, dx $$ $$= \left[ -e^{-x/\tau} \right]_0^z $$

Evaluating the bracket at its limits, we have that,

$$F(z) = 1 - e^{-z/\tau},$$

the mathematical inverse of this function is,

$$F^{-1}(z) =-\tau \log(1-z).$$
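To make the inversion step explicit, set $u = F(z)$ and solve for $z$. The symbol $u$ is introduced here only as a label for the uniform random number; the formula above uses $z$ for the argument of both $F$ and $F^{-1}$.

$$ u = 1 - e^{-z/\tau} \;\Longrightarrow\; e^{-z/\tau} = 1 - u \;\Longrightarrow\; z = -\tau \log(1-u). $$

Since $1-u$ is uniformly distributed on $[0,1]$ whenever $u$ is, the simpler expression $-\tau \log(u)$ produces samples with the same distribution.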

======

Now we will apply the method I described at the beginning of this answer to get numbers sampled from the exponential distribution.

First I need a source of uniformly distributed random numbers on the interval $[0,1]$. I will use random.org to generate these numbers.

https://www.random.org/decimal-fractions/

======

I generated 100 random numbers sampled from the uniform distribution on $[0,1]$ using the random.org link above. The histogram of these numbers follows.

[Image: histogram of the 100 uniform random numbers]

Then I applied $F^{-1}(z)$ to each of these numbers (I chose $\tau=1$). The resulting list of numbers obtained from this process obeys an exponential distribution. Their histogram is shown below.

[Image: histogram of the transformed numbers]

You can see that the histogram has changed to have a shape consistent with an exponential distribution.
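The same experiment can be reproduced in a few lines of code. This is a minimal sketch, assuming Python's seeded pseudo-random generator as a stand-in for random.org; the function name `sample_exponential`, the seed, and the bin width are arbitrary choices made for illustration.

```python
import math
import random


def sample_exponential(tau, rng):
    """Draw one exponential sample by applying F^{-1} to a uniform number."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return -tau * math.log(1.0 - u)


rng = random.Random(42)  # seeded pseudo-random source instead of random.org
tau = 1.0
samples = [sample_exponential(tau, rng) for _ in range(100)]

# Crude text histogram with bins of width 0.5.
bin_width = 0.5
counts = {}
for s in samples:
    b = int(s / bin_width)
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    left, right = b * bin_width, (b + 1) * bin_width
    print(f"{left:4.1f}-{right:4.1f} {'#' * counts[b]}")
```

With only 100 samples the bars are noisy, but they should fall off roughly exponentially from left to right. Increasing the sample count makes the shape clearer.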