Solved – Generating Data from Arbitrary Distribution

data transformation, distributions, quantiles, uniform distribution

If we want to generate a random sample according to a distribution $F$, we can generate a uniform random number on $(0,1)$ and apply the inverse $F^{-1}$ to it. This works because, if $U$ is uniform on $(0,1)$, then $X=F^{-1}(U)$ is a random variable that follows $F$.

I know this is true as $P(X\le x) = P(F^{-1}(U)\le x) = P(U\le F(x)) = F(x)$.
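As a numerical sanity check on this identity, here is a small Python sketch (my own illustration, not from the question) using the rate-1 exponential distribution, whose CDF $F(x)=1-e^{-x}$ inverts to $F^{-1}(u)=-\ln(1-u)$: applying $F^{-1}$ to uniform draws should give an empirical CDF close to $F$.

```python
import math
import random

random.seed(0)

def F(x):
    """CDF of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x)

def F_inv(u):
    """Inverse CDF (quantile function) of the same distribution."""
    return -math.log(1.0 - u)

# Generate samples via X = F^{-1}(U), with U uniform on (0, 1).
samples = [F_inv(random.random()) for _ in range(100_000)]

# Compare the empirical CDF P(X <= x) with F(x) at a few points.
for x in (0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(f"x={x}: empirical={empirical:.3f}, F(x)={F(x):.3f}")
```

With 100,000 samples the empirical and theoretical values should agree to within a few thousandths, exactly as the algebra $P(F^{-1}(U)\le x)=F(x)$ predicts.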

Does anyone know the intuition behind this? What would lead someone to assert this? Any help is greatly appreciated.

Best Answer

This is known as inverse transform sampling. The idea is well encapsulated in the following picture from Wikipedia:

[Figure: inverse transform sampling, from Wikipedia]

Note that the image of the cumulative distribution function (CDF) $F_X$ is the interval $[0,1]$ on the $y$ axis. (Purists will debate whether the endpoints should be included.) Also note that the CDF is, of course, monotone.

In inverse transform sampling, we sample uniformly from this image, i.e., from $U[0,1]$. These samples are the dots on the $y$ axis. From each dot we go right to the graph of $F_X$, then down to the $x$ axis. This is where the "inverse" comes in: we start on the $y$ axis and end up on the $x$ axis.

The result on the $x$ axis is distributed according to $F_X$.

  • Where $F_X$ is steep (i.e., the density $f_X$ is large), $y$ values that are close together yield $x$ values that are close together. We get a high density of $x$ values.
  • Where $F_X$ is flat (i.e., $f_X$ is small), $y$ values that are close together yield $x$ values that are farther apart. We get a low density of $x$ values.
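The two bullets above can be checked numerically. A sketch (again my own illustration, reusing the rate-1 exponential, whose CDF is steep near $0$ and flat in the tail): an interval of fixed width near $0$ should capture far more samples than an equally wide interval out in the tail.

```python
import math
import random

random.seed(0)

# Quantile function (inverse CDF) of the exponential distribution with rate 1.
def quantile(u):
    return -math.log(1.0 - u)

samples = [quantile(random.random()) for _ in range(100_000)]

def count_in(lo, hi):
    """Number of samples landing in the half-open interval [lo, hi)."""
    return sum(lo <= s < hi for s in samples)

# Both intervals have width 0.5; the CDF is steep on [0, 0.5] and flat on [3, 3.5].
steep = count_in(0.0, 0.5)  # expected fraction: F(0.5) - F(0) ~ 0.393
flat = count_in(3.0, 3.5)   # expected fraction: F(3.5) - F(3) ~ 0.020
print(steep, flat)
```

The count in the steep region should exceed the count in the flat region by roughly a factor of twenty, matching the intuition that steep stretches of $F_X$ collect many closely spaced $x$ values.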