[Math] Generating random numbers with a skewed distribution with many possible values

probability, probability distributions, probability theory, random, statistics

Possible duplicate, but not sure: Generating random numbers with skewed distribution

So, I would like to generate integers between $x$ and $y$ with a skewed distribution of $n$. What is the best way to do this?

I believe this differs from the question linked above, which only chooses between two numbers, $1$ and $46$. I want a larger range (e.g., 1 to 100), but I don't know how to do that.

Best Answer

Just to get you started:

Let $X$ have density function $f(x) = 2(1-x),$ for $0 < x < 1.$ That is a right-skewed continuous distribution taking values in $(0,1).$ The distribution of $X$ is called $\mathsf{Beta}(1,2);$ you can look at the Wikipedia article to see the general form of the family of distributions.

Then $Y = 100X$ has a continuous distribution taking values in $(0,100).$ You can use the ceiling function to get a discrete distribution taking integer values $1, 2, \dots, 100.$

How to generate values of $X:$ The CDF of $X$ is $F(x) = 1 - (1-x)^2,$ for $0 < x < 1.$ Suppose you have a pseudorandom generator that essentially gives you random observations from $U \sim \mathsf{Unif}(0,1).$ Then you can set $U = F(X)$ and solve for $X$ in terms of $U$ to get $X = F^{-1}(U) = 1 - \sqrt{1-U}.$ Because $1-U$ is also distributed $\mathsf{Unif}(0,1),$ it is equivalent, and slightly simpler, to use $X = 1 - \sqrt{U}.$

In R statistical software, the function runif(10^6) generates a vector of a million observations that are essentially sampled from $\mathsf{Unif}(0,1).$ The program below generates a million pseudo-realizations of $X$ and plots their histogram. [In R this could be done more easily by using the built-in function for generating beta random variables: rbeta(m, 1, 2).]

m = 10^6;  u = runif(m)  # a million UNIF(0,1) pseudorandom values
x = 1 - sqrt(u)          # inverse-CDF transform; x is BETA(1,2)

par(mfrow=c(1,2))  # enables 2-panel plots
 hist(u, prob=TRUE, col="skyblue2", main="Realizations of UNIF(0,1)")
   curve(dunif(x), -.2, 1.2, col="blue", lwd=2, n=1001, add=TRUE)
 hist(x, prob=TRUE, col="skyblue2", main="Realizations of BETA(1,2)")
   curve(dbeta(x,1,2), -.2, 1.2, col="blue", lwd=2, n=1001, add=TRUE)
par(mfrow=c(1,1))  # returns to single panel plots
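
The transform can also be checked numerically rather than by eye. The sketch below is only an illustration: the seed, the sample size, and the variable names (`max_gap`, `emp`, `theo`) are arbitrary choices. It compares sample quantiles of $1 - \sqrt{U}$ with the quantiles of $\mathsf{Beta}(1,2)$ returned by R's built-in qbeta.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
m <- 10^5
u <- runif(m)
x <- 1 - sqrt(u)                   # the transform used above

# The exact Beta(1,2) quantile function is 1 - sqrt(1 - u); replacing
# 1 - u by u gives the same distribution because 1 - U is also uniform.
max_gap <- max(abs(qbeta(u, 1, 2) - (1 - sqrt(1 - u))))

# Sample quantiles of x against theoretical Beta(1,2) quantiles
p    <- c(0.1, 0.25, 0.5, 0.75, 0.9)
emp  <- quantile(x, p, names = FALSE)
theo <- qbeta(p, 1, 2)
print(round(rbind(p, emp, theo), 3))
```

With a sample this large, the two rows of quantiles should agree to about two decimal places.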


Then a discrete distribution can be obtained by taking ceilings, as mentioned above. For a graph that will display clearly at the resolution available here, I have used values 1 through 20 (instead of 1 through 100).

k = 20
y = ceiling(k*x)         # integer values 1, 2, ..., k
cutp = seq(0, k, by=.1)  # narrow bins so each integer shows as a separate bar
hist(y, prob=TRUE, breaks=cutp, main="Right-Skewed Discrete Distribution")
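
For reference, the exact probabilities of the discretized variable can be written down and compared with simulated frequencies. Integrating the density $2(1-x)$ over $((j-1)/k,\, j/k]$ gives $P(Y = j) = (2k - 2j + 1)/k^2$ for $j = 1, \dots, k.$ The sketch below assumes the $\mathsf{Beta}(1,2)$ construction above; the names `pmf_exact`, `pmf_cdf`, and `pmf_sim` are illustrative only.

```r
set.seed(2)                          # arbitrary seed, only for reproducibility
k <- 20
j <- 1:k

# Exact PMF two ways: the closed form, and differences of the Beta(1,2) CDF
pmf_exact <- (2*k - 2*j + 1) / k^2
pmf_cdf   <- diff(pbeta((0:k)/k, 1, 2))

# Simulated relative frequencies of ceiling(k * X)
y <- ceiling(k * rbeta(10^6, 1, 2))
pmf_sim <- tabulate(y, nbins = k) / length(y)

# First five probabilities, exact and simulated, side by side
print(round(rbind(pmf_exact, pmf_sim)[, 1:5], 4))
```

The probabilities decrease linearly in $j,$ which is the staircase shape of the histogram.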


If you want a more extreme skew, you can use a different member of the beta family of distributions. [But then the relationship to uniform random variables is not so transparent.]

m = 10^6;  x = rbeta(m, 1, 4);  y = ceiling(20*x);  cutp = seq(0, 20, by=.1)
hist(y, prob=TRUE, breaks=cutp, main="Right-Skewed Discrete Distribution")
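
To come back to the original question of integers between $x$ and $y,$ the same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name `rskewint` and its arguments `lo`, `hi`, and `shape2` are made up for illustration, and using $\mathsf{Beta}(1, \text{shape2})$ as the underlying distribution is one choice among many.

```r
# Hypothetical helper: m integers in lo..hi, right-skewed (small values most likely).
# Larger shape2 gives a stronger skew; shape2 = 1 gives the discrete uniform.
rskewint <- function(m, lo, hi, shape2 = 2) {
  stopifnot(hi >= lo, shape2 > 0)
  n <- hi - lo + 1                         # number of possible values
  step <- ceiling(n * rbeta(m, 1, shape2)) # values in 1..n
  step <- pmax(step, 1)                    # guard against a beta draw of exactly 0
  lo + step - 1                            # shift to lo..hi
}

set.seed(3)                                # arbitrary seed, only for reproducibility
y <- rskewint(10^5, 1, 100, shape2 = 4)
print(range(y))
print(mean(y))
```

For a left-skewed version (large values most likely), one option is to reflect the result: `lo + hi - rskewint(m, lo, hi, shape2)`.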

