Solved – Generate random numbers from “sloped uniform distribution” from mathematical theory

distributionspythonrrandom-generationuniform distribution

For some purpose, I need to generate random numbers (data) from "sloped uniform" distribution. The "slope" of this distribution may vary in some reasonable interval, and then my distribution should change from uniform to triangular based on the slope. Here is my derivation:

Let's make it simple and generate data form $0$ to $B$ (blue, red is uniform distribution). To obtain the probability density function of blue line I need just the equation of that line. Thus:

$$f(x) = tg(\varphi)x + Y(0)$$

and since (picture):

\begin{align}
tg(\varphi) &= \frac{1/B – Y(0)}{B/2} \\[5pt]
Y(0) &= \frac{1}{B} – tg(\varphi)\frac{B}{2}
\end{align}

We have that:

$$f(x) = tg(\varphi)x + \left(\frac{1}{B} – tg(\varphi)\frac{B}{2} \right)$$

Since $f(x)$ is PDF, CDF equals:

$$F(x) = \frac{tg(\varphi)x^2}{2} + x\left(\frac{1}{B} – tg(\varphi)\frac{B}{2} \right)$$

Now let's make a data generator. The idea is, that if I'll fix $\varphi, B$, random numbers $x$ can be computed if I'll get numbers from $(0,1)$ from an uniform distribution as described here. Thus, if I need 100 random numbers from my distribution with fixed $\varphi, B$, then for any $t_i$ from uniform distribution $(0,1)$ there is $x_i$ from "sloped distribution", and $x$ can be computed as:

$$\frac{tg(\varphi)x_i^2}{2} + x_i\left(\frac{1}{B} – tg(\varphi)\frac{B}{2} \right) – t_i = 0$$

From this theory I made code in Python which looks like:

import numpy as np
import math
import random
def tan_choice():
    x = random.uniform(-math.pi/3, math.pi/3)
    tan = math.tan(x)
    return tan

def rand_shape_unif(N, B, tg_fi):
    res = []
    n = 0
    while N > n:
        c = random.uniform(0,1)
        a = tg_fi/2
        b = 1/B - (tg_fi*B)/2
        quadratic = np.poly1d([a,b,-c])
        rots = quadratic.roots
        rot = rots[(rots.imag == 0) & (rots.real >= 0) & (rots.real <= B)].real
        rot = float(rot)
        res.append(rot)
        n += 1
    return res

def rand_numb(N_, B_):
    tan_ = tan_choice()
    res = rand_shape_unif(N_, B_, tan_)
    return res

But the numbers generated from rand_numb are very close to zero or to B (Which I set as 25). There is no variance, when I generate 100 numbers, all of them are close to 25 or all are close to zero. In one run:

num = rand_numb(100, 25)
numb
Out[140]: 
[0.1063241766836174,
 0.011086243095907753,
 0.05690217839063588,
 0.08551031241199764,
 0.03411227661295121,
 0.10927087752739746,
 0.1173334720516189,
 0.14160616846114774,
 0.020124543145515768,
 0.10794924067959207]

So there must be something very wrong in my code. Can anyone help me with my derivation or code? I'm crazy about this now, I can't see any mistake. I suppose R code will give me similar results.

Best Answer

Your deriviation is ok. Note that to get a positive density on $(0,B)$, you have to constraint $$ B^2 \tan\phi < 2. $$ In your code $B = 25$ so you should take $\phi$ between $\pm\tan^{-1}{2\over 625}$, that's where your code fails.

You can (and should) avoid using a quadratic solver, and then select the roots between 0 and $B$. The quadratic polynomial equation in $x$ to be solved is $$F(x) = t$$ with $$ F(x) = {1\over 2} \tan \phi \cdot x^2 + \left( {1\over B} - {B\over 2} \tan \phi \right) x.$$ By construction $F(0) = 0$ and $F(B) = 1$; also $F$ increases on $(0,B)$.

From this it is easy to see that if $\tan \phi > 0$, the portion of parabola in which we are interested is a part of the right side of the parabola, and the root to keep is the highest of the two roots, that is $$ x = {1\over \tan \phi} \left( {B\over 2} \tan \phi - {1\over B} + \sqrt{ \left( {B\over 2} \tan \phi - {1\over B} \right)^2 + 2 \tan \phi \cdot t}. \right)$$ To the contrary, if $\tan\phi < 0$, the parabola is upside down, and we are interest in its left part. The root to keep is the lowest one. Taking into account the sign of $\tan\phi$ it appears that this is the same root (ie the one with $+\sqrt\Delta$) than in the first case.

Here is some R code.

phi <- pi/8; B <- 2
f <- function(t) (-(1/B - 0.5*B*tan(phi)) + 
       sqrt( (1/B - 0.5*B*tan(phi))**2 + 2 * tan(phi) * t))/tan(phi)
hist(f(runif(1e6)))

And with $\phi < 0$:

phi <- -pi/8
hist(f(runif(1e6)))

Related Solutions

Solved – Generating random samples from a custom distribution

It looks like you figured out that your code works, but @Aniko pointed out that you could improve its efficiency. Your biggest speed gain would probably come from pre-allocating memory for z so that you're not growing it inside a loop. Something like z <- rep(NA, nsamples) should do the trick. You may get a small speed gain from using vapply() (which specifies the returned variable type) instead of an explicit loop (there's a great SO question on the apply family).

> nsamples <- 1E5
> x <- runif(nsamples)
> f <- function(x, u) 1.5 * (x - (x^3) / 3) - u
> z <- c()
> 
> # original version
> system.time({
+ for (i in 1:nsamples) {
+   # find the root within (0,1) 
+   r <- uniroot(f, c(0,1), tol = 0.0001, u = x[i])$root
+   z <- c(z, r)
+ }
+ })
   user  system elapsed 
  49.88    0.00   50.54 
> 
> # original version with pre-allocation
> z.pre <- rep(NA, nsamples)
> system.time({
+ for (i in 1:nsamples) {
+   # find the root within (0,1) 
+   z.pre[i] <- uniroot(f, c(0,1), tol = 0.0001, u = x[i])$root
+   }
+ })
   user  system elapsed 
   7.55    0.01    7.78 
> 
> 
> 
> # my version with sapply
> my.uniroot <- function(x) uniroot(f, c(0, 1), tol = 0.0001, u = x)$root
> system.time({
+   r <- vapply(x, my.uniroot, numeric(1))
+ })
   user  system elapsed 
   6.61    0.02    6.74 
> 
> # same results
> head(z)
[1] 0.7803198 0.2860108 0.5153724 0.2479611 0.3451658 0.4682738
> head(z.pre)
[1] 0.7803198 0.2860108 0.5153724 0.2479611 0.3451658 0.4682738
> head(r)
[1] 0.7803198 0.2860108 0.5153724 0.2479611 0.3451658 0.4682738

And you don't need the ; at the end of each line (are you a MATLAB convert?).

Solved – How to generate numbers based on an arbitrary discrete distribution

One of the best algorithms for sampling from a discrete distribution is the alias method.

The alias method (efficiently) precomputes a two-dimensional data structure to partition a rectangle into areas proportional to the probabilities.

In this schematic from the referenced site, a rectangle of unit height has been partitioned into four kinds of regions--as differentiated by color--in the proportions $1/2$, $1/3$, $1/12$, and $1/12$, in order to sample repeatedly from a discrete distribution with these probabilities. The vertical strips have a constant (unit) width. Each is divided into just one or two pieces. The identities of the pieces and the locations of the vertical divisions are stored in tables accessible via the column index.

The table can be sampled in two simple steps (one for each coordinate) requiring generating just two independent uniform values and $O(1)$ calculation. This improves on the $O(\log(n))$ computation needed to invert the discrete CDF as described in other replies here.

Best Answer

Related Solutions

Solved – Generating random samples from a custom distribution

Solved – How to generate numbers based on an arbitrary discrete distribution

Related Question