Two methods for generating random numbers that sum to 1.

probability

I want a vector $\vec{v}$ of $N$ non-negative random numbers that sum to 1.

Let $X(a)$ be the (continuous) uniform distribution over interval $[0, a]$.

Let $S(n) = \sum_{i = 1}^{n} v_{i}$ be the partial sum of the first $n$ elements of $\vec{v}$.

Method 1

Generate: For each $k$, set $v_k$ to a random number from $X(1)$.
Normalize: Divide $\vec{v}$ by the sum of all its elements.
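A minimal Python sketch of Method 1, using only the standard library. The function name `method1` and the optional `rng` argument are illustrative choices, not part of the question. Note that `random.random()` draws from $[0, 1)$ rather than the closed interval $[0, 1]$, which makes no practical difference here.

```python
import random

def method1(n, rng=random):
    """Draw n Uniform[0, 1) values, then rescale them so they sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    # total is zero only if every draw is exactly 0.0, which is vanishingly unlikely.
    return [x / total for x in raw]

v = method1(5, random.Random(0))  # seeded generator, for reproducibility
print(v, sum(v))
```

The printed sum should equal 1 up to floating-point rounding.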

Method 2

Generate the elements of $\vec{v}$ one after another with the following steps.

Generate 1st element: set $v_1$ to a random number from $X(1)$.
Generate 2nd element: set $v_2$ to a random number from $X(1 - v_{1})$.

…
Generate the $k^{th}$ element: set $v_k$ to a random number from $X(1 - S(k-1))$.

…
Calculate the last element: set $v_N$ to $1 - S(N - 1)$.
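The sequential steps of Method 2 can be sketched in Python as follows. The name `method2` and the variable `remaining` (which tracks $1 - S(k-1)$) are hypothetical, chosen for illustration only.

```python
import random

def method2(n, rng=random):
    """Draw v_k ~ Uniform[0, 1 - S(k-1)] in turn; the last element is the remainder."""
    v = []
    remaining = 1.0  # equals 1 - S(k-1) just before v_k is drawn
    for _ in range(n - 1):
        x = rng.uniform(0.0, remaining)
        v.append(x)
        remaining -= x
    v.append(remaining)  # v_N = 1 - S(N-1), so the total is 1 by construction
    return v

v = method2(5, random.Random(0))
print(v, sum(v))
```

Each draw's range depends on all earlier draws, which is the reason the question about independence arises.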

Question

If I generate $\vec{v}$ with method 2, are the elements of $\vec{v}$ independent of each other?

Thank you.

Best Answer

No, the elements will not be independent. Consider the case $N = 2$, where your method 2 is equivalent to:

Choose $v_1 \sim \text{Unif}[0,1]$.
Set $v_2 = 1-v_1$.

Note that $v_2$ is itself uniformly distributed on $[0,1]$.

Now if $v_1,\,v_2$ were independent, we would have for all $s,t \in [0,1]$

$$ \mathbf P[ v_1 \leq s, v_2 \leq t] = \mathbf P[v_1 \leq s] \mathbf P[v_2 \leq t] = st.$$

However, since $v_2 = 1 - v_1$ we actually have

$$\mathbf P[v_1 \leq s, (1-v_1) \leq t] = \mathbf P[v_1 \leq s, v_1 \geq 1-t] = \mathbf P[(1-t) \leq v_1 \leq s].$$

The exact formula of the final expression depends on the values of $s,t \in [0,1]$, but as an example if $s = t = 1/2$ then

$$ \mathbf P[(1-t) \leq v_1 \leq s] = \mathbf P[ v_1 = 1/2] = 0 \neq \frac14, $$

where $\frac14$ is the answer you would expect for independent $v_1,v_2$.
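As a quick numerical check of this argument, the sketch below simulates the $N = 2$ case and compares the empirical joint probability with the product of the empirical marginals. For uniform $v_1$ the joint probability is $\max(0, s + t - 1)$, while independence would require $st$. The function name `joint_vs_product`, the trial count, and the seed are assumptions made for illustration.

```python
import random

def joint_vs_product(s, t, trials=100_000, seed=0):
    """Estimate P[v1 <= s, v2 <= t] and P[v1 <= s] * P[v2 <= t] when v2 = 1 - v1."""
    rng = random.Random(seed)
    joint = first = second = 0
    for _ in range(trials):
        v1 = rng.random()
        v2 = 1.0 - v1
        a = v1 <= s
        b = v2 <= t
        joint += a and b   # both events occur
        first += a         # event on v1 alone
        second += b        # event on v2 alone
    return joint / trials, (first / trials) * (second / trials)

joint, product = joint_vs_product(0.5, 0.5)
print(joint, product)
```

With $s = t = 1/2$ the first value should be essentially 0 and the second close to $1/4$, matching the calculation above.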

Related Solutions

[Math] Generating Random Orthogonal Matrices

If you sample elements from a uniform distribution over $[-1,1]$ and apply the Gram-Schmidt procedure, you can generate every possible orthogonal matrix (note that orthogonal matrices necessarily have elements within $[-1,1]$). However, I don't believe that it will generate all matrices with equal probability.

See this paper for further discussion, and a method that produces a uniformly random unitary matrix.
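The Gram-Schmidt construction described above can be sketched in plain Python. This is only an illustration under stated assumptions: the name `random_orthogonal`, the column-by-column sampling, and the resampling threshold `1e-12` are not from the answer, and (as the answer notes) the result is not claimed to be uniformly distributed over orthogonal matrices.

```python
import random

def random_orthogonal(n, rng=random):
    """Gram-Schmidt on columns with Uniform[-1, 1] entries.

    Returns n orthonormal column vectors, each a list of length n.
    """
    basis = []
    while len(basis) < n:
        col = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # Remove the components along the vectors accepted so far
        # (modified Gram-Schmidt: project the running remainder each time).
        for q in basis:
            dot = sum(c * qi for c, qi in zip(col, q))
            col = [c - dot * qi for c, qi in zip(col, q)]
        norm = sum(c * c for c in col) ** 0.5
        if norm < 1e-12:
            continue  # assumption: resample a (nearly) linearly dependent draw
        basis.append([c / norm for c in col])
    return basis

Q = random_orthogonal(3, random.Random(0))
for column in Q:
    print(column)
```

The columns of `Q` should have unit length and be mutually orthogonal up to rounding error.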

Generating Standard Uniform random variable

Many statistical software programs use the 'Mersenne Twister' pseudorandom number generator. See this link for some information on this generator (the default in R) and other well-vetted generators. More particularly, see Wikipedia on the Mersenne Twister.

(There are no set rules for making a pseudorandom number generator, using congruential generators or otherwise. We know lots of things not to do, but no sure rules for success. Thus generators are tested using 'batteries' of problems that have proved difficult to simulate. A well-vetted generator is one that has passed many such tests.)

In R statistical software, the function runif samples the indicated number of observations from a uniform distribution. Thus the following R code samples $m = 10,000$ observations from $\mathsf{Exp}(\lambda = 3)$ according to the inverse CDF transformation shown by @peter 5 (+1).

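set.seed(416)  # for reproducibility
m = 10^4;  u = runif(m)  # m draws from UNIF(0,1)
x = -(1/3)*log(u)  # inverse-CDF transform to EXP(rate=3); valid because 1 - u is also UNIF(0,1)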

Histograms of the vectors u and x, each containing $m$ elements, are shown below, along with the respective density functions.

par(mfrow=c(1,2))
hist(u, prob=T, col="skyblue2", main="Uniform Sample with Density of UNIF(0,1)")
 curve(dunif(x), add=T, n=10001, lwd=2, col="red")
hist(x, prob=T, col="skyblue2", main="Exponential Sample with Density of EXP(rate=3)")
 curve(dexp(x, 3), add=T, n=10001, lwd=2, col="red")
par(mfrow=c(1,1))

Notice that the plots show good agreement of the histograms of samples with the density functions of the respective distributions.

More formally, here are results of Kolmogorov-Smirnov goodness-of-fit tests based on the first 5000 observations in each sample (the sample-size limit allowed by the implementation of this test in R). P-values far above 5% indicate that samples are consistent with the claimed distributions.

ks.test(u[1:5000], "punif")

        One-sample Kolmogorov-Smirnov test

data:  u[1:5000]
D = 0.013297, p-value = 0.3396
alternative hypothesis: two-sided


ks.test(x[1:5000], "pexp", 3)

        One-sample Kolmogorov-Smirnov test

data:  x[1:5000]
D = 0.013297, p-value = 0.3396
alternative hypothesis: two-sided
