What does “less random” mean?

probability, random

Reading about random shuffle algorithms, I sometimes see the "randomness" of the method being discussed, and described as "less random" than a different algorithm.

For example, if I take a random process like a dice roll and modify it so that no number can appear twice (or more) in a row, one would say this process is "less random". Is this so because the probability of each outcome depends on the previous result? Moreover, if I have a perfectly shuffled deck of cards, does that mean each successive card drawn from the deck is less and less random (since I cannot draw a card I have already taken)? If so, how much less random is it?

On the other hand, if I sum the numbers from two independent dice rolls, the result tends to fall near the middle of the range, yet each sum is still fully random in the sense of being independent of previous results. Even so, one would describe the outcome as not so random, since it favors certain results over others.

Is the amount of randomness described only in terms of uniformity of the distribution and dependency on previous values, or is it something else? How can we measure how much "random" something is?

Best Answer

Often, when people informally say that something is "more random", what they mean is that its distribution has higher entropy. If $X$ is a discrete random variable, taking values in the set $\mathcal X$, the entropy $H(X)$ of $X$ is defined to be $$ H(X) = - \sum_{x \in \mathcal X} \mathbb P(X = x) \times \log(\mathbb P(X = x)), $$ where it is understood that $0 \times \log(0) = 0$.

For example, if $\mathcal X$ is finite, then the random variable that maximizes the entropy is the uniform distribution. This one is "most random". The random variable that minimizes the entropy is the one that takes a single value with probability 1: it has an entropy of 0, which is the "least random".
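As a quick sanity check, here is a minimal sketch of the entropy formula above in Python, using natural logarithms (the helper name `entropy` is my own):

```python
import math

def entropy(probs):
    """Shannon entropy in nats; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Uniform distribution over 6 outcomes: maximal entropy, log(6).
uniform = [1/6] * 6
# Degenerate distribution: one outcome with probability 1, entropy 0.
degenerate = [1.0, 0, 0, 0, 0, 0]

print(entropy(uniform))     # log(6) ≈ 1.792
print(entropy(degenerate))  # 0.0
```

Any other distribution on 6 outcomes lands strictly between these two extremes.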

Consider your example of the sum of two die rolls. There are 11 possible outcomes, with probabilities $\frac1{36}, \frac1{18}, \frac1{12}, \frac19, \frac5{36}, \frac16, \frac5{36}, \frac19, \frac1{12}, \frac1{18}, \frac1{36}$. The entropy of this distribution (using natural logarithms) is approximately $2.270$. If we instead consider the entropy of the uniform distribution on these 11 outcomes -- so we have the probability $\frac1{11}$ eleven times -- then the entropy is $$ - \sum_{i=1}^{11} \frac1{11} \log(1/11) = \log(11) \approx 2.398. $$
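These two values can be verified numerically; the sketch below uses natural logarithms, so the entropies are in nats:

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Sum of two fair dice: sums 2..12 occur with counts 1, 2, ..., 6, ..., 2, 1 out of 36.
counts = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
dice_sum = [c / 36 for c in counts]

print(entropy(dice_sum))  # ≈ 2.270
print(math.log(11))       # ≈ 2.398, the uniform distribution on 11 outcomes
```

The peaked dice-sum distribution falls noticeably short of the uniform maximum, which is exactly the sense in which it is "less random".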

Drawing subsequent cards from the deck also gets less random. Initially there are 52 outcomes, which gives entropy $\log(52)$. After drawing one card, the entropy goes down to $\log(51)$ -- there are now 51 equally likely outcomes. And so on.
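The shrinking entropy of successive draws can be tabulated directly; this is a small illustration, assuming each remaining card is equally likely:

```python
import math

# Entropy (in nats) of drawing the k-th card from a well-shuffled 52-card deck:
# after k-1 draws, 52 - (k - 1) equally likely cards remain.
for k in range(1, 5):
    remaining = 52 - (k - 1)
    print(k, math.log(remaining))
```

Each draw removes one outcome, so the entropy falls from $\log(52)$ toward 0 at the final card, which is fully determined.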

A die that cannot roll the same digit twice in a row also has lower entropy. Consider, for example, the space of all possible 5-roll sequences. On a standard die, this space has $6^5$ elements, and all are equally likely, for a total entropy of $5\log(6)$. If the same roll cannot appear twice in a row, the space consists of only $6 \times 5^4$ elements -- even if they are all equally likely, the associated entropy is only $\log(6) + 4 \log(5)$.
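The two counts, and the corresponding entropies under the equally-likely assumption, can be checked by brute-force enumeration (a sketch; entropies again in nats):

```python
import math
from itertools import product

# All 5-roll sequences of a standard six-sided die.
all_seqs = list(product(range(1, 7), repeat=5))
assert len(all_seqs) == 6 ** 5  # 7776 sequences

# Sequences in which no roll repeats its immediate predecessor.
no_repeat = [s for s in all_seqs
             if all(a != b for a, b in zip(s, s[1:]))]
assert len(no_repeat) == 6 * 5 ** 4  # 3750 sequences

# Entropy of a uniform distribution on each space:
print(math.log(len(all_seqs)))   # 5 * log(6) ≈ 8.959
print(math.log(len(no_repeat)))  # log(6) + 4 * log(5) ≈ 8.230
```

The restricted die's sequence space is less than half the size of the unrestricted one, and its entropy is correspondingly smaller.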
