[Math] How to prove uniform distribution of $m\oplus k$ if $k$ is uniformly distributed

probabilityprobability distributions

All values $m, k, c$ are $n$-bit strings. $\oplus$ stands for the bitwise modulo-2 addition.

How to prove uniform distribution of $c=m\oplus k$ if $k$ is uniformly distributed? $m$ may be of any distribution and statistically independant of $k$.

For example you have a $m$-bit string that is with probability p=1 always '1111…111'. Adding it bitwise to a random $k$-bit string which is uniformly distributed makes the result also uniformly distributed. Why?

Best Answer

If $X$ is a (vector) random variable that takes on each of the $2^n$ values $$\{00\cdots 00, \quad 00\cdots 01,\quad 00\cdots 10,\quad \ldots\ , \quad 11\cdots 10,\quad 11\cdots 11\}$$ with equal probability $2^{-n}$, then for any fixed $n$-bit vector $m$, $Z = X\oplus m$ also has the same distribution since for all choices of $n$-bit vector $a$ we have that $$P\{Z = a\} = P\{m\oplus X = a\} =P\{X = m\oplus a\} = 2^{-n}\tag{1}$$ But suppose that $Y$ is another random variable taking on the same $2^n$ values with arbitrary distribution (including the possibility that $Y$ takes on some values with probability $0$) and suppose that $Z = X \oplus Y$. Given that the event $\{Y = m\}$ occurs, suppose that the conditional distribution of $X$ conditioned on $\{Y = m\}$ is uniform on the $2^n$ values of $X$. Then, we have that $$P\{Z = a\mid Y = m\} = P\{X\oplus Y = a\mid Y = m\} = P\{X = m\oplus a \mid Y = m\} = 2^{-n}. \tag{2}$$ Equation $(2)$ shows that $P\{Z = a \mid Y = m\}$, the conditional probability that $Z = X\oplus Y$ equals $a$ given that $Y = m$ is $2^{-n}$. If $(2)$ holds for all $m$, then since the unconditional probability that $Z = a$ is, by the law of total probability, a weighted sum of the conditional probabilities, we also have that $$P\{Z = a\} = \sum_m P\{Z=a\mid Y = m\}P\{Y = m\} = 2^{-n}\sum_m P\{Y = m\} = 2^{-n}.$$ Note that we have implicitly assumed that $X$ and $Y$ are independent because we have assumed that for all choices of $m$, the conditional distribution of $X$ given that $\{Y = m\}$ is a uniform distribution on the $2^n$ values of $X$, that is the conditional distribution of $X$ is the same regardless of the choice of the value of $Y$.

Two points are worth noting here.

  • It is not necessary that $Y$ also be uniformly distributed on the $2^n$ values as long as $X$ and $Y$ are assumed to be independent. In fact, as $(1)$ indicates, $Z = X\oplus Y$ is uniformly distributed on the $2^n$ even if $Y$ is a degenerate random variable taking on value $m$ with probability $1$. (Note that even in this case, $X$ and $Y$ are independent random variables nonetheless).

  • While $X$ and $Y$ being independent is sufficient for $Z$ to inherit the uniform distribution on the $2^n$ values that $X$ enjoys, independence of $X$ and $Y$ is not necessary either. For example, with $n=2$ and $X$ uniformly distributed on $\{00, 01, 10, 11\}$, consider the joint distribution given below: $$P\{X = 00, Y = 00\} = \frac{1}{4}, \quad P\{X = 11, Y = 00\}= \frac{1}{4},\\ P\{X = 01, Y = 11\} = \frac{1}{4}, \quad P\{X = 10, Y = 11\}= \frac{1}{4},$$ where $X$ and $Y$ are clearly not independent, $Y$ is not uniformly distributed on $\{00, 01, 10, 11\}$ (though it is uniformly distributed on $\{00, 11\}$), and yet $Z$ is uniformly distributed on $\{00, 01, 10, 11\}$ just as $X$ is.