Probability of a random set of binary numbers XORing to $0$

binaryprobability

Question – Given a randomly generated set of binary numbers, what is the probability that all of them, when XORed with each other, yield $0$?

Additional information –

  1. All of the binary numbers in the set are different from each other. As an example, $\{0001, 0010, 0011\}$ is a valid set, but not $\{0001, 0010, 0010\}$ or $\{0010, 0010, 0010\}$.

  2. All of the binary numbers have equal length $n$ which is some given. For example, one could have a given length of 4, meaning numbers such as $00001$ and $010$ are not allowed.

  3. The length of the set $m$ is a given, its members are randomly generated.

  4. Each possible valid binary number has an equal likelihood of being generated.

Context – For a school math project, I decided to analytically investigate the average percentage of errors an error correction code can correct given a data set and error rate. This question I am asking was a part of said project that I am unable to solve. If necessary, I can provide additional context.

I already know how to do this problem if restriction 1 is lifted, but did not really know how to proceed. I visualized the randomly generated data set as being one number stacked on top of another, kind of how we write when adding large numbers on paper by hand, and then taking my answer to be the probability that each column of bits had an even number of $1$s.

Updated context – Looking at the answers already provided and doing some of the math myself, it is clear that using this in my project will take me well over the page count in all of the explanations. I have come up with a simpler path for my investigation, but am leaving this question up here as I am still interested in this problem and I see no reason to stop anyone interested in answering or seeing answers to this question from doing so.

1rst edit – I have assigned $m$ as the length of the set and $n$ as the length of the binary sequences

2nd edit – I have added additional context

Best Answer

This isn't a quick solution, so others may improve on this; however, it is a good start.

You can use conditional probability. Let's say you want to generate $n$ random $k$-bit binary numbers. Let $B \in \{0, 1\}^{n \times k}$ be a random matrix where $b_{ij} = 1$ if the $j$th bit of the $i$th number is $1$, otherwise $b_{ij} = 0$. Let $P \in [0, 1]^{n \times k}$ represent a matrix, where $p_{ij} = \mathbb{P}(b_{ij} = 1)$. The first row of $P$ is $0.5$ because of condition $(4)$ from your question. From the second row on, you need to rely on conditional probabilities, e.g.:

$$p_{21} = \mathbb{P}(b_{21} = 1) = \mathbb{P}(b_{21} = 1 | b_{11} = 1)p_{11} + \mathbb{P}(b_{21} = 1 | b_{11} = 0)(1 - p_{11})$$

In the case of $p_{21}$, the probability doesn't depend on $b_{1i}$ for $i > 1$. This isn't true for all the entries, so you could condition on all the previous variables, it doesn't hurt, but a bit more tedious to write it out.

You can write down expressions for the conditional probabilities, e.g.:

$$\mathbb{P}(b_{21} = 1 | b_{11} = 1) = \frac{2 ^ {k - 1} - 1}{2 ^ k - 1}$$

This is because you already selected a $k$-bit number for the first row, so there are $2 ^ k - 1$ numbers left. Of those, there are $2 ^ {k - 1}$ which start with a zero and ${2 ^ {k - 1} - 1}$ which start with a one, because the condition is $b_{11} = 1$.

Probably, you can come up with a general formula for the entries of $P$, I haven't checked this, though.

Once you have $P$ defined, you can calculate the probability of the $i$th column XOR'ing to zero. Take an even-length combination of row indices, multiple those entries in $P$ together with $1 - P$ for the indices you didn't select. If you add up the probabilities for all these combinations, you'll get the probability of the $i$th column XOR'ing to zero.

The difficulty of this problem is that there is a non-zero covariance between the rows as they have to be unique. Also consider whether the first restriction is meaningful, you can start with a valid set $\{0100, 0010, 0110\}$ and XOR'ing the first two element creates an invalid set $\{0110, 0110\}$ but doesn't actually affect the final outcome of the whole set XOR'ing to zero. You can go the other way around as well, you can start with an invalid set $\{0110, 0110\}$ and convert it by expanding to $\{0100, 0010, 0110\}$. There can be cases when this expansion leads to invalid sets again. Those map to problems where you have too many numbers selected, for example, for 3-bit numbers, you can select at most 8 unique numbers, selecting the 9th will create an invalid set.