Combinatorics – Smallest Set Covering B^n with One XOR Operation

binarycombinatoricsextremal-combinatorics

More specifically, what's the smallest set of $n$-bit binary numbers, $X\subset \mathbb{B}^n$, such that every $n$-bit binary number can be obtained from $X$ by at most one XOR operation. Equivalently, the smallest set $X\subset \mathbb{F}_2^n$ such that for any $y\in \mathbb{F}_2^n$, either $y\in X$ or there exists $x_1, x_2\in X$ such that $y=x_1+x_2$.

For $n=4$, the set
$$X=\{1000, 0100, 0010, 0001, 1111\}$$
is smallest.

I'd like to know what the smallest such $X$ is for $n=10$.

For my own attempts, I tried at least to bound the size of $X$ from below. Since if $|X|=k$, then we can get at most $k+\binom{k}{2}=\frac{k^2+k}{2}$ distinct values from at most one XOR operation (or any binary operation for that matter). And since these values must cover all of $\mathbb{B}^n$, there must be at least $2^n$ of them. Well, $2^n-1$ actually; this is being used for a data encoding experiment and we don't have to bother about generating $\vec{0}$ from $X$. Working through the scratch, I found that
$$k\ge\frac{-1+\sqrt{2^{n+3}-7}}{2}$$
which tells us that $k\ge 5$ for $n=4$ and thus the set above is smallest.

For $n=5$ the corresponding bound is $|X|=k\ge 8$ but with a brute-force search, the only solution I found was with $|X|=9$. In particular,
$$X=\{10000, 01000, 00100, 00010, 00001, 11111, 11000, 10100, 01100\}$$

For $n=10$, the case I'm interested in, the bound is $|X|\ge 45$ but computer searches only yielded a solution with $|X|=65$.

So I'm wondering, what can I do to find such sets other than brute search? One attempt that stumped me was to start by duplicating some of the binary numbers from the $n=5$ solution as follows:
$$1000010000, 0100010000, 0010010000, 0001010000, 0000110000, 1111110000,$$
$$1000001000, 0100001000, 0010001000, 0001001000, 0000101000, 1111101000,$$
$$…$$
$$1000011111, 0100011111, 0010011111, 0001011111, 0000111111, 1111111111.$$
So this gives us a starting point of 36 binary which carry over some of the properties of the $n=5$ solution. But even these have some redundancy that the smaller solution didn't, like
$$1000010000+0100001000=1000001000+0100010000=1100011000$$
And obviously, if we're trying to get as many distinct numbers as possible out of $X$, we want to minimize pairs like this with the same sum. But even then, I wasn't sure how to add more elements to $X$ here. Another thought I had was to start with 11 elements in $X$:
$$1000000000, 0100000000, 0010000000, …, 0000000010, 0000000001, 1111111111$$
This would take care of all 1-token, 2-token, 9-token, and 10-token binary numbers (as I've been calling them). My hope from here was that 4-token numbers could be added in strategically. They could generate all 8-token codes codes by pairing with each other, all 6 token codes by pairing with $1111111111$, all 3-token and 5-token codes by pairing with the 1-token codes. But this would still miss 7-token codes entirely.

I don't need the optimal solution; even knocking that $|X|=65$ down to a 60 or a 55 would be great.

Best Answer

I’m afraid I have nothing analytical to contribute to this, and I don’t see a way to improve on your theoretical bound of $45$, but I did considerably reduce the size of the set required for $n=10$. Here’s a set of $50$ $10$-bit numbers which, if I understood the task correctly, generate all $10$-bit numbers with at most one XOR:

1010110111 1001110001 1011011000 0111010101 1110101010
0010101111 1010111101 1100100011 0000101000 1110111001
0011100010 1011111001 1101000101 0000000001 1011111011
0100011011 1001101000 0111101000 0010100101 1001000110
1100100001 1000011101 0110101101 0001111101 1100101000
0011100001 0100111000 0110111001 1111000110 0000111000
0111101110 1001011101 0000001110 1111011110 1001011100
0000101001 0100111011 1100000010 1000000011 1110010110
1000001111 1100001010 0110001001 1111010011 0011000001
0000111111 0001100010 1001110101 0001011000 1001110100

Here’s the simulated annealing code I used to find it.

The code easily finds solutions up to $k=53$. It had a hard time finding one for $k=52$; I had to use a very slow annealing schedule for that. I then took the solution for $k=52$, looked for the $51$-element subsets that generated the most numbers and used those as starting points for runs with $k=51$. That quickly yielded a solution, and the same approach worked to get from $51$ to $50$. It doesn’t work beyond that – the best $49$-element subsets of this set generate $1015$ numbers, and I can’t improve that beyond $1017$. I’m still running searches for $k=49$, but I doubt they’ll succeed. In any case I’m pretty sure the minimum is closer to $50$ than to $45$, since it starts to lie significantly above your theoretical bound already at lower $n$, where annealing is probably able to find the optimum – for instance, for $n=7$ the best set I can find has $19$ elements, while your bound is less than $16$.