Minimal possible entropy of 4 pairwise independent unbiased Bernoulli random variables.

information theory, optimization, probability

Suppose we have 4 pairwise independent unbiased Bernoulli random variables $X_1, X_2, X_3, X_4$ (i.e., each of them takes two values, each value with probability $1/2$). What is the minimal possible value of the Shannon entropy $H(X_1 X_2 X_3 X_4)$?

The minimum is at most 3, because we can take $X_1 = A,\, X_2 = B,\, X_3 = A\oplus B,\, X_4 = C$, where $A, B, C$ are three mutually independent random bits.
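This construction is easy to verify directly. A quick sketch (my own, using only the standard library) that enumerates the 8 equally likely outcomes of $(A, B, A\oplus B, C)$, checks pairwise independence, and confirms $H = 3$ bits:

```python
# Verify the construction X1 = A, X2 = B, X3 = A xor B, X4 = C
# for independent uniform bits A, B, C.
import itertools
import math
from collections import Counter

joint = Counter()
for a, b, c in itertools.product([0, 1], repeat=3):
    joint[(a, b, a ^ b, c)] += 1 / 8   # 8 equally likely outcomes

# every pair of coordinates is uniform on {0,1}^2, i.e. pairwise independent
for i, j in itertools.combinations(range(4), 2):
    pair = Counter()
    for x, p in joint.items():
        pair[(x[i], x[j])] += p
    assert all(abs(q - 0.25) < 1e-12 for q in pair.values())

H = -sum(p * math.log2(p) for p in joint.values())
print(H)  # 3.0
```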

On the other hand, the following general lower bound is known:
$$H(X_1\ldots X_n) \ge \log_2(n + 1)$$
whenever $X_1, \ldots, X_n$ are pairwise independent unbiased Bernoulli random variables [Gavinsky, D., & Pudlák, P. (2016). On the joint entropy of $d$-wise-independent variables. arXiv: https://arxiv.org/abs/1503.08154]. This bound is tight when $n$ is of the form $2^k - 1$. However, $n = 4$ is not of this form, so all we can conclude from this is that the minimum lies in $[\log_2 5, 3]$.

Best Answer

A skeleton of a possible way to solve this problem:

Your constraints are linear. The joint distribution of 4 Bernoulli variables lives on the 16 outcomes of $\{0,1\}^4$, so after normalization it has 15 free dimensions.

You add 10 linear constraints: 4 for the unbiased marginals $\mathbb{E}[X_i] = 1/2$ and 6 for pairwise independence $\mathbb{E}[X_i X_j] = 1/4$. (You need to check that they are all independent.)
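The independence check can be done numerically. A sketch of mine with numpy: build the $11 \times 16$ constraint matrix (normalization plus the 10 constraints, one column per outcome in lexicographic order) and confirm it has full row rank 11, so the 10 constraints are indeed independent on the simplex.

```python
# Build the constraint matrix A p = b for the joint pmf p over {0,1}^4
# and check that all constraint rows are linearly independent.
import itertools
import numpy as np

outcomes = list(itertools.product([0, 1], repeat=4))  # 16 outcomes

rows, rhs = [], []
rows.append([1.0] * 16); rhs.append(1.0)              # normalization
for i in range(4):                                    # unbiased marginals
    rows.append([float(x[i]) for x in outcomes]); rhs.append(0.5)
for i, j in itertools.combinations(range(4), 2):      # pairwise independence
    rows.append([float(x[i] * x[j]) for x in outcomes]); rhs.append(0.25)

A = np.array(rows)
b = np.array(rhs)
print(A.shape, np.linalg.matrix_rank(A))  # (11, 16) 11
```

The rank is 11 because the rows are the monomials $1$, $x_i$, $x_i x_j$, which are linearly independent functions on $\{0,1\}^4$.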

This leaves 5 free dimensions.

You add 16 linear inequalities (each of the 16 probabilities is $\ge 0$).

So the domain is a polytope.

Entropy is concave, so the minimal entropy is reached at one of the vertices of the polytope. You have to enumerate the vertices of the polytope and evaluate the entropy on each of them.
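Since the polytope is small, the enumeration can even be brute-forced rather than walked edge by edge. A sketch under my own assumptions: a vertex of $\{p \ge 0,\ A p = b\}$ has at least 5 coordinates equal to 0 (16 variables minus the rank-11 equality system), so try every choice of 5 coordinates to lock at 0, solve the resulting $11 \times 11$ system, keep the feasible solutions, and take the minimal entropy over them.

```python
# Brute-force candidate-vertex enumeration over all C(16,5) = 4368
# choices of 5 coordinates locked to zero.
import itertools
import numpy as np

outcomes = list(itertools.product([0, 1], repeat=4))
A = np.array(
    [[1.0] * 16]
    + [[float(x[i]) for x in outcomes] for i in range(4)]
    + [[float(x[i] * x[j]) for x in outcomes]
       for i, j in itertools.combinations(range(4), 2)]
)
b = np.array([1.0] + [0.5] * 4 + [0.25] * 6)

def entropy(p):
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

best = np.inf
for locked in itertools.combinations(range(16), 5):
    free = [k for k in range(16) if k not in locked]
    sol, *_ = np.linalg.lstsq(A[:, free], b, rcond=None)
    if np.linalg.norm(A[:, free] @ sol - b) > 1e-9:   # constraints violated
        continue
    if sol.min() < -1e-9:                             # not a probability vector
        continue
    p = np.zeros(16)
    p[free] = np.clip(sol, 0.0, None)
    best = min(best, entropy(p))

print(best)  # lands in [log2(5), 3] by the bounds above
```

Every accepted candidate is a feasible distribution, so by the Gavinsky–Pudlák bound the result is at least $\log_2 5$, and the entropy-3 construction above guarantees it is at most 3.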

In the past, I came across a very similar vertex-enumeration problem in a constrained probabilistic system. The idea was to find a first vertex and then find neighboring vertices by moving along the ridges. A vertex is characterized by a number $N$ of coordinates locked to 0, and a ridge is a segment where only $N-1$ coordinates are equal to 0. You "explore" a ridge by relaxing one of the $N$ locked coordinates while continuing to satisfy the linear constraints: imagine the relaxed coordinate increases by $\delta$; the constraints then force all other unlocked coordinates to increase or decrease in proportion to $\delta$. Once one of the decreasing coordinates reaches 0, you lock it and have found a new vertex.

One slight difficulty in your case is that, because of symmetry, several coordinates may reach 0 at the same time. I think a good way to deal with this is to distinguish a locked coordinate from a coordinate that merely equals 0. If you characterize each vertex by its set of locked coordinates, you may get neighboring vertices that are actually the same point but differ from the point of view of the locked coordinates.

You have a first vertex (the distribution you refer to, with entropy $= 3$). It has 8 coordinates equal to 0. You have to lock 5 of them and proceed with the algorithm described above.

I hope it is helpful.
