Entropy of fair but correlated coin flips

Tags: entropy, information theory, multiple integral, probability, probability distributions

Consider the joint distribution $p(\xi_1,\dots,\xi_N)$ with components defined as $\xi_i=\mathrm{sign}(x_i)$, where $(x_1,\dots,x_N)\sim\mathcal{N}(0,\Sigma)$ with
$$
\Sigma_{ij}=\delta_{ij}+(1-\delta_{ij})\tilde{\rho}\;,
$$
i.e. all off-diagonal entries of $\Sigma$ are $\tilde{\rho}$. Since all components $x_i$ have unit variance (the diagonal entries of $\Sigma$ are $1$), $\tilde{\rho}$ is the correlation of any pair $(x_i,x_j)$. Note that

  1. all single-component marginals are uniform, $p(\xi_i=\pm 1)=1/2$, i.e. the coins are unbiased;
  2. we can always set the value of $\tilde{\rho}$ so that any pair $(\xi_i,\xi_j)$ has the desired correlation $\rho$.

What is the formula for the entropy of $p(\xi_1,\dots,\xi_N)$, denoted $H_N(\rho)$?

The parametrization of $\tilde{\rho}$ by $\rho$ depends on $N$ and is obtained by calculating the pair marginal probability $p(\xi_i=1,\xi_j=1)$, denoted $p_{11}$. Let $f_N(\tilde{\rho})$ be the function of $\tilde{\rho}$ that gives this probability. The formula for the correlation between two binary random variables,
$$
\rho=\frac{p_{11}-p_{-1}p_{1}}{\sqrt{p_{-1}(1-p_{-1})\,p_{1}(1-p_{1})}}=4p_{11}-1\;,
$$

with $p_{\pm1}$ denoting $p(\xi_i=\pm 1)=p(\xi_j=\pm 1)=1/2$, gives the desired parametrization,
$$
\tilde{\rho}_N(\rho):=f^{-1}_N\left(\frac{\rho+1}{4}\right)\;.\;\;\;\;(\mathrm{eq}.1)
$$
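As a quick sanity check of the relation $\rho=4p_{11}-1$ (and hence of $(\mathrm{eq}.1)$), one can estimate $p_{11}$ by Monte Carlo for a correlated Gaussian pair and compare it against the empirical correlation of the signs. A minimal sketch (variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
rho_tilde = 0.6
cov = [[1.0, rho_tilde], [rho_tilde, 1.0]]
x = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)
xi = np.sign(x)                               # the two "coins"
p11 = np.mean((xi[:, 0] > 0) & (xi[:, 1] > 0))
print(4 * p11 - 1)                            # rho via the formula above
print(np.corrcoef(xi[:, 0], xi[:, 1])[0, 1])  # empirical rho; should agree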

Solution for $N=2$:

Computing the 2D Gaussian orthant integral in polar coordinates gives $f_2(\tilde{\rho})=\frac{1}{4}+\frac{\arcsin\tilde{\rho}}{2\pi}$ (Sheppard's formula), and hence $\tilde{\rho}_{N=2}(\rho)=\sin(\frac{\pi}{2}\rho)$. In this special case, $(\mathrm{eq}.1)$ and the symmetry constraints completely specify the distribution $p(\xi_1,\xi_2)=(1+\xi_1\xi_2\rho)/4$. Calculating the entropy from its definition gives

$$
H_{N=2}(\rho)=H(\xi_1,\xi_2)=\sum_{\eta=(1\pm\rho)/4}2\left(-\eta\log_2\eta\right).
$$

We have $H_{N=2}(0)=2$ bits and $H_{N=2}(1)=1$ bit. In fact, we know $H_N(\rho=0)=N$ and $H_N(\rho=1)=1$ for all $N$.
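A quick numerical check of this formula and its endpoints (a minimal sketch; h2 is just an illustrative name):

import numpy as np

def h2(rho):
    # the four cell probabilities (1 + xi1*xi2*rho)/4
    p = np.array([1 + rho, 1 + rho, 1 - rho, 1 - rho]) / 4
    p = p[p > 0]  # convention: 0*log2(0) = 0
    return -(p * np.log2(p)).sum()

print(h2(0.0))  # 2.0 bits
print(h2(1.0))  # 1.0 bit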

For $N>2$:

There are lots of Gaussian integrals to do. Due to the permutation symmetry of the indices, there are only $N$ distinct integral values among the $2^N$ terms in the entropy: the tuples can be grouped by how many $1$s they contain, and the singleton group of all $-1$s can be merged with the singleton group of all $1$s thanks to the reflection symmetry in the hyperplane normal to the main diagonal. We just need to compute the multiplicity and the integral for each of the $N$ groups.

E.g. for $N=2$ there are 2 groups (listed above, each with multiplicity 2). For $N=3$ there are three groups: one each for the tuples containing one or two $1$s (multiplicity 3 each), and the extreme group containing $(-1,-1,-1)$ and $(1,1,1)$ (multiplicity 2).
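For illustration, here is a brute-force Monte Carlo sketch of this grouping; it keeps all $N+1$ groups rather than merging the two extreme ones, and entropy_mc is a made-up name:

import numpy as np
from math import comb

def entropy_mc(N, rho_tilde, trials=400_000, seed=0):
    # equicorrelated Gaussians: Sigma = (1 - rho_tilde)*I + rho_tilde*11^T
    rng = np.random.default_rng(seed)
    Sigma = np.full((N, N), rho_tilde) + (1 - rho_tilde) * np.eye(N)
    x = rng.standard_normal((trials, N)) @ np.linalg.cholesky(Sigma).T
    k = (x > 0).sum(axis=1)        # number of +1 signs in each sample
    H = 0.0
    for kk in range(N + 1):
        m = comb(N, kk)            # multiplicity of the group
        pk = np.mean(k == kk) / m  # probability of one tuple in the group
        if pk > 0:
            H -= m * pk * np.log2(pk)
    return H

print(entropy_mc(3, 0.0))  # close to 3 bits, as expected for rho = 0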

In section 3.2 of this paper, Six gives a recursive solution to the distribution. This could be used to compute the entropy and the parametrization function $\tilde{\rho}_N(\rho)$.

Accepted Answer: In the end, an alternative approach based on writing $x_i=\sqrt{1-\tilde{\rho}}\,y_i+\sqrt{\tilde{\rho}}\,s$, with $y_i,s\sim\mathcal{N}(0,1)$ i.i.d., seems to be more straightforward and is the accepted answer below. That answer also indicates that $f_N(\tilde{\rho})$ does NOT seem to depend on $N$ after all (indeed, the pair marginal $p(\xi_i,\xi_j)$ only involves the $2\times 2$ submatrix of $\Sigma$).
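To make this concrete: conditional on the shared factor $s$, the $x_i$ are i.i.d. and $x_i>0$ with probability $q(s)=\Phi\!\left(\sqrt{\tilde{\rho}}\,s/\sqrt{1-\tilde{\rho}}\right)$, so every sign pattern with $k$ ones has probability
$$
p_k=\int\phi(s)\,q(s)^k\left(1-q(s)\right)^{N-k}\,ds\;,\qquad H_N(\rho)=-\sum_{k=0}^{N}\binom{N}{k}\,p_k\log_2 p_k\;.
$$
A sketch of this reduction to a 1D integral, assuming (per that observation) $\tilde{\rho}=\sin(\frac{\pi}{2}\rho)$ for all $N$; entropy_1d is an illustrative name and the code is valid for $0\le\rho<1$:

import numpy as np
from math import comb, pi, sin, sqrt
from scipy.integrate import quad
from scipy.stats import norm

def entropy_1d(N, rho):
    rho_t = sin(pi * rho / 2)      # tilde(rho) from (eq. 1), independent of N
    if rho_t == 0:
        return float(N)            # independent fair coins
    a = sqrt(rho_t) / sqrt(1 - rho_t)
    q = lambda s: norm.cdf(a * s)  # P(xi_i = +1 | shared factor s)
    H = 0.0
    for k in range(N + 1):
        pk = quad(lambda s: norm.pdf(s) * q(s)**k * (1 - q(s))**(N - k),
                  -np.inf, np.inf)[0]
        H -= comb(N, k) * pk * np.log2(pk)
    return H

print(entropy_1d(2, 0.7))   # matches H_{N=2}(0.7) from the formula above
print(entropy_1d(12, 0.7))  # strictly between 1 and 12 bits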

Best Answer

Of course, the assumption $H(\xi_i|\xi_{i+1},...,\xi_N)=H(\xi_i|\xi_{i+1})$ seems wrong, even as an approximation. I doubt that there is a simple solution.

Here are at least some numerical results from simulations. The horizontal axis is $\rho$, the vertical axis is the estimated entropy, normalized to $[0,1]$: $H'=(H-1)/(N-1)$

[Figure: estimated normalized entropy $H'$ versus $\rho$, for several values of $N$]

As you mention, evaluating the entropy amounts to integrating a multidimensional Gaussian (albeit zero-mean and symmetric w.r.t. the components) over all the orthants. The answers to this question have some pointers, e.g. this one.

The most specific reference I found is this one.

Another interesting one: https://arxiv.org/abs/1009.2855


Added: from the same simulations, here are the conditional entropies $H(\xi_i|\xi_{i-1},\dots,\xi_1)$ for different values of $\rho$. I wonder if there is a non-zero limit for $N\to \infty$...

[Figure: conditional entropies $H(\xi_i|\xi_{i-1},\dots,\xi_1)$ for different values of $\rho$]

In case someone is interested, here's the Python code

import numpy as np
import math

def hb(p):
    # binary entropy in bits, with the convention 0*log2(0) = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0

def calca(nn, r):
    # invert calcrho: the mixing weight aa that yields pairwise correlation r
    return (math.sqrt((1 - nn) * r**2 + (nn - 2) * r + 1) - r + 1) / r

def calcrho(nn, aa):
    # pairwise correlation of the equicorrelated Gaussians produced by gen1
    return (nn + 2 * aa) / ((aa + 1) ** 2 + nn - 1)

def gen1(nn, aa):
    # one sample of nn unit-variance Gaussians, x_i ∝ sum_j y_j + aa * y_i,
    # so every pair (x_i, x_j) has the same correlation calcrho(nn, aa)
    y = np.random.normal(0, 1, nn)
    return (np.sum(y) * np.ones(nn) + aa * y) / math.sqrt((aa + 1) ** 2 + (nn - 1))

def tryn(nn, rho, t):
    aa = calca(nn, rho)
    Pk = np.zeros((nn + 1, nn + 1))   # P(c ones among the first k signs)
    PJk = np.zeros((nn + 1, nn + 1))  # joint prob: ...and sign k is +1
    PCk = np.zeros((nn + 1, nn + 1))  # cond prob: sign k is +1, given c
    for _ in range(t):
        y = gen1(nn, aa)
        z = (y > 0).astype(int)  # signs as 0/1 (avoids shadowing the argument t)
        c = 0  # running count of +1s among the preceding signs
        for k in range(nn):
            c += z[k - 1] if k > 0 else 0
            Pk[k][c] += 1
            if z[k] == 1:
                PJk[k][c] += 1
    # normalize counts into probabilities
    for k in range(nn):
        Pk[k] /= t
        PJk[k] /= t
    for k in range(nn):
        for c in range(k + 1):
            PCk[k][c] = PJk[k][c] / Pk[k][c] if Pk[k][c] > 0 else 0
    r = calcrho(nn, aa)
    print(f"{nn=} {r=} {aa=} {t=}")
    # print(np.round(Pk[:5,:5],5))
    # print(np.round(PJk[:5,:5],5))
    # print(np.round(PCk,5))
    # H[k] = H(xi_k | xi_{k-1}, ..., xi_0); by exchangeability, conditioning on
    # the count of previous +1s is equivalent to conditioning on the full prefix
    H = np.zeros(nn)
    for k in range(nn):
        H[k] = 0
        for c in range(k + 1):
            H[k] += hb(PCk[k][c]) * Pk[k][c]
    print('h', np.round(H, 5))

rho = 0.7
nn = 12
tryn(nn, rho, 800000)