Probability – Existence of a Joint Distribution on Bernoulli Variables with Same Probability of Being Pairwise Different

coding-theoryit.information-theorypr.probabilityprobability distributions

Let $m\in\mathbb{N}$ and $p\in(0,1)$ be arbitrary. Is there a sequence $X_1,\dots,X_m$ of random variables with the following specs on their distribution:

  • Each $X_i$ is unbiased Bernoulli: $X_i\sim {\rm Ber}(1/2)$ for $1\le i\le m$.
  • $\mathbb{P}\bigl[X_i=X_j\bigr] = p$ for all $1\le i<j\le m$.

Best Answer

$\newcommand{\R}{\mathbb R}\renewcommand{\le}{\leqslant}\renewcommand{\ge}{\geqslant}$Let $n:=m\ge2$. We will show that the desired condition holds if and only if $$p\ge\frac{\lceil n/2\rceil-1}{2 \lceil n/2\rceil-1}.$$

Suppose for a moment that there exist random variables $X_1,\dots,X_n$ such that $X_i\sim {\rm Ber}(1/2)$ for all $i$ and $P(X_i=X_j)=p\in(0,1)$ if $i\ne j$. That is, there is a nonnegative function $g\colon\{0,1\}^n\to\R$ such that

(i) $\sum_{x\in\{0,1\}^n}g(x)=1$,

(ii) $\sum_{x\in\{0,1\}^n}1(x_i=0)g(x)=\frac12$ for all $i$,

(iii) $\sum_{x\in\{0,1\}^n}1(x_i=x_j)g(x)=p$ if $i\ne j$.

Then, by symmetry, conditions (i)--(iii) will hold with $\tilde g(x):=\frac1{n!}\sum_{\pi\in S_n}g(\pi(x))$ in place of $g(x)$, where $S_n$ is the set of all permutations of the set $[n]:=\{1,\dots,n\}$. Note that $\tilde g(x)=f(\sum_1^n x_i)$ for some function $f\colon\{0,\dots,n\}\to\R$ and all $x=(x_1,\dots,x_n)\in\{0,1\}^n$. Moreover, the conditions (i)--(iii) can be rewritten as

(I) $\sum_{k=0}^n \binom nk f(k)=1$,

(II) $\sum_{k=0}^n \binom{n-1}k f(k)=\frac12$ for all $i$,

(III) $\sum_{k=0}^n a_{n,k} f(k)=p$,

where \begin{equation} a_{n,k}=\binom{n-2}k+\binom{n-2}{k-2}; \end{equation} of course, $\binom{n-2}k=0$ if $k\ge n-1$ and $\binom{n-2}{k-2}=0$ if $k\le1$.

Thus, for any given $n\ge2$ and $p\in(0,1)$, we want to see whether there is a nonnegative function $f\colon\{0,\dots,n\}\to\R$ such that conditions (I)--(III) hold.

Consider the ratios \begin{equation} r_k:= r_{n,k}:=\frac{a_{n,k}}{\binom nk} =\frac{(n-k)(n-k-1)+k (k-1)}{n (n-1)}. \end{equation} Note that $r_{k+1}\le r_k$ if $0\le k\le\frac{n-1}2$ and $r_{k+1}\ge r_k$ if $\frac{n-1}2\le k\le n-1$. Also, $r_k=r_{n-k}$.

It follows that

  • the maximum, say $\overline p_n$, of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that condition (I) holds is attained if $f(0)=f(n)=\frac12$ and $f_1=\cdots=f_{n-1}=0$, and hence $\overline p_n=1$; moreover, then condition (II) happens to hold as well;

  • if $n=2m$ is even, then the minimum, say $\underline p_n$, of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that condition (I) holds is attained if $f(m)=1/\binom nm=1/\binom{2m}m$ and $f_k=0$ for $k\in\{0,\dots,n\}\setminus\{m\}$, and hence $\underline p_n=\underline p_{2m}=\frac{m-1}{2m-1}$; moreover, then condition (II) happens to hold as well;

  • if $n=2m+1$ is odd, then the minimum $\underline p_n$ of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that condition (I) holds is attained if $f(m)=f(m+1)=\frac12/\binom{2m+1}m=\frac12/\binom{2m+1}{m+1}$ and $f_k=0$ for $k\in\{0,\dots,n\}\setminus\{m,m+1\}$, and hence $\underline p_n=\underline p_{2m+1}=\frac m{2m+1}$; moreover, then condition (II) happens to hold as well.

So,

  • the maximum of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that conditions (I) and (II) hold is $\overline p_n=1$;

  • if $n=2m$ is even, then the minimum of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that conditions (I) and (II) hold is $\underline p_n=\frac{m-1}{2m-1}$;

  • if $n=2m+1$ is odd, then the minimum of $\sum_{k=0}^n a_{n,k} f(k)$ over all nonnegative functions $f\colon\{0,\dots,n\}\to\R$ such that conditions (I) and (II) hold is $\underline p_n=\frac m{2m+1}$.

The set of all values of $\sum_{k=0}^n a_{n,k} f(k)$, where $f\colon\{0,\dots,n\}\to\R$ is a nonnegative function such that conditions (I) and (II) hold, is convex and therefore coincides with the interval $[\underline p_n,\overline p_n]=[\underline p_n,1]$.

Thus, the following two conditions are equivalent to each other:

  • there is a nonnegative function $f\colon\{0,\dots,n\}\to\R$ such that conditions (I)--(III) hold;

  • $\underline p_n\le p\le1$.

That is, for each natural $n\ge2$ and each $p\in[0,1]$, the following two conditions are equivalent to each other:

  • there exist random variables $X_1,\dots,X_n$ such that $X_i\sim {\rm Ber}(1/2)$ for all $i$ and $P(X_i=X_j)=p$ if $i\ne j$;

  • $p\ge \underline p_n$.

One may also note that

  • $\underline p_n=\dfrac{m_n-1}{2m_n-1}$, where $m_n:=\lceil n/2\rceil$;

  • $\underline p_n\le\underline p_{n+1}<\frac12$ for all $n\ge2$;

  • $\underline p_n\to\frac12$ as $n\to\infty$.

Related Question