Decomposition of a random variable into independent random variables

information theory, probability, probability theory, random variables

Let us consider two random variables $X$ and $Y$ that are genuinely random, i.e., have positive entropy:
$$
H(X) > 0,\\
H(Y) > 0,
$$

where $H(\cdot)$ denotes the Shannon entropy.

Suppose that a realization of $X$ carries only part of the information of $Y$, in the sense that $X$ is completely determined by $Y$:
$$
H(X\mid Y) = 0.
$$

Is there then a random variable $Z$ that holds exactly the rest of $Y$'s information?

In particular, I would like to know whether there is a random variable $Z$ satisfying the following properties.

  • The total amount of information in a realization of $(X,Z)$ equals that in a realization of $Y$: $$H(Y)=H(Z)+H(X);$$
  • Once realizations of $X$ and $Z$ are observed, an observer knows everything about $Y$'s realization:
    $$H(Y\mid X,Z) = 0,$$
    and vice versa:
    $$H(X,Z\mid Y) = 0.$$

(added 2022/12/20)
The conditions stated above can be visualized as the following I-diagram.

[I-diagram]

Here, $X$ corresponds to the smaller circle and $Y$ to the larger circle containing it, so $X$ and $Y$ share the entire region of the smaller circle. The region of $Z$ is the part of the larger circle not covered by the smaller circle.


If this is a well-known fact, I'd also like to know its name (e.g., the name of a theorem) so that I can learn more by googling.


Example

Suppose $Y$ is a two-bit random variable and $X$ is one of those two bits. Then the above properties are clearly satisfied by taking $Z$ to be the other bit of $Y$.

Concretely, assume that we have two coins, say, a 1 cent coin and a 5 cent coin.
$Y$ represents the outcome of tossing both coins, and $X$ is the result of tossing the 1 cent coin. If $Z$ represents the tossing result of the 5 cent coin, then $Z$ holds exactly the part of $Y$'s information not covered by $X$.
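
For a concrete sanity check, here is a minimal numerical sketch of this two-coin example. The helper functions `H` and `marginal` and the encoding of $Y$ as the pair $(X,Z)$ are my own, not part of the question.

```python
from itertools import product
from math import log2

# Joint distribution of (X, Z): two independent fair coin flips.
# Y is the pair (X, Z); X is the 1 cent result, Z is the 5 cent result.
outcomes = list(product("HT", repeat=2))           # (x, z) pairs
p = {(x, z): 0.25 for x, z in outcomes}            # uniform joint pmf

def H(pmf):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(pmf, keep):
    """Marginalize the joint pmf onto the coordinates listed in `keep`."""
    out = {}
    for xs, q in pmf.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + q
    return out

# Y = (X, Z), so H(Y) is the entropy of the full joint pmf.
H_Y = H(p)
H_X = H(marginal(p, [0]))
H_Z = H(marginal(p, [1]))

print(H_Y, H_X + H_Z)   # 2.0 and 1.0 + 1.0, so H(Y) = H(X) + H(Z)
# H(Y | X, Z) = 0 and H(X, Z | Y) = 0 hold trivially here, because Y is
# literally the pair (X, Z) in this encoding.
```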

Best Answer

This is not true in general. For a counterexample, take $Y \in \{1,2,3,4\}$ with the pmf $(1/6, 1/3, 1/4, 1/4)$. Let $f(1) = f(2) = a, f(3) = f(4) = b$ and set $X = f(Y).$

Now suppose that an appropriate $Z$ existed. Then there exist two deterministic functions $g,h$ such that $g(Y) = Z$ and $h(X,Z) = Y$: the former because $H(Z\mid Y) = 0$ (which follows from $H(X,Z\mid Y) = 0$), and the latter because $H(Y\mid X,Z) = 0$. Note further that $X$ and $Z$ must be independent: $$ H(X,Z|Y) = 0 \implies H(X,Y,Z) = H(Y) = H(X) + H(Z), \\ H(Y|X,Z) = 0 \implies H(X,Y,Z) = H(X,Z),\\ H(X) + H(Z) = H(X,Z) \iff I(X;Z) = 0,$$ where the first line also uses the assumed identity $H(Y) = H(X) + H(Z)$.

Since $Z = g(Y),$ it can take at most $4$ distinct values: call these $\{g_1, g_2, g_3, g_4\},$ where $g_i = g(i)$. However, there are further constraints. For instance, it must be the case that $h(a, g_i) \in \{1,2\}$: by independence every pair $(a, g_i)$ occurs with positive probability, and on the event $X = a$ we have $Y \in f^{-1}(a) = \{1,2\}$. So we have that $$ g_i = g( h ( a,g_i)) \in \{g_1, g_2\} \implies Z \in \{g_1, g_2\} \text{ given } X = a,\\ g_i = g( h ( b, g_i)) \in \{g_3,g_4\} \implies Z \in \{g_3, g_4\} \text{ given } X = b.$$ But $Z$ is independent of $X$, so the support of $Z$ cannot change upon conditioning on the value of $X$. We conclude that $Z$ has (at most) binary support; in particular, $\{g_3, g_4\} \subseteq \{g_1, g_2\}.$ But this in turn means that, for each fixed $x$, the map $z \mapsto h(x,z)$ must be injective on the support of $Z$, since otherwise not all values of $Y$ could be attained (given $X = a$, both $1$ and $2$ must occur, and given $X = b$, both $3$ and $4$).

But notice that this forces $H(Y|X = a) = H(Y|X = b)$. Indeed, we have $$ H(Y|X = a) = H( h(a,Z)|X = a) = H( h(a,Z)) = H(Z),\\ H(Y|X = b) = H( h(b,Z)|X = b) = H(h(b,Z)) = H(Z),$$ where the middle equalities use the independence of $X$ and $Z$, and the final equalities use the injectivity of $h$. But these two conditional entropies are not equal under the law we started with: $H(Y|X=a) = H(1/3, 2/3) \approx 0.918$ bits, while $H(Y|X=b) = 1$ bit. This gives us a contradiction.
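
For completeness, here is a small brute-force check (my own addition, not part of the original answer) of this conclusion. Since any admissible $Z$ must satisfy $H(Z\mid Y) = 0$, it is a deterministic function $g$ of $Y$, so it suffices to enumerate all maps $g:\{1,2,3,4\}\to\{0,1,2,3\}$ and test the two required conditions:

```python
from itertools import product
from math import log2, isclose

# Counterexample law: Y in {1,2,3,4} with pmf (1/6, 1/3, 1/4, 1/4), X = f(Y).
p_Y = {1: 1/6, 2: 1/3, 3: 1/4, 4: 1/4}
f = {1: "a", 2: "a", 3: "b", 4: "b"}

def H(pmf):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(joint, keep):
    """Marginalize a joint pmf onto the coordinates listed in `keep`."""
    out = {}
    for key, q in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + q
    return out

# Search over all deterministic maps g; up to relabeling this covers every
# candidate Z with H(Z|Y) = 0.
found = []
for g_vals in product(range(4), repeat=4):
    g = dict(zip([1, 2, 3, 4], g_vals))
    joint = {(f[y], y, g[y]): q for y, q in p_Y.items()}   # pmf of (X, Y, Z)
    H_Y = H(marginal(joint, [1]))
    H_X = H(marginal(joint, [0]))
    H_Z = H(marginal(joint, [2]))
    H_XZ = H(marginal(joint, [0, 2]))
    H_XYZ = H(joint)
    cond_split = isclose(H_Y, H_X + H_Z, abs_tol=1e-9)     # H(Y) = H(X) + H(Z)
    cond_determ = isclose(H_XYZ, H_XZ, abs_tol=1e-9)       # H(Y | X, Z) = 0
    if cond_split and cond_determ:
        found.append(g)

print(found)   # [] -- no deterministic g (hence no admissible Z) exists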


The above tells us that one needs to relax at least one of the two conditions $H(Y|X,Z) = 0$ and $H(Z|Y) = 0.$ Weakening the latter to mere independence of $X$ and $Z$ already yields fruit: for any $(X,Y)$ (without even requiring $X$ to be a deterministic function of $Y$), there exists $Z$ independent of $X$ such that $Y = h(X,Z)$ for a deterministic map $h.$ This result is called the functional representation lemma, and recently Li and El Gamal proved a strong version of it with applications in one-shot information theory: https://arxiv.org/abs/1701.02827. Here the strength is an explicit control of the form $I(X;Z|Y) \le 4 + \log(1 + I(X;Y)),$ which yields bounds on $H(Y|Z)$ (but not on $H(Z|Y),$ or at least not without more thought). There might also be different relaxations that lead to things like common information or coordination (although it's been a while since I've read this work, so I'm not sure).
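
As an illustration of the basic (non-strong) lemma, here is a sketch of one classical construction for finite alphabets, a randomized conditional-CDF / inverse-transform argument. The toy law `p_XY` and all function names are my own, and this is not the construction used by Li and El Gamal.

```python
import random
from bisect import bisect_right

# Toy joint law of (X, Y) on finite alphabets (chosen arbitrarily for illustration).
p_XY = {("a", 0): 0.1, ("a", 1): 0.4, ("b", 0): 0.3, ("b", 1): 0.2}
ys = sorted({y for _, y in p_XY})

def cond_pmf(x):
    """Conditional pmf p(y | x) as a list aligned with ys."""
    px = sum(p_XY.get((x, y), 0.0) for y in ys)
    return [p_XY.get((x, y), 0.0) / px for y in ys]

def cond_cdf(x):
    """Conditional CDF F(y | x) evaluated at each y in ys."""
    c, out = 0.0, []
    for q in cond_pmf(x):
        c += q
        out.append(c)
    return out

def h(x, z):
    """Deterministic map: Y = h(X, Z), the conditional quantile of z given x."""
    i = min(bisect_right(cond_cdf(x), z), len(ys) - 1)  # guard against rounding
    return ys[i]

def sample_xyz():
    """Sample (X, Y) from p_XY, then place Z uniformly inside the slot
    [F(y^- | x), F(y | x)).  Given any X = x, Z is then uniform on [0, 1),
    so Z is independent of X, and Y = h(X, Z) holds exactly."""
    (x, y), = random.choices(list(p_XY), weights=list(p_XY.values()))
    i = ys.index(y)
    cdf = cond_cdf(x)
    lo = cdf[i - 1] if i > 0 else 0.0
    return x, y, lo + random.random() * (cdf[i] - lo)

# Sanity check: Y is always recovered exactly from (X, Z).
assert all(h(x, z) == y for x, y, z in (sample_xyz() for _ in range(10_000)))
```

Note that this $Z$ is continuous, so the sketch only illustrates the independence and determinism parts of the lemma, not any entropy bookkeeping such as $H(Y) = H(X) + H(Z)$.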
