Entropy of distribution with block matrix support

entropy · information-theory · probability

Let $P(X_1,X_2)$ be a discrete bivariate distribution of the form shown in the figure below, i.e. its support can be split into blocks that do not overlap in either dimension.

[Figure: block structure of $P(X_1,X_2)$]

Let $P'(B_1,B_2)$ be the distribution obtained from $P(X_1,X_2)$ by summing the probability mass within each block, where $B_1$ and $B_2$ denote the block indices of $X_1$ and $X_2$ respectively. I would like to show that the following inequality holds:
$$
H(X_1) + H(X_2) - H(X_1,X_2) \ge H'(B_1) + H'(B_2) - H'(B_1,B_2)
$$

where $H$ denotes entropies computed with respect to $P(X_1,X_2)$ and $H'$ entropies computed with respect to $P'(B_1,B_2)$.
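Note that both sides are mutual informations, since $I(X;Y) = H(X) + H(Y) - H(X,Y)$, so the claim is equivalent to $I(X_1;X_2) \ge I(B_1;B_2)$. As a numerical sanity check, here is a minimal NumPy sketch (the block layout, sizes, and probabilities are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy (in bits) of a probability array, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability matrix."""
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

# A joint P(X1,X2) supported on two non-overlapping blocks:
# rows 0-1 x columns 0-2, and rows 2-3 x columns 3-4.
P = np.zeros((4, 5))
P[0:2, 0:3] = rng.random((2, 3))
P[2:4, 3:5] = rng.random((2, 2))
P /= P.sum()

# P'(B1,B2): total mass of each block. Because the blocks do not overlap,
# the support of P' is diagonal (B1 = B2 with probability 1).
P_blocks = np.diag([P[0:2, 0:3].sum(), P[2:4, 3:5].sum()])

# Claimed inequality, in mutual-information form: I(X1;X2) >= I(B1;B2).
print(mutual_information(P), ">=", mutual_information(P_blocks))
```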

Question 1: Any suggestions on how to prove this?

Question 2: What would you call a matrix like the one above? According to Wikipedia, the name "block diagonal matrix" applies only if the matrix and the blocks are square.

Best Answer

Consider the joint distribution of $B_1, X_1, X_2$ and note that
$$
\begin{align}
P(B_1, X_1, X_2) &= P(B_1 \mid X_1, X_2)\, P(X_1, X_2)\\
&= P(B_1 \mid X_1)\, P(X_1 \mid X_2)\, P(X_2),
\end{align}
$$
where the second equality holds because $B_1$ is a deterministic function of $X_1$ (the block index of $X_1$). This means that $B_1, X_1, X_2$ form a Markov chain $X_2 \rightarrow X_1 \rightarrow B_1$. From the data processing inequality,
$$
I(X_2;B_1) \leq I(X_2;X_1). \tag{1}
$$

Similarly, since $B_2$ is a deterministic function of $X_2$, the variables $B_1, X_2, B_2$ form a Markov chain $B_1 \rightarrow X_2 \rightarrow B_2$, which gives
$$
I(B_1;B_2) \leq I(B_1;X_2). \tag{2}
$$

Combining (1) and (2), and using the symmetry $I(B_1;X_2) = I(X_2;B_1)$, it follows that
$$
I(B_1;B_2) \leq I(B_1;X_2) \leq I(X_1;X_2).
$$
Since $I(X_1;X_2) = H(X_1)+H(X_2)-H(X_1,X_2)$ and $I(B_1;B_2) = H'(B_1)+H'(B_2)-H'(B_1,B_2)$, this is exactly the claimed inequality.
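To see the two data-processing steps concretely, one can extend the numerical sketch from the question (this reuses `P` and `mutual_information` from there; the helper `collapse_rows` and the block labels `b1`, `b2` are hypothetical names chosen for illustration):

```python
# Assumes P, mutual_information, and numpy (as np) from the sketch in the
# question have already been defined.

# B1 and B2 are deterministic functions of X1 and X2: the block index
# of each row/column in the example's 2-block layout.
b1 = np.array([0, 0, 1, 1])     # block label of each X1 value (row)
b2 = np.array([0, 0, 0, 1, 1])  # block label of each X2 value (column)

def collapse_rows(joint, labels, n_blocks):
    """Marginalize the row variable of a joint matrix onto its block label."""
    out = np.zeros((n_blocks, joint.shape[1]))
    for row, lab in enumerate(labels):
        out[lab] += joint[row]
    return out

P_b1_x2 = collapse_rows(P, b1, 2)            # joint of (B1, X2)
P_b1_b2 = collapse_rows(P_b1_x2.T, b2, 2).T  # joint of (B1, B2)

# (1) X2 -> X1 -> B1 gives I(X2;B1) <= I(X2;X1)
print(mutual_information(P_b1_x2), "<=", mutual_information(P))
# (2) B1 -> X2 -> B2 gives I(B1;B2) <= I(B1;X2)
print(mutual_information(P_b1_b2), "<=", mutual_information(P_b1_x2))
```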
