Let $P(X_1,X_2)$ be a discrete bivariate distribution of the form shown in the figure below, i.e. its support can be split into blocks that do not overlap on either dimension.
Let $P'(B_1,B_2)$ be the distribution obtained from $P(X_1,X_2)$ by summing (integrating) the values within each block, so that $B_1$ and $B_2$ index the blocks along each dimension. I would like to show that the following inequality holds
$$
H(X_1) + H(X_2) - H(X_1,X_2) \ge H'(B_1) + H'(B_2) - H'(B_1,B_2)
$$
where $H$ denotes entropy values computed with respect to $P(X_1,X_2)$ and $H'$ entropy values computed with respect to $P'(B_1,B_2)$.
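To make the construction concrete, here is a small numerical sketch (the block sizes and probability values are illustrative assumptions, not taken from the figure):

```python
import numpy as np

# A toy block-structured joint distribution P(X1, X2): the support splits
# into two blocks that overlap on neither dimension.
P = np.array([
    [0.20, 0.10, 0.00, 0.00],
    [0.05, 0.15, 0.00, 0.00],
    [0.00, 0.00, 0.25, 0.05],
    [0.00, 0.00, 0.10, 0.10],
])

# Assumed block boundaries: rows/cols 0-1 form block 0, rows/cols 2-3 block 1.
row_blocks = [slice(0, 2), slice(2, 4)]
col_blocks = [slice(0, 2), slice(2, 4)]

# P'(B1, B2): sum the probability mass inside each block.
P_blocks = np.array([[P[r, c].sum() for c in col_blocks] for r in row_blocks])
print(P_blocks)  # only the diagonal entries are nonzero
```

Because the blocks share no rows and no columns, all the mass of $P'$ sits on the "diagonal" block pairs.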
Question 1: Any suggestion on how to prove this?
Question 2: What would you call a matrix like the one above? According to Wikipedia, the name "block diagonal matrix" applies only if the matrix and the blocks are square.
Best Answer
Consider the joint distribution of $B_1, X_1, X_2$ and note that $$ \begin{align} P(B_1, X_1, X_2) &= P(B_1|X_1,X_2) P(X_1,X_2)\\ &=P(B_1|X_1) P(X_1|X_2) P(X_2), \end{align} $$ where the second equality holds because $B_1$ is a deterministic function of $X_1$. This means that $B_1, X_1, X_2$ form a Markov chain $X_2 \rightarrow X_1 \rightarrow B_1$. From the data processing inequality, $$ \tag{1} I(X_2;B_1)\leq I(X_2;X_1). $$
Similarly, it can be shown that $B_1\rightarrow X_2 \rightarrow B_2$ is a Markov chain, which gives $$ \tag{2} I(B_1;B_2)\leq I(B_1;X_2). $$
From (1) and (2), it follows that $$ I(B_1;B_2) \leq I(X_1;X_2), $$ which is exactly the claimed inequality, since $I(X_1;X_2) = H(X_1)+H(X_2)-H(X_1,X_2)$ and $I(B_1;B_2) = H'(B_1)+H'(B_2)-H'(B_1,B_2)$.
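The conclusion can be sanity-checked numerically. Below is a minimal sketch that draws a random block-structured joint distribution (the block sizes are arbitrary assumptions), aggregates it into $P'(B_1,B_2)$, and verifies $I(B_1;B_2)\leq I(X_1;X_2)$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a possibly multi-dimensional pmf."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(P):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) for a joint pmf matrix P."""
    return entropy(P.sum(axis=1)) + entropy(P.sum(axis=0)) - entropy(P)

rng = np.random.default_rng(0)

# Random block-structured joint: blocks of sizes 2x3 and 3x2 on disjoint
# row and column ranges, normalized to sum to one.
P = np.zeros((5, 5))
P[:2, :3] = rng.random((2, 3))
P[2:, 3:] = rng.random((3, 2))
P /= P.sum()

# Aggregate each block into one cell of P'(B1, B2).
P_blocks = np.array([
    [P[:2, :3].sum(), P[:2, 3:].sum()],
    [P[2:, :3].sum(), P[2:, 3:].sum()],
])

print(mutual_information(P_blocks), "<=", mutual_information(P))
```

Note that $I(B_1;B_2)$ here equals $H'(B_1)$ exactly, since the block structure makes $B_2$ a deterministic function of $B_1$.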