Relation between cross entropy and conditional entropy

coding-theory, conditional probability, entropy, information theory, optimization

Is there a relationship between cross-entropy and conditional entropy between two categorical variables?

Definition of cross-entropy:
$$
H_X(Y) = -\sum_{x} P(X=x)\log P(Y=x)
$$

Definition of conditional entropy:
$$
H(Y|X) = -\sum_{(x,y)} P(X=x,Y=y)\log P(Y=y|X=x)
$$

Here, $X$ and $Y$ take values in the same finite set $\{1,2,3,\dots,n\}$.

In an optimization problem, can we minimize cross-entropy instead of minimizing conditional entropy? If so, can we derive the relationship between these two?

Best Answer

There is little or no relationship. The cross-entropy depends only on the marginal distributions (the dependence between $X$ and $Y$ does not matter), while the conditional entropy depends on the joint distribution (the dependence between $X$ and $Y$ is essential).
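A small numerical illustration of this point, as a sketch only: the two joint distributions below are made-up examples that share the same uniform marginals, one with $X$ and $Y$ independent and one with $Y = X$. The helper names `cross_entropy` and `conditional_entropy_y_given_x` are hypothetical and simply follow the definitions in the question (natural logarithm, marginals assumed strictly positive).

```python
import numpy as np

def cross_entropy(p_x, p_y):
    """H_X(Y) = -sum_x P(X=x) log P(Y=x); uses the marginals only."""
    return -np.sum(p_x * np.log(p_y))

def conditional_entropy_y_given_x(joint):
    """H(Y|X) = -sum_{x,y} P(x,y) log P(y|x), with joint[x, y] = P(X=x, Y=y)."""
    p_x = joint.sum(axis=1, keepdims=True)  # column vector of P(X=x)
    cond = joint / p_x                      # cond[x, y] = P(Y=y | X=x)
    mask = joint > 0                        # skip zero-probability cells (0 log 0 = 0)
    return -np.sum(joint[mask] * np.log(cond[mask]))

# Two illustrative joints with identical marginals (0.5, 0.5).
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])     # X and Y independent
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])         # Y = X with probability 1

ce_a = cross_entropy(independent.sum(axis=1), independent.sum(axis=0))
ce_b = cross_entropy(dependent.sum(axis=1), dependent.sum(axis=0))
h_a = conditional_entropy_y_given_x(independent)
h_b = conditional_entropy_y_given_x(dependent)

print(ce_a, ce_b)  # identical: both joints have the same marginals
print(h_a, h_b)    # differ: log(2) for the independent case, 0 when Y = X
```

The cross-entropy cannot distinguish the two cases, while the conditional entropy moves from $\log 2$ to $0$, so minimizing one does not control the other.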

In general you could write

$$\begin{align} H_X(Y) &= H(X) + D_{KL}(p_X ||p_Y) \\ &= H(X|Y) +I(X;Y) + D_{KL}(p_X ||p_Y) \\ &= H(X|Y) +D_{KL}(p_{X,Y} || p_X p_Y) + D_{KL}(p_X ||p_Y) \end{align}$$

but I doubt that this could be useful or have a nice interpretation.
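For reference, the first equality is the standard split of cross-entropy into entropy plus KL divergence. Writing $p_X(x) = P(X=x)$ and $p_Y(x) = P(Y=x)$ (shorthand assumed here), and assuming $p_Y(x) > 0$ wherever $p_X(x) > 0$:

$$\begin{align} H_X(Y) &= -\sum_x p_X(x)\log p_Y(x) \\ &= -\sum_x p_X(x)\log p_X(x) + \sum_x p_X(x)\log\frac{p_X(x)}{p_Y(x)} \\ &= H(X) + D_{KL}(p_X ||p_Y) \end{align}$$

The remaining lines use the chain rule $H(X) = H(X|Y) + I(X;Y)$ and the identity $I(X;Y) = D_{KL}(p_{X,Y} || p_X p_Y)$.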

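Since both KL divergence terms are nonnegative, you can readily conclude that $$H_X(Y)\ge H(X|Y)$$

with $H_X(Y) = H(X|Y) \iff$ $X,Y$ are iid (independent, so $I(X;Y)=0$, and identically distributed, so $D_{KL}(p_X ||p_Y)=0$).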
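The decomposition and the inequality can be checked numerically. The following is a minimal sketch, not part of the original answer: the function name `terms`, the random joint distribution, and the vector `p` are all illustrative assumptions, and every joint probability is assumed strictly positive so that the logarithms are finite.

```python
import numpy as np

def terms(joint):
    """Return (H_X(Y), H(X|Y), I(X;Y), D_KL(p_X || p_Y)) in nats.

    joint[x, y] = P(X=x, Y=y); all entries are assumed strictly positive.
    """
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    cross = -np.sum(p_x * np.log(p_y))
    # joint / p_y broadcasts over columns, giving P(X=x | Y=y)
    h_x_given_y = -np.sum(joint * np.log(joint / p_y))
    mi = np.sum(joint * np.log(joint / np.outer(p_x, p_y)))
    kl = np.sum(p_x * np.log(p_x / p_y))
    return cross, h_x_given_y, mi, kl

# Arbitrary joint distribution on a 4 x 4 grid (illustrative only).
rng = np.random.default_rng(0)
joint = rng.random((4, 4))
joint /= joint.sum()
cross, h_cond, mi, kl = terms(joint)
gap = cross - (h_cond + mi + kl)  # should vanish up to rounding error

# iid case: independent with identical marginals, so both extra terms vanish.
p = np.array([0.2, 0.3, 0.5])
cross_iid, h_cond_iid, mi_iid, kl_iid = terms(np.outer(p, p))

print(gap, cross >= h_cond)
print(cross_iid - h_cond_iid, mi_iid, kl_iid)
```

In the iid case the cross-entropy, $H(X|Y)$, and $H(X)$ all coincide, which matches the equality condition stated above.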
