How is variation of information a true metric in the mathematical sense

information-theory, measure-theory

As of July 5, 2019, Wikipedia states that "Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality." (https://en.wikipedia.org/wiki/Variation_of_information). Since it's defined as $\mathrm{VI}(X;Y)=H(X|Y)+H(Y|X)$, however, it seems that it would be zero when one event completely determines another. I was under the impression that it's possible for one event to completely determine another without them both being the exact same event, in which case the variation of information would in fact be a pseudometric.

Am I missing something, or am I just taking the phrase "true metric" too literally, and the sentence should be read as stating that variation of information is more metric-like than mutual information?

Best Answer

If one event determines another, the quantity $H(X|Y)+H(Y|X)$ is not necessarily $0$. The value will only be $0$ if $X$ and $Y$ each determine the other.

For instance if $Y$ is a function of $X$ (i.e. $Y = f(X)$) then $H(Y|X) = 0$ but $H(X | Y)$ can take a value greater than $0$. So just because $X$ determines $Y$ does not mean $Y$ determines $X$.

For instance, if $X$ is uniform on the $2N$ signed integers $\{\pm 1, \pm 2, \dots, \pm N\}$, $f(x) = |x|$, and $Y = f(X)$, then it's clear that $X$ determines $Y$; but because $Y$ does not determine $X$ (the sign of $X$ is lost), the quantity $H(X|Y)+H(Y|X)$ will be non-zero:

\begin{align} H(X) &= \log_2 N + 1 \\ H(Y) &= \log_2 N \\ H(Y|X) &= 0 \\ H(X|Y) &= 1 \\ \end{align}
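For concreteness, here is a minimal numerical check of those four entropies. This is only a sketch under the assumptions above: $N = 4$ (an arbitrary choice) and $X$ uniform on the $2N$ signed integers, with everything computed from the joint distribution of $(X, Y)$.

```python
import numpy as np
from collections import Counter

# Assumed setup: N = 4, X uniform on {-N, ..., -1, 1, ..., N}, Y = |X|
N = 4
xs = [x for x in range(-N, N + 1) if x != 0]   # support of X (2N values)
p_x = {x: 1 / len(xs) for x in xs}             # uniform distribution over X

def H(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * np.log2(p) for p in dist.values() if p > 0)

# Joint distribution of (X, Y); since Y = |X|, each x maps to exactly one y
p_xy = {(x, abs(x)): p for x, p in p_x.items()}
p_y = Counter()
for (x, y), p in p_xy.items():
    p_y[y] += p

H_X = H(p_x)
H_Y = H(dict(p_y))
H_XY = H(p_xy)
H_Y_given_X = H_XY - H_X   # 0: Y is a function of X
H_X_given_Y = H_XY - H_Y   # 1 bit: only the sign of X is unknown given Y

print(H_X, H_Y, H_Y_given_X, H_X_given_Y)
# Prints 3.0 2.0 0.0 1.0, i.e. log2(2N), log2(N), 0, and 1
```

So the variation of information $H(X|Y)+H(Y|X) = 1$ bit here, strictly positive even though $X$ determines $Y$.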
