[Math] Mutual information between two dependent variables

entropy, information theory, probability

Let $X$ be a discrete random variable such that
\begin{equation}
X = \begin{cases}
1 & \text{with probability } 1/3 \\
-1 & \text{with probability } 1/3 \\
0 & \text{with probability } 1/3
\end{cases}
\end{equation}
and let $Y = X^2$, whose distribution is
\begin{equation}
Y = \begin{cases}
1 & \text{with probability } 2/3 \\
0 & \text{with probability } 1/3.
\end{cases}
\end{equation}

Note that $Y$ is completely determined by $X$.
However, $X$ cannot be completely determined by $Y$.

It follows from
$$ P(X=a,Y=b) = P(Y=b|X=a)P(X=a)$$
that we have the following joint distribution of $(X,Y)$,
$$P(X=1,Y=1) = P(X=-1,Y=1) = P(X=0,Y=0) = 1/3.$$
Note that $P(Y=b|X=a)$ is either 1 or 0, as $Y$ is completely determined by $X$.
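
Spelling out one term of this product makes it explicit, e.g.
$$P(X=1,Y=1) = P(Y=1\mid X=1)\,P(X=1) = 1\cdot\tfrac{1}{3} = \tfrac{1}{3}.$$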

Therefore, we compute the mutual information of $X$ and $Y$ and obtain
$$ I(X;Y) = \sum_{(x,y)} p(x,y)\log \left(\frac{p(x,y)}{p(x)p(y)}\right)
= \frac{1}{3}\log\left(\frac{3^3}{4}\right) > 0. $$
[Edited] The previous computation was wrong.
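
As a quick sanity check, the value can also be recomputed numerically. The following minimal Python sketch (the dictionary layout and variable names are just one illustrative choice, using the natural logarithm) evaluates the sum directly from the joint table above:

```python
from math import log

# Joint distribution of (X, Y): the three equally likely outcomes above.
joint = {(1, 1): 1 / 3, (-1, 1): 1 / 3, (0, 0): 1 / 3}

# Marginal distributions of X and Y.
px = {}
py = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

# I(X;Y) = sum over (x, y) of p(x, y) * log( p(x, y) / (p(x) * p(y)) ).
mi = sum(p * log(p / (px[x] * py[y])) for (x, y), p in joint.items())

print(mi)                 # 0.6365141682948...
print(log(27 / 4) / 3)    # (1/3) * log(3^3 / 4), the same value
```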

From the Wikipedia article:

"Intuitively, mutual information measures the information that X and Y
share: It measures how much knowing one of these variables reduces
uncertainty about the other."

Question
In this example, knowing $X$ completely reduces the uncertainty in $Y$.
How, then, should I interpret the value $I(X;Y) = \frac{1}{3}\log(3^3/4) > 0$
in order to draw the conclusion that
"$Y$ can be completely determined by $X$"?

My thought
I think the mutual information of $X$ and $Y = X^2$ doesn't make sense here:
knowing $X$ completely determines $Y$, but knowing $Y$ does not determine $X$.
Yet $I(X;Y) = I(Y;X)$, so swapping the roles of $X$ and $Y$ gives the same value,
even though $X$ completely determines $Y$ while $Y$ does not determine $X$.
This doesn't sound right to me.

Any comments, suggestions, or thoughts would be much appreciated.
Thanks in advance.

Best Answer

The intuition that "$Y$ can be completely determined by $X$" is verified only by the fact that $H(Y|X)=0$, i.e., "given $X$, there is no uncertainty in $Y$". The fact that $I(X;Y)>0$ is irrelevant. The value of $I(X;Y)$ quantifies the information that can be transferred when the "data" is $X$ and $Y$ is the "observation", or vice versa. In this case, since $Y$ is a well-defined function of $X$, it is intuitive that the mutual information is positive (i.e., observing $Y$ gives information about $X$ and vice versa). In fact, $I(X;Y)>0$ always holds unless $X$ and $Y$ are independent random variables, in which case $I(X;Y)=0$.
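
To make this concrete, the same value can be read off from both decompositions of the mutual information:
$$I(X;Y) = H(Y) - H(Y\mid X) = H(Y) - 0 = \tfrac{2}{3}\log\tfrac{3}{2} + \tfrac{1}{3}\log 3 = \tfrac{1}{3}\log\frac{3^3}{4},$$
$$I(X;Y) = H(X) - H(X\mid Y) = \log 3 - \tfrac{2}{3}\log 2 = \tfrac{1}{3}\log\frac{3^3}{4}.$$
The symmetry is reconciled by the residual uncertainty $H(X\mid Y) = \tfrac{2}{3}\log 2 > 0$: knowing $Y=1$ still leaves $X$ uniform on $\{1,-1\}$, whereas $H(Y\mid X)=0$ because $Y$ is a function of $X$.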
