[Math] Joint entropy calculation of discrete random variables

entropy

Suppose that I want to calculate the joint entropy $H(A,B)$ of two discrete random variables observed as the samples:

$A=\{-1,1,1,-1,-1,-1,1,1\}$ and $B=\{1,-1,1,1,-1,-1,-1,1\}$.

If the goal were just the calculation of the entropy of $A$ or $B$, then, for example, I would have
$H(A)=-\sum p\log_2(p)$, where the probability mass function $p$ is estimated from the observed frequencies of $-1$ and $1$. This means that
$H(A)=-\left[\frac{1}{2}\log_2\left(\frac{1}{2}\right)+\frac{1}{2}\log_2\left(\frac{1}{2}\right)\right]=1$. But what about the joint entropy, and what should I do if I had more than two discrete random variables (of the same form, with elements $-1$ and $1$)?
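For reference, this is how I compute the marginal entropy from the observed frequencies; a small Python sketch (the `entropy` helper is just illustrative, not a library function):

```python
import numpy as np
from collections import Counter

def entropy(samples, base=2):
    """Entropy estimated from the observed frequencies of the sample values."""
    counts = Counter(samples)
    n = len(samples)
    probs = np.array([c / n for c in counts.values()])
    return -np.sum(probs * np.log(probs)) / np.log(base)

A = [-1, 1, 1, -1, -1, -1, 1, 1]
print(entropy(A))  # 1.0, since -1 and 1 each occur with frequency 1/2
```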

Best Answer

The possible values of the triplet $A,B,C$ are $\{(1,1,1),(1,1,-1),\cdots,(-1,-1,-1)\}.$ Based on a sample, the probabilities of the $8$ different outcomes could be estimated. Let those probabilities be denoted by $p_{1,1,1},p_{1,1,-1},\cdots,p_{-1,-1,-1}$.

The entropy of $(A,B,C)$, by definition, is

$$H(A,B,C)=-\left(p_{1,1,1}\log (p_{1,1,1})+p_{1,1,-1}\log(p_{1,1,-1})+\cdots+p_{-1,-1,-1}\log (p_{-1,-1,-1})\right).$$

Or, in general, if $\{p_1,p_2,\cdots p_n\}$ is the pmf of a discrete random variable then the corresponding entropy is

$$H=-\sum_{i=1}^n p_i\log(p_i).$$ (The base of $\log$ is taken to be $2$ in this context.)
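As a minimal sketch of this estimate for any number of such variables, assuming the samples are aligned element-wise, one could count the observed tuples and apply the formula above (the `joint_entropy` helper is only illustrative, not a standard library function):

```python
import numpy as np
from collections import Counter

def joint_entropy(*variables, base=2):
    """Joint entropy estimated from the observed frequencies of the tuples
    (A_i, B_i, C_i, ...) formed by aligning the samples element-wise."""
    tuples = list(zip(*variables))
    n = len(tuples)
    probs = np.array([count / n for count in Counter(tuples).values()])
    return -np.sum(probs * np.log(probs)) / np.log(base)
```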

Edited

Example for $A,B$:

For instance, the estimate is $p_{1,1}=\frac{2}{8}=\frac{1}{4}$, because in the given sample of $8$ pairs the pair $(1,1)$ occurs $2$ times. Similarly, $p_{1,-1}=\frac{1}{4}$, $p_{-1,1}=\frac{1}{4}$, and $p_{-1,-1}=\frac{1}{4}$. So

$$H(A,B)=-4\cdot\frac{1}{4}\log_2\left(\frac{1}{4}\right)=-\log_2\left(\frac{1}{4}\right)=2.$$ But this is only a very rough estimate, because the sample is small.
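Running the sketch above on the samples from the question reproduces this value (again, only an estimate from $8$ observations):

```python
A = [-1, 1, 1, -1, -1, -1, 1, 1]
B = [1, -1, 1, 1, -1, -1, -1, 1]
print(joint_entropy(A, B))  # 2.0: all four pairs occur with frequency 1/4
```

The same call works unchanged for three or more variables, since the function simply counts the observed tuples.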