[Math] Conditional entropy of continuous and discrete random variable vectors

entropy, multivariable-calculus, probability

I have a continuous random vector $\mathbf{x} \sim \mathcal{N}(0, \mathbf{\Sigma} )$, $\mathbf{\Sigma} \in \mathbb{R}^{N \times N}$, following a multivariate normal distribution, and a discrete random vector $\mathbf{r} = [r_1, \ldots,r_N]^T$. Furthermore, $r_i$ is defined as $$r_i = \begin{cases} -l, &\text{for } x_i \in (-\infty,0)\\ \hphantom{-}l, & \text{otherwise,} \end{cases} \qquad i \in \{1,\ldots,N\}. $$

My goal is to calculate the differential entropy $h(\mathbf{x}\vert\mathbf{r})$, where I'm actually not sure if it is still a differential entropy because $\mathbf{x}$ is conditioned on the discrete $\mathbf{r}$.

In (1) the conditional differential entropy is defined as
$$ h(X \vert Y ) = - \int f(x,y) \, \log ( f(x|y)) \, dx \,dy. $$

Reference (2) discusses similar problems. As a result, I defined my conditional differential entropy as

$$ h(\mathbf{x}\vert\mathbf{r}) = - \sum_{i \in \mathcal{R}} \int_{\mathbb{R}^N} p( \mathbf{r}_i \vert \mathbf{x}) \, f(\mathbf{x}) \, \log \left( \frac{ p( \mathbf{r}_i \vert \mathbf{x}) \, f(\mathbf{x})}{ p(\mathbf{r}_i) } \right) \, d\mathbf{x} $$

where $\mathcal{R}$ denotes the set of all possible realizations of $\mathbf{r}$, $p(\mathbf{r}_i)$ denotes their probability, $p( \mathbf{r}_i \vert \mathbf{x})$ denotes the conditional probability of $\mathbf{r}_i$ given $\mathbf{x}$, and $f( \mathbf{x} )$ denotes the PDF of $\mathbf{x}$. Furthermore, I define $p( \mathbf{r}_i \vert \mathbf{x})$ as

$$ p( \mathbf{r}_i \vert \mathbf{x}) = \begin{cases} 1 &, \mathbf{x} \in I_i\\ 0 & \text{otherwise} \end{cases}, $$

where $I_i$ denotes some appropriately defined $N$ dimensional interval. Using this it follows

$$ h(\mathbf{x}\vert\mathbf{r}) = - \sum_{i \in \mathcal{R}} \int_{I_i} f(\mathbf{x}) \, \log \left( \frac{ f(\mathbf{x})}{ p(\mathbf{r}_i) } \right) \, d\mathbf{x}. $$

Questions

  1. Are my definition of $ h(\mathbf{x}\vert\mathbf{r}) $ and $ p( \mathbf{r}_i \vert \mathbf{x}) $ correct?
  2. How can I solve this integral? It seems to be related to the entropy of the multivariate normal distribution. Can I somehow break it down into the individual $x_i$? A numerical solution would also be fine for me; however, high-dimensional numerical integration seems challenging, too.
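For what it's worth, the sum of orthant integrals above can be estimated directly by Monte Carlo, since it equals $-\mathbb{E}[\log f(\mathbf{x})] + \mathbb{E}[\log p(\mathbf{r}(\mathbf{x}))]$. A sketch (assuming NumPy/SciPy; the function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def h_cond_direct(Sigma, n=200_000, seed=1):
    """Monte Carlo estimate of h(x|r) = -E[log f(x)] + E[log p(r(x))].

    The orthant probabilities p(r_i) are estimated from the empirical
    frequencies of the sign patterns of the samples.
    """
    rng = np.random.default_rng(seed)
    N = Sigma.shape[0]
    x = rng.multivariate_normal(np.zeros(N), Sigma, size=n)
    logf = multivariate_normal(np.zeros(N), Sigma).logpdf(x)
    # Encode the sign pattern of each sample as an orthant index in 0..2^N-1.
    idx = (x >= 0).astype(int) @ (1 << np.arange(N))
    p = np.bincount(idx, minlength=2**N) / n
    return -logf.mean() + np.log(p[idx]).mean()
```

For $\mathbf{\Sigma} = \mathbf{I}_2$ each orthant has probability $1/4$, so the estimate should be close to $\log(2\pi e) - \log 4 \approx 1.45$ nats.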

References

(1) Cover, Thomas M., and Joy A. Thomas. Elements of information theory. John Wiley & Sons, 2012. p.230 (9.32)

(2) Nair, Chandra, Balaji Prabhakar, and Devavrat Shah. "On entropy for mixtures of discrete and continuous variables." arXiv preprint cs/0607075 (2006). Available: http://chandra.ie.cuhk.edu.hk/pub/papers/manuscripts/ENT-arx07.pdf

Best Answer

By the chain rule of entropy, we have

$$ \tag{1} h(\mathbf{x},\mathbf{r})=h(\mathbf{r})+h(\mathbf{x}\mid \mathbf{r}) $$

as well as

$$ \tag{2} h(\mathbf{x},\mathbf{r})=h(\mathbf{x})+h(\mathbf{r}\mid \mathbf{x}). $$

Now, since $\mathbf{r}$ is a deterministic function of $\mathbf{x}$ (and $\mathbf{r}$ is discrete, so its conditional entropy is an ordinary discrete entropy rather than a differential one), we have

$$ h(\mathbf{r}\mid \mathbf{x})=0, $$

which, after substitution into (2), gives

$$ h(\mathbf{x},\mathbf{r}) = h(\mathbf{x}). $$

Substituting the last result in (1) leads to

$$ \begin{align} h(\mathbf{x}\mid \mathbf{r}) &= h(\mathbf{x})-h(\mathbf{r})\\ &=\frac{1}{2}\log\left((2\pi e)^N \det(\mathbf{\Sigma})\right)-\left(-\sum_\mathbf{r}p(\mathbf{r})\log(p(\mathbf{r}))\right). \end{align} $$
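The Gaussian term $h(\mathbf{x})$ in the last line is available in closed form and is cheap to evaluate numerically; a minimal sketch in Python (assuming NumPy; the function name is mine):

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy (in nats) of N(0, Sigma):
    0.5 * log((2*pi*e)^N * det(Sigma))."""
    N = Sigma.shape[0]
    # slogdet avoids overflow/underflow of det() for larger N.
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "Sigma must be positive definite"
    return 0.5 * (N * np.log(2 * np.pi * np.e) + logdet)
```

For example, $\mathbf{\Sigma} = \mathbf{I}_2$ gives $\log(2\pi e) \approx 2.84$ nats.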

I am not aware of any closed-form formula for $p(\mathbf{r})$. According to this answer, it does not have one. Therefore, you would, in principle, have to enumerate all $2^N$ possible vectors $\mathbf{r}$ and numerically compute the pmf $p(\mathbf{r})$. My guess is that this procedure becomes impractical already for moderate $N$.
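Instead of enumerating orthants explicitly, one can estimate the pmf $p(\mathbf{r})$ from samples and plug it into $h(\mathbf{x}) - h(\mathbf{r})$. A sketch (assuming NumPy; function name is mine), which only visits sign patterns that actually occur in the sample:

```python
import numpy as np

def conditional_entropy_mc(Sigma, n_samples=200_000, seed=0):
    """Estimate h(x|r) = h(x) - H(r) in nats.

    H(r) is computed from the empirical frequencies of the sign
    patterns of samples drawn from N(0, Sigma); h(x) is closed-form.
    """
    rng = np.random.default_rng(seed)
    N = Sigma.shape[0]
    x = rng.multivariate_normal(np.zeros(N), Sigma, size=n_samples)
    # Encode each sample's sign pattern as an orthant index in 0..2^N-1.
    idx = (x >= 0).astype(int) @ (1 << np.arange(N))
    counts = np.bincount(idx, minlength=2**N)
    p = counts[counts > 0] / n_samples
    H_r = -np.sum(p * np.log(p))                         # discrete entropy H(r)
    sign, logdet = np.linalg.slogdet(Sigma)
    h_x = 0.5 * (N * np.log(2 * np.pi * np.e) + logdet)  # Gaussian h(x)
    return h_x - H_r
```

For $\mathbf{\Sigma} = \mathbf{I}_2$ all four orthants are equally likely, so $H(\mathbf{r}) = \log 4$ and the result should be close to $\log(2\pi e) - \log 4 \approx 1.45$ nats. The $2^N$ blow-up still bites eventually: for large $N$ the sample no longer covers all orthants and the plug-in estimate of $H(\mathbf{r})$ is biased low.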

Remark: The above formula is the same as yours. Indeed,

$$ \begin{align} -\sum_{i\in \mathcal{R}} \int_{I_i} f(\mathbf{x})\log\frac{f(\mathbf{x})}{p(\mathbf{r}_i)}d\mathbf{x}&=-\sum_{i\in \mathcal{R}} \int_{I_i} f(\mathbf{x})\log f(\mathbf{x})d\mathbf{x}- \left(-\sum_{i\in \mathcal{R}} \log p(\mathbf{r}_i)\int_{I_i} f(\mathbf{x}) d\mathbf{x}\right)\\ &=-\int_{\mathbb{R}^N} f(\mathbf{x})\log f(\mathbf{x})d\mathbf{x}- \left(-\sum_{i\in \mathcal{R}} p(\mathbf{r}_i) \log p(\mathbf{r}_i)\right), \end{align} $$

since $\cup_{i\in\mathcal{R}}I_i=\mathbb{R}^N$, $I_i \cap I_j = \emptyset, \forall i\neq j$, and $\int_{I_i} f(\mathbf{x}) d\mathbf{x} = p(\mathbf{r}_i)$.
