Estimating the Shannon entropy of a high-dimensional discrete multivariate random variable by dead-simple Monte Carlo sampling

entropy, information-theory, monte-carlo

I'm interested in estimating the Shannon entropy of a discrete multivariate random variable $X$ with high dimensionality (i.e. $X=(X_1,\dots,X_n)$, where $n$ is in the hundreds).

I can effectively sample from $X$, as well as evaluate $P(X=x)$. But due to the high dimensionality, I cannot enumerate all possible values to calculate the entropy.

A simple, straightforward Monte Carlo estimator seems to be:

Sample a large number $m$ of observations $\{o_1,\dots,o_m\}$ from $X$ and calculate the sample average of their negative log-probability:
$\hat{H}(X)=-\frac{1}{m}\sum_{i=1}^{m}{\log{p(o_i)}}$.

This approximates $H(X)=-\sum_{x\in X}{p(x)\log{p(x)}}$, which requires summation over all possible values of $X$.
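To make the estimator concrete, here is a minimal sketch in Python/NumPy. It assumes, purely for illustration, a toy model in which the coordinates of $X$ are independent Bernoulli variables with known probabilities `q`, so the exact entropy is available for comparison; the names `sample`, `log_p`, `n`, and `m` are hypothetical and not from the question.

```python
# Minimal sketch of the Monte Carlo entropy estimator on a toy model
# (independent Bernoulli coordinates with known probabilities q).
import numpy as np

rng = np.random.default_rng(0)

n = 200                         # dimensionality ("hundreds", as in the question)
q = rng.uniform(0.05, 0.95, n)  # hypothetical per-coordinate success probabilities

def sample(m):
    """Draw m observations from X, one row per observation."""
    return (rng.random((m, n)) < q).astype(int)

def log_p(x):
    """Evaluate log P(X = x) for each row of x (independence assumed)."""
    return (x * np.log(q) + (1 - x) * np.log(1 - q)).sum(axis=1)

m = 100_000
obs = sample(m)
H_hat = -log_p(obs).mean()      # \hat{H}(X) = -(1/m) sum_i log p(o_i)

# Exact entropy of the toy model: sum of binary entropies.
H_true = -(q * np.log(q) + (1 - q) * np.log(1 - q)).sum()

print(f"Monte Carlo estimate: {H_hat:.3f} nats")
print(f"Exact entropy:        {H_true:.3f} nats")
```

In this sketch the estimate has no systematic offset from the exact value; its spread shrinks like $1/\sqrt{m}$ as the number of samples grows.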

  1. Is this a correct approach? Am I missing something?
  2. Is this estimator biased? and if it is, why?
  3. If this is an unbiased estimator, then what is the issue that is addressed by entropy estimation methods such as those mentioned in this question? Does the bias arise in the setting where we can't evaluate $\log{p(x)}$ and must resort to noisy occurrence counts or histograms?

Best Answer

In general, for any random variable $X$ and any function $Y=g(X)$ such that $E[Y^2]$ is finite, the sample average

$$ \frac{1}{M}\sum_{i=1}^{M} g(x_i) $$

computed from i.i.d. samples $x_1,\dots,x_M$ is an unbiased and consistent estimator of $E[g(X)]$ (it converges in probability to $E[g(X)]$ by the weak law of large numbers).

So if you know the probability distribution $p$, you can take $g(x)=-\log p(x)$, and the estimator is of course unbiased (and consistent).

The problem arises when we don't know $p$ and must estimate it from the samples as well. As the flowchart in the accepted answer to the linked question points out, when one knows $p$ this issue does not arise.
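As a small illustration of where the bias enters (this example is not from the original answer), the sketch below compares the sample-average estimator that uses the true $p$ against the "plug-in" estimator that replaces $p$ with empirical frequencies; the latter is systematically biased downward at small sample sizes. The distribution `p`, the sample size, and the repetition count are arbitrary choices.

```python
# Contrast: known-p Monte Carlo estimator vs. plug-in estimator from
# empirical frequencies, on a small univariate discrete distribution.
import numpy as np

rng = np.random.default_rng(1)

p = np.array([0.5, 0.25, 0.15, 0.1])    # a small known distribution
H_true = -(p * np.log(p)).sum()

m, reps = 50, 2000                       # small samples make the bias visible
known_p, plug_in = [], []
for _ in range(reps):
    x = rng.choice(len(p), size=m, p=p)
    known_p.append(-np.log(p[x]).mean())              # uses the true p(x)
    freq = np.bincount(x, minlength=len(p)) / m       # empirical frequencies
    nz = freq > 0
    plug_in.append(-(freq[nz] * np.log(freq[nz])).sum())

print(f"true entropy:            {H_true:.4f}")
print(f"mean, known-p estimator: {np.mean(known_p):.4f}   (unbiased)")
print(f"mean, plug-in estimator: {np.mean(plug_in):.4f}   (biased low)")
```

The downward bias of the plug-in estimator is exactly the issue that the entropy-estimation methods referenced in the question are designed to correct; it disappears from the questioner's setting because $\log p(x)$ can be evaluated exactly.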
