I would be interested in computing the expected value of the following random variable $Y$.
Let $X$ be either $\mathrm{Bin}(N,p)$ or $\mathrm{Hyp}(n,m,N)$, where $m$ is the number of successes (I would be interested in both cases). Is there a way to compute the expected value of
$$ Y = X \log X $$
I tried going through the moment generating function, but I haven't obtained a result yet.
Best Answer
In general, the expectation of $g(X)$ can often be approximated using a Taylor expansion around the mean. Let $a=E(X)$; then
$$g(X) = g(a) + g'(a) (X-a) + \frac{1}{2!}g''(a) (X-a)^2 +\cdots$$
$$E[g(X)] = g(a) + \frac{1}{2!}g^{(2)}(a) \, m_2 + \frac{1}{3!} g^{(3)}(a) \, m_3 + \cdots $$
where $g^{(n)}(a)$ is the $n$-th derivative of $g$ evaluated at the mean, and $m_k$ is the $k$-th central moment of $X$. The first-order term drops out because $E(X-a)=0$.
In our case, $g(x)=x \log x$, so $g'(x)=\log x + 1$ and $g^{(n)}(a) = (-1)^n (n-2)! \; a^{-(n-1)}$ for $n>1$.
So the expansion takes the form
$$ E[X \log X] \approx a \log a + \frac{1} {2 \times 1} \frac{m_2}{a} - \frac{1}{3 \times 2}\frac{m_3}{a^2} + \frac{1} {4 \times 3}\frac{m_4}{a^3} -\cdots$$
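The series above is easy to evaluate once the central moments are known. Here is a minimal sketch, assuming the moments are supplied by the caller; the name `xlogx_series` and the dictionary layout are illustrative choices, not part of the original answer.

```python
import math

def xlogx_series(a, central_moments):
    """Truncated expansion of E[X log X] about the mean a.

    central_moments maps an order k (k >= 2) to the k-th central moment m_k.
    Each term is (-1)^k / (k (k - 1)) * m_k / a^(k - 1), which is
    g^(k)(a) / k! * m_k for g(x) = x log x.
    """
    if a <= 0:
        raise ValueError("the expansion needs a positive mean")
    total = a * math.log(a)
    for k, m_k in sorted(central_moments.items()):
        if k < 2:
            raise ValueError("central moments start at order 2")
        total += (-1) ** k / (k * (k - 1)) * m_k / a ** (k - 1)
    return total

# Example: mean 5, variance 3.75, third central moment 1.875
print(xlogx_series(5.0, {2: 3.75, 3: 1.875}))
```

Passing only `{2: m2}` gives the second-order approximation; adding higher orders extends the truncation.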
For the Binomial $(N,p)$, with $a = Np$, $m_2 = Np(1-p)$ and $m_3 = Np(1-p)(1-2p)$, we get
$$ E[X \log X] \approx Np \log( Np) + \frac{1-p}{2} - \frac{(1-p)(1-2p)}{6 Np } + \cdots$$
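And for the Hypergeometric $(N, n, m)$, read as $N$ draws from a population of $n$ successes and $m$ failures,
$$ E[X \log X] \approx a \log(a) + \frac{m}{2 (n+m)} \, \frac{n+m-N}{n+m-1} - \cdots$$
where $a=E(X)=\frac{n N}{m+n}$. The factor $\frac{n+m-N}{n+m-1}$ is the finite-population correction in the variance; it is close to 1 when $N \ll n+m$. I was too lazy to compute the next term.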
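The hypergeometric case can be checked numerically against the exact sum. The sketch below assumes the parameterization implied by the mean $a = nN/(m+n)$, that is, $N$ draws without replacement from $n$ successes and $m$ failures, and it uses the exact hypergeometric variance for $m_2$. The function names are hypothetical.

```python
import math

def hyp_xlogx_exact(N, n, m):
    """Exact E[X log X] for N draws from n successes and m failures."""
    total_ways = math.comb(n + m, N)
    lo = max(2, N - m)  # k = 0 and k = 1 contribute nothing to the sum
    hi = min(N, n)
    total = 0.0
    for k in range(lo, hi + 1):
        prob = math.comb(n, k) * math.comb(m, N - k) / total_ways
        total += k * math.log(k) * prob
    return total

def hyp_xlogx_approx2(N, n, m):
    """Second-order approximation a log a + m2 / (2 a)."""
    T = n + m
    a = N * n / T
    # Exact variance, including the finite-population factor (T - N) / (T - 1)
    m2 = N * (n / T) * (m / T) * (T - N) / (T - 1)
    return a * math.log(a) + m2 / (2 * a)

print(hyp_xlogx_exact(20, 30, 70), hyp_xlogx_approx2(20, 30, 70))
```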
These are useful as asymptotic expansions for $N \to \infty$.
For finite $N$, they should not be used if $a \lesssim 1$.
Here are a few values for the Binomial approximation up to the third moment:
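A comparison of this kind can be generated with a short script. This is a sketch rather than the original computation: the function names and the $(N, p)$ pairs are illustrative assumptions, and the exact value is obtained by summing $k \log k$ against the Binomial probabilities directly.

```python
import math

def binom_xlogx_exact(N, p):
    """Exact E[X log X] for X ~ Binomial(N, p); k = 0, 1 contribute zero."""
    return sum(
        k * math.log(k) * math.comb(N, k) * p ** k * (1 - p) ** (N - k)
        for k in range(2, N + 1)
    )

def binom_xlogx_approx3(N, p):
    """Expansion truncated after the third-central-moment term."""
    a = N * p
    return a * math.log(a) + (1 - p) / 2 - (1 - p) * (1 - 2 * p) / (6 * a)

print(f"{'N':>5} {'p':>6} {'exact':>12} {'approx':>12}")
for N, p in [(10, 0.1), (10, 0.5), (100, 0.1), (100, 0.5), (200, 0.05)]:
    exact = binom_xlogx_exact(N, p)
    approx = binom_xlogx_approx3(N, p)
    print(f"{N:>5} {p:>6} {exact:>12.6f} {approx:>12.6f}")
```

The direct sum is only meant for moderate $N$; for very large $N$ the binomial coefficients would need log-space arithmetic.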
As cardinal points out in the comments, the mean-value form of the remainder of the truncated Taylor expansion shows that, if we truncate at an odd-moment term (as I did for the Binomial above), the error must be positive, because the next derivative is of even order and is positive for $x>0$. Hence the truncated series is a lower bound on the exact value. For the leading term alone, the bound $E[X \log X] \ge a \log a$ also follows from Jensen's inequality, because $g(x)=x \log x$ is a convex function.
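The lower-bound argument rests on a pointwise fact: the third-order Taylor polynomial of $x \log x$ about $a$ never exceeds $x \log x$ for $x \ge 0$, so the inequality survives taking expectations. A minimal numerical check, assuming the convention $0 \log 0 = 0$ and using hypothetical helper names:

```python
import math

def g(x):
    """x log x, extended by continuity with g(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def taylor3(x, a):
    """Third-order Taylor polynomial of x log x about a > 0."""
    d = x - a
    return (
        a * math.log(a)
        + (math.log(a) + 1) * d
        + d ** 2 / (2 * a)
        - d ** 3 / (6 * a ** 2)
    )

def min_gap(a, xs):
    """Smallest value of g(x) - taylor3(x, a) over the grid xs."""
    return min(g(x) - taylor3(x, a) for x in xs)

grid = [i * 0.1 for i in range(0, 1001)]  # x from 0 to 100
for a in (0.5, 3.0, 30.0):
    print(a, min_gap(a, grid))
```

A grid check is not a proof, but a gap that stays non-negative (up to rounding) is consistent with the remainder term $g^{(4)}(\xi)(x-a)^4/4!$ being non-negative.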