Solved – Expected value of a transformed random variable

expected value, mathematical-statistics, moment-generating-function, random variable

I would be interested in computing the expected value of the following random variable $Y$.

Let $X$ be either $\mathrm{Bin}(N,p)$ or $\mathrm{Hyp}(n,m,N)$, where $m$ is the number of successes (I am interested in both cases). Is there a way to compute the expected value of
$$ Y = X \log X $$
I tried working through the moment generating function, but I haven't gotten anywhere yet.

Best Answer

In general, the expectation of $g(X)$ can often be approximated using a Taylor expansion around the mean; let $a=E(X)$

$$g(X) = g(a) + g'(a) (X-a) + \frac{1}{2!}g''(a) (X-a)^2 +\cdots$$

Taking expectations term by term (the first-order term drops out because $E[X-a]=0$):

$$E[g(X)] = g(a) + \frac{1}{2!}g^{(2)}(a) \, m_2 + \frac{1}{3!} g^{(3)}(a) \, m_3 + \cdots $$

where $g^{(n)}(a)$ is the $n$-th derivative of $g$ evaluated at the mean, and $m_k$ is the $k$-th central moment of $X$.

In our case, $g(X)=X \log(X)$ and $g^{(n)}(a) = (-1)^n (n-2)! \; a^{-(n-1)}$ for $n>1$

So the expansion takes the form

$$ E[X \log X] \approx a \log a + \frac{1} {2 \times 1} \frac{m_2}{a} - \frac{1}{3 \times 2}\frac{m_3}{a^2} + \frac{1} {4 \times 3}\frac{m_4}{a^3} -\cdots$$
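Stated in code, this truncation is straightforward to evaluate once the central moments are known. Below is a minimal Python sketch (the function name `xlogx_taylor` and its interface are mine, not from the question); it takes the mean $a$ and a list of central moments starting at order 2.

```python
import math

def xlogx_taylor(a, central_moments):
    """Truncated Taylor approximation of E[X log X] around a = E[X].

    central_moments = [m2, m3, m4, ...] are the central moments of X,
    starting at order 2.  Each term is (1/k!) g^(k)(a) m_k, which for
    g(x) = x log x reduces to (-1)^k m_k / (k (k-1) a^(k-1)).
    """
    total = a * math.log(a)
    for k, m_k in enumerate(central_moments, start=2):
        total += (-1) ** k * m_k / (k * (k - 1) * a ** (k - 1))
    return total
```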

For the Binomial $(N,p)$, we get

$$ E[X \log X] \approx Np \log( Np) + \frac{1-p}{2} - \frac{(1-p)(1-2p)}{6 Np } + \cdots$$
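To check this numerically, one can sum $x \log x$ against the Binomial pmf directly and compare with the three-term truncation. A short sketch, assuming `scipy` is available (the helper names are mine):

```python
import numpy as np
from scipy.stats import binom

def exact_xlogx_binom(N, p):
    """E[X log X] computed directly from the pmf (0 log 0 taken as 0)."""
    x = np.arange(1, N + 1)                      # x = 0 contributes nothing
    return np.sum(x * np.log(x) * binom.pmf(x, N, p))

def approx_xlogx_binom(N, p):
    """Three-term truncation  Np log(Np) + (1-p)/2 - (1-p)(1-2p)/(6Np)."""
    a = N * p
    return a * np.log(a) + (1 - p) / 2 - (1 - p) * (1 - 2 * p) / (6 * a)

print(exact_xlogx_binom(10, 0.5), approx_xlogx_binom(10, 0.5))
```

For $N=10$, $p=0.5$ this should give values close to the ones tabulated further down.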

For the Hypergeometric $(N, n, m)$, we similarly get

$$ E[X \log X] \approx a \log(a) + \frac{m}{2 (n+m)} - \cdots$$

where $a=E(X)=\frac{n N}{m+n}$; I was too lazy to compute the next term.
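The hypergeometric case can be checked the same way. The sketch below assumes the parametrization implied by the stated mean, i.e. a population of $n$ successes and $m$ failures from which $N$ items are drawn; note that `scipy.stats.hypergeom` takes (population size, number of successes, number of draws), and the helper names are again mine.

```python
import numpy as np
from scipy.stats import hypergeom

def exact_xlogx_hyp(n, m, N):
    """E[X log X] for N draws from a population of n successes and m failures."""
    x = np.arange(1, min(n, N) + 1)              # x = 0 contributes nothing
    return np.sum(x * np.log(x) * hypergeom.pmf(x, n + m, n, N))

def approx_xlogx_hyp(n, m, N):
    """Two-term truncation  a log(a) + m / (2 (n + m))  with a = nN/(n+m)."""
    a = n * N / (n + m)
    return a * np.log(a) + m / (2 * (n + m))

print(exact_xlogx_hyp(n=30, m=70, N=20), approx_xlogx_hyp(n=30, m=70, N=20))
```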

These are best regarded as asymptotic expansions, valid as $N \to \infty$.

For finite $N$, the approximation should not be used when $a \lesssim 1$.

Here are a few values for the Binomial approximation up to the third moment:

              p = 0.2             p = 0.5             p = 0.8
      N     exact    approx     exact    approx     exact    approx
      5    0.4907    0.3200    2.5811    2.5407    5.6542    5.6502
     10    1.8545    1.7463    8.3123    8.2972    16.740    16.738
     20    5.9740    5.9252    23.283    23.276    44.463    44.463

As cardinal points out in the comments, using the mean-value form for the error of the truncated Taylor expansion, if we truncate at an odd-moment term (as I did for the Binomial above), the error must be positive, and hence we have obtained a lower bound on the exact value. This can also be proven using Jensen's inequality, because $g(x)=x \log x$ is a convex function.
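For the leading term this is immediate: since $g(x) = x \log x$ is convex, Jensen's inequality already gives

$$ E[X \log X] \;\ge\; E[X] \log E[X] \;=\; a \log a. $$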