Probability – How to Determine if Increasing Variance Increases the Value of a Function

distributionsentropyoptimizationprobabilityrandom variable

Let $$V=\sum_{i=1}^{k} a_iX_i$$ where $$X_i's$$ are IID $$\sim Bern(q)$$ and $$V$$ with $$\sum a_i=k$$. Note that $$a_i$$'s are non-negative integers.

I have a function $$f$$ as given below :
$$f= \max_{q} h\left(E\left[e^{-\frac{1}{V +1}}\right]\right)-E\left[h\left(e^{-\frac{1}{V +1}}\right)\right]$$

where $$E(\cdot)$$ denotes expectation w.r.t $$V$$ and $$h(p)=-p \log _{2} p-(1-p) \log _{2}(1-p)$$ is the binary entropy function.

My simulations indicated that the value of $$f$$ increases as the number of non-zero entries in $$[a_1,a_2,\ldots a_k]$$ decreases. Note that $$q$$ can be chosen to be different for different $$\vec{a}=[a_1,\ldots a_k]$$.

I am trying to come up with formal proof for this (as well as an intuitive explanation if possible). Can anyone help me formally establish this?

Example:

Let $$k$$=8. Consider the following $$\vec{a}$$ values:

1. $$\vec{a}=[1, 1, 1, 1, 1, 1, 1, 1]$$
2. $$\vec{a}=[2, 2, 2, 2, 0, 0, 0, 0]$$
3. $$\vec{a}=[4, 4, 0, 0, 0, 0, 0, 0]$$
4. $$\vec{a}=[8, 0, 0, 0, 0, 0, 0, 0]$$

I noticed that case 4 leads to maximum value for $$f$$ while case 1 has the lowest $$f$$.
The below figure illustrates these simulation results. For instance, Case 4 above has Hamming weight 1 and $$f \approx 0.24.$$ Case 1 has Hamming weight 8 and $$f \approx 0.07$$.

Your function $$f$$ has the form

$$h(\mathbb E(Y)) - \mathbb E (h(Y))$$

which (for any function $$h$$) becomes zero when $$\text{var}(Y) \to 0$$ (that is, when the distribution of $$Y$$ becomes a $$\delta$$-function). So, intuitively it should be clear why $$f$$ increases when the variance of $$Y$$ increases.

To see this a bit more formally, you could look at the Taylor expansion of $$h$$ around $$\bar Y=\mathbb E(Y)$$ (see e.g. here)

$$\mathbb E (h(Y)) = h(\bar Y) + \frac{1}{2} h''( \bar Y)\sigma_Y^2 + \frac{1}{3!}h^{(3)}(\bar Y)\mathbb E [(Y-\bar Y)^3] \;+ \; ...$$

(Note that since $$Y = e^{-\frac{1}{V+1}}$$ is bounded between $$e^{-1}$$ and $$e^{-\frac{1}{k+1}} < 1$$ then the higher moments are exponentially decreasing in magnitude)

The leading order term of the difference is proportional to the variance of $$Y$$. Specifically for your $$h(p)$$ with $$h''(p) = -\frac{1}{p(1-p)}$$,

$$f \approx \max_q \frac{\sigma_Y^2}{2\bar Y (1 - \bar Y) } > 0$$

Since $$Y$$ is a function of $$V$$ it's variance is roughly proportional to that of $$V$$ which is $$\text{var}(V) = \sum_i a_i^2 \text{var}(X_i) = q(1-q)\sum_i a_i^2$$, and it is easy to see that for fixed $$k$$ the variance of $$V$$ is minimal when all the $$a_i$$'s are equal and is maximal when only one $$a_i$$ is non zero.

Intuitively therefore, when you keep $$k$$ fixed and change the number of non zero elements of $$a$$ you are increasing $$\sigma_Y^2$$ while keeping $$\bar Y$$ roughly fixed, so the overall function value increases. And Since this is true for any fixed $$q$$ it should remain true after maximizing w.r.t $$q$$. (That is, if for some $$\vec a_1$$ and $$\vec a_2$$, $$f(\vec a_1,q) \ge f(\vec a_2,q)$$ holds for all $$q$$, then $$\max_q f(\vec a_1,q) \ge \max_q f(\vec a_2,q)$$)

Of course this is a very rough approximation and the details will have to be worked out much more carefully in order to make any sort of a definite statement. Specifically note that the above expression diverges when $$\bar Y \to 1$$ which can happen when $$k \to \infty$$, so none of this may be true for large enough $$k$$.