Probability – How to Determine if Increasing Variance Increases the Value of a Function

distributionsentropyoptimizationprobabilityrandom variable

Let $V=\sum_{i=1}^{k} a_iX_i$ where $X_i's$ are IID $\sim Bern(q)$ and $V$ with $\sum a_i=k$. Note that $a_i$'s are non-negative integers.

I have a function $f$ as given below :
$$
f= \max_{q} h\left(E\left[e^{-\frac{1}{V +1}}\right]\right)-E\left[h\left(e^{-\frac{1}{V +1}}\right)\right]
$$

where $E(\cdot)$ denotes expectation w.r.t $V$ and $h(p)=-p \log _{2} p-(1-p) \log _{2}(1-p)$ is the binary entropy function.

My simulations indicated that the value of $f$ increases as the number of non-zero entries in $[a_1,a_2,\ldots a_k]$ decreases. Note that $q$ can be chosen to be different for different $\vec{a}=[a_1,\ldots a_k]$.

I am trying to come up with formal proof for this (as well as an intuitive explanation if possible). Can anyone help me formally establish this?

Example:

Let $k$=8. Consider the following $\vec{a}$ values:

  1. $\vec{a}=[1, 1, 1, 1, 1, 1, 1, 1]$
  2. $\vec{a}=[2, 2, 2, 2, 0, 0, 0, 0]$
  3. $\vec{a}=[4, 4, 0, 0, 0, 0, 0, 0]$
  4. $\vec{a}=[8, 0, 0, 0, 0, 0, 0, 0]$

I noticed that case 4 leads to maximum value for $f$ while case 1 has the lowest $f$.
The below figure illustrates these simulation results. For instance, Case 4 above has Hamming weight 1 and $f \approx 0.24.$ Case 1 has Hamming weight 8 and $f \approx 0.07$.

Simulation result

Best Answer

Your function $f$ has the form

$$ h(\mathbb E(Y)) - \mathbb E (h(Y)) $$

which (for any function $h$) becomes zero when $\text{var}(Y) \to 0$ (that is, when the distribution of $Y$ becomes a $\delta$-function). So, intuitively it should be clear why $f$ increases when the variance of $Y$ increases.

To see this a bit more formally, you could look at the Taylor expansion of $h$ around $\bar Y=\mathbb E(Y)$ (see e.g. here)

$$ \mathbb E (h(Y)) = h(\bar Y) + \frac{1}{2} h''( \bar Y)\sigma_Y^2 + \frac{1}{3!}h^{(3)}(\bar Y)\mathbb E [(Y-\bar Y)^3] \;+ \; ... $$

(Note that since $Y = e^{-\frac{1}{V+1}}$ is bounded between $e^{-1}$ and $e^{-\frac{1}{k+1}} < 1$ then the higher moments are exponentially decreasing in magnitude)

The leading order term of the difference is proportional to the variance of $Y$. Specifically for your $h(p)$ with $h''(p) = -\frac{1}{p(1-p)}$,

$$ f \approx \max_q \frac{\sigma_Y^2}{2\bar Y (1 - \bar Y) } > 0$$

Since $Y$ is a function of $V$ it's variance is roughly proportional to that of $V$ which is $\text{var}(V) = \sum_i a_i^2 \text{var}(X_i) = q(1-q)\sum_i a_i^2$, and it is easy to see that for fixed $k$ the variance of $V$ is minimal when all the $a_i$'s are equal and is maximal when only one $a_i$ is non zero.

Intuitively therefore, when you keep $k$ fixed and change the number of non zero elements of $a$ you are increasing $\sigma_Y^2$ while keeping $\bar Y$ roughly fixed, so the overall function value increases. And Since this is true for any fixed $q$ it should remain true after maximizing w.r.t $q$. (That is, if for some $\vec a_1$ and $\vec a_2$, $f(\vec a_1,q) \ge f(\vec a_2,q)$ holds for all $q$, then $\max_q f(\vec a_1,q) \ge \max_q f(\vec a_2,q)$)

Of course this is a very rough approximation and the details will have to be worked out much more carefully in order to make any sort of a definite statement. Specifically note that the above expression diverges when $\bar Y \to 1$ which can happen when $k \to \infty$, so none of this may be true for large enough $k$.

Related Question