[Math] Gini Coefficient and Renyi Entropy

Tags: entropy, it.information-theory, pr.probability, st.statistics

The Gini coefficient (aka Gini index) is a quantity used in economics to describe income inequality. It is $0$ for uniformly distributed income and approaches $1$ when all income is in the hands of a single individual.

For probabilities we have:

$$
G(\{p_i\}_{i\in\{1,\dots,n\}}) = \sum_{i=1}^n \frac{2i-n-1}{n}p_i,
$$

where $p_i$ is a non-decreasing sequence,
or in the continuous case:

$$
G(p) = \int_0^1 \left(1-2\int_0^xp(y)dy\right)dx = \int_0^1 (2x-1)p(x)dx
$$
where the probability density function $p(x)$ is non-decreasing.
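As a quick sanity check on the discrete formula, here is a short Python sketch (the function name `gini` is mine); the uniform distribution gives $0$ and $p = \{0,\dots,0,1\}$ gives $(n-1)/n$, which approaches $1$:

```python
import numpy as np

def gini(p):
    """Gini coefficient of a discrete probability vector,
    G = sum_i (2i - n - 1)/n * p_i, with p sorted non-decreasing."""
    p = np.sort(np.asarray(p, dtype=float))  # the formula assumes non-decreasing order
    n = len(p)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) / n * p)

print(gini([0.25, 0.25, 0.25, 0.25]))  # 0.0   (uniform)
print(gini([0, 0, 0, 1]))              # 0.75  ((n-1)/n for n = 4)
```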

As it is a measure of the non-uniformity of a probability distribution, I was curious whether it is possible to relate the Gini coefficient to some Rényi entropy $H_q$ (for example, the collision entropy $H_2$)? (Or to a Tsallis entropy $T_q$, as the formulas are shorter in the continuous case.)

I got an upper bound

$$
G(p) \leq \sqrt{\tfrac{1}{3}(e^{-H_2(p)}-1)} = \sqrt{-\tfrac{1}{3}T_2(p)}
$$
with the Schwarz inequality applied to $(2x-1)$ and $(p(x)-1)$.
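Explicitly, since $\int_0^1 (2x-1)\,dx = 0$, one can subtract $1$ from $p(x)$ before applying Schwarz:

$$
G(p) = \int_0^1 (2x-1)(p(x)-1)\,dx \leq \sqrt{\int_0^1 (2x-1)^2 dx}\sqrt{\int_0^1 (p(x)-1)^2 dx} = \sqrt{\tfrac{1}{3}\left(e^{-H_2(p)}-1\right)},
$$

using $\int_0^1 (2x-1)^2 dx = \tfrac{1}{3}$ and $\int_0^1 (p(x)-1)^2 dx = \int_0^1 p^2(x)dx - 1 = e^{-H_2(p)} - 1$.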

It works similarly for discrete probabilities:

$$
G(\{p_i\})\leq\sqrt{\tfrac{n}{3}}\sqrt{1-\tfrac{1}{n^2}}\sqrt{e^{-H_2(\{p_i\})}-\tfrac{1}{n}}.
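This discrete bound can be checked numerically on random distributions; here is a sketch (the helper names `gini` and `collision_bound` are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def gini(p):
    """Discrete Gini coefficient, G = sum_i (2i - n - 1)/n * p_i, p sorted."""
    p = np.sort(p)
    n = len(p)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) / n * p)

def collision_bound(p):
    """sqrt(n/3) * sqrt(1 - 1/n^2) * sqrt(exp(-H_2) - 1/n),
    using exp(-H_2({p_i})) = sum_i p_i^2."""
    n = len(p)
    return np.sqrt(n / 3) * np.sqrt(1 - 1 / n**2) * np.sqrt(max(np.sum(p**2) - 1 / n, 0))

# the bound holds for random distributions of various sizes
for _ in range(1000):
    p = rng.dirichlet(np.ones(rng.integers(2, 20)))
    assert gini(p) <= collision_bound(p) + 1e-12
```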
$$

Is there any non-trivial lower bound? That is, other than

$$G(p)\geq 1 - e^{H_0(p)} = -T_0(p),$$
$$G(\{p_i\})\geq 1 - \tfrac{1}{n}e^{H_0(\{p_i\})}.$$

Is there an upper bound that is saturated for $p = \{0,0,\dots,0,1\}$?

Best Answer

As of now, the best I got is:

$$ 1 - e^{H_{1/2}(p)} \leq G(p) \leq \sqrt{1-e^{2 H_{1/2}(p)}}, $$ where $H_{1/2}(p) = 2 \ln \int_0^1 p^{1/2}(x)\,dx$.

Both inequalities are saturated for both $G(p)=0$ and $G(p)=1$.

The lower bound is better than the previous one ($1 - e^{H_{0}(p)}$), as the Rényi entropy $H_q(p)$ is non-increasing in $q$, so $H_{1/2}(p) \leq H_0(p)$.

The upper bound is different from $\sqrt{\tfrac{1}{3}(e^{-H_2(p)}-1)}$ (neither dominates the other).

And for practical purposes, the quadratic mean of the two bounds, which simplifies to $$ G(p) \approx \sqrt{1 - e^{H_{1/2}(p)}}, $$ seems to be a good approximation.
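The sandwich (and the approximation) can be checked numerically by embedding a discrete distribution as a step density on $[0,1]$, for which $e^{H_{1/2}} = \tfrac{1}{n}\left(\sum_i \sqrt{p_i}\right)^2$ and the continuous Gini coefficient coincides with the discrete one. A sketch (the helper name `gini` is mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def gini(p):
    """Discrete Gini coefficient, G = sum_i (2i - n - 1)/n * p_i, p sorted."""
    p = np.sort(p)
    n = len(p)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) / n * p)

for _ in range(1000):
    p = rng.dirichlet(np.ones(rng.integers(2, 20)))
    A = np.sum(np.sqrt(p)) ** 2 / len(p)  # e^{H_{1/2}} of the step density
    g = gini(p)
    # sandwich: 1 - A <= G <= sqrt(1 - A^2); the quadratic mean of the
    # two bounds is sqrt(((1-A)^2 + (1-A^2))/2) = sqrt(1 - A)
    assert 1 - A - 1e-12 <= g <= np.sqrt(max(1 - A**2, 0)) + 1e-12
```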

To get some taste of it, here is a plot of the lower bound, the upper bound, and the approximation:

[Plot: lower bound, upper bound, and approximation]

(Sampled for distributions of the form $p(x) \propto (1-x+r)^{-\alpha} + c$, where $r$, $\alpha$ and $c$ are some parameters.)

Proof

We have: $$ G(p) = \int_0^1 (2x-1) p(x) dx = \tfrac{1}{2} \int_0^1 \int_0^1 |p(x) - p(y)|\,dx\, dy. $$ (Actually, the right-hand side is more general, as it does not require any ordering of $p(x)$.)
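In the discrete setting the analogous identity is $\sum_i \tfrac{2i-n-1}{n}p_i = \tfrac{1}{2n}\sum_{i,j}|p_i - p_j|$ (for $p_i$ sorted non-decreasing), which a quick numerical check confirms (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.sort(rng.dirichlet(np.ones(10)))  # random sorted probability vector
n = len(p)
i = np.arange(1, n + 1)

lhs = np.sum((2 * i - n - 1) / n * p)                   # ordered-sum form
rhs = np.sum(np.abs(p[:, None] - p[None, :])) / (2 * n)  # mean-absolute-difference form
print(abs(lhs - rhs))  # ~0
```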

For the lower bound we use $$ |p(x) - p(y)| = \left|\sqrt{p(x)} + \sqrt{p(y)}\right| \left| \sqrt{p(x)} - \sqrt{p(y)} \right| \geq \left( \sqrt{p(x)} - \sqrt{p(y)} \right)^2 $$ and perform the integration.
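Carrying out that integration, using $\int_0^1 p(x)\,dx = 1$:

$$
G(p) \geq \tfrac{1}{2}\int_0^1\!\!\int_0^1 \left(\sqrt{p(x)}-\sqrt{p(y)}\right)^2 dx\,dy = 1 - \left(\int_0^1 \sqrt{p(x)}\,dx\right)^2 = 1 - e^{H_{1/2}(p)}.
$$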

For the upper bound we use the Schwarz inequality for $\left|\sqrt{p(x)} + \sqrt{p(y)}\right|$ and $\left| \sqrt{p(x)} - \sqrt{p(y)} \right|$.
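That is:

$$
G(p) \leq \tfrac{1}{2}\sqrt{\int_0^1\!\!\int_0^1 \left(\sqrt{p(x)}+\sqrt{p(y)}\right)^2 dx\,dy}\,\sqrt{\int_0^1\!\!\int_0^1 \left(\sqrt{p(x)}-\sqrt{p(y)}\right)^2 dx\,dy} = \tfrac{1}{2}\sqrt{2+2e^{H_{1/2}(p)}}\sqrt{2-2e^{H_{1/2}(p)}} = \sqrt{1-e^{2H_{1/2}(p)}}.
$$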
