[Math] Bounding sum of multinomial coefficients by highest entropy one

co.combinatorics, it.information-theory, pr.probability, st.statistics

When does the following hold?

$$\sum_{(i_1,\ldots,i_k)\in E}
\frac{n!}{i_1! \ldots i_k!}
\le \exp(n H^*)$$

where $H^*=\max_{(i_1,\ldots,i_k)\in E} -\left(\frac{i_1}{n}\log \frac{i_1}{n}+\ldots +\frac{i_k}{n}\log \frac{i_k}{n}\right)$ and $E$ is some subset of $\{(i_1,\ldots,i_k) : i_1+\ldots+i_k=n\}$.

Motivation: this is a generalization of Chernoff's bound to $n$ tosses of a fair $k$-sided die, where $E$ represents the hypothesis we make about that sample. Another motivation is reconciling the tight special-case Chernoff bound with the looser but more general bound given by Sanov's theorem.

Examples: when $k=2$, it can be proven to hold for sets of coefficients whose first component is less than $n/2$ (a standard bound on binomial tail sums).
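
Here is a quick Mathematica spot-check of that $k=2$ case (a sketch; binomialBoundHolds is a helper name introduced here, not from the original post):

(* k = 2 case: for m < n/2, Sum_{i<=m} Binomial[n, i] <= Exp[n H(m/n)], with H in nats *)
binomialBoundHolds[n_, m_] := Module[{h, exact, upper},
   h[p_] := If[0 < p < 1, -p Log[p] - (1 - p) Log[1 - p], 0];
   exact = Sum[Binomial[n, i], {i, 0, m}];
   upper = N[Exp[n h[m/n]]];
   exact <= upper];

(* check every m < n/2 for n up to 40 *)
And @@ Flatten[Table[binomialBoundHolds[n, m], {n, 1, 40}, {m, 0, Ceiling[n/2] - 1}]]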

When $k=3$, it seems (empirically) to hold for sets of coefficients where the sum of the first two components is $\le n/2$. For instance, for $n=10$ the highest-entropy term gives an upper bound of $(2/3)^3 \cdot 10^5 \approx 29630$, whereas the exact sum is 12585. Since $k=3$ multinomial coefficients lie in a 2-simplex, the 21 multinomial coefficients in this set can be visualized below; the top vertex represents the coefficient $(0,0,10)$.


For higher $k$, we can look at similar sets, i.e., corners of the $(k-1)$-simplex. I tried a few values and the bound seems to hold for sets of coefficients where the sum of the first $k-1$ components is below $n/(k-1)$.

Here's how you'd check it in Mathematica

getit[n_, k_, c_] := Module[{all, e, hterm, H, exact, upper},
   (* all compositions of n into k nonnegative parts *)
   all = Select[Tuples[Range[0, n], k], Total[#] == n &];
   (* the event E: compositions whose first k - 1 components sum to at most c *)
   e = Select[all, Total[Most[#]] <= c &];
   hterm[x_] := If[0 < x < 1, x Log[x], 0];
   (* empirical entropy (in nats) of a composition *)
   H[event_] := -Total[hterm /@ (event/n)];
   exact = Total[Multinomial @@@ e];
   upper = N[Exp[n Max[H /@ e]]];
   (* does the sum of coefficients stay below exp(n H*)? *)
   exact <= upper];
(* Check bound for k=3, n=10, with i1+i2<=5 *)
getit[10, 3, 5]
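
To probe the conjectured threshold for larger k, one can sweep over small cases (a sketch; Floor[n/(k - 1)] encodes the "sum of the first k-1 components at most n/(k-1)" condition stated above):

(* sweep the conjecture over k = 3..5 and n = 4..10 *)
Table[getit[n, k, Floor[n/(k - 1)]], {k, 3, 5}, {n, 4, 10}]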

Update 8/18
Leandro gives a bound on a single multinomial coefficient, which yields Sanov's theorem once we note that there are at most $(n+1)^k$ multinomial coefficients in any set $E$. It seems that to generalize the proof of the tighter binomial bound to, say, trinomial coefficients, one would first need to prove the following inequality:

$$p_1 \log q_1 + p_2 \log q_2 + p_3 \log q_3 \ge q_1 \log q_1 + q_2 \log q_2 + q_3 \log q_3$$

where $p$ and $q$ are probability vectors (each summing to 1). For each $q$, the set of $p$'s for which the above bound holds gives us the hypotheses for which we can give a tight Chernoff-like bound. Empirically, this bound seems to hold for $p$'s "bounded away" from the uniform distribution. The black circle below represents $q$, and the blue region is the set of distributions $p$ for which the bound above holds (see my Mathematica notebook).

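For completeness, here is a minimal sketch (not the original notebook) of how such a region could be reproduced for one fixed $q$, with $q=(0.2,0.3,0.5)$ chosen arbitrarily and the simplex parameterized by $(p_1,p_2)$:

(* shade the p's where p1 Log[q1] + p2 Log[q2] + p3 Log[q3] >= q.Log[q] *)
With[{q = {0.2, 0.3, 0.5}},
  RegionPlot[
   {p1, p2, 1 - p1 - p2}.Log[q] >= q.Log[q] && p1 + p2 <= 1,
   {p1, 0, 1}, {p2, 0, 1}, FrameLabel -> {"p1", "p2"}]]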

Update 8/24: the bound holds for sets of coefficients of the form $i_1 a_1 + \ldots + i_k a_k \le C$, where $a_1,\ldots,a_k$ are arbitrary non-negative numbers; details in the answer.
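
A sketch of how the Mathematica check above could be adapted to such linear-constraint sets (getitLinear and the particular weights below are illustrative choices, not from the notebook):

(* same check as getit, but with E = {i : a.i <= cmax} for a nonnegative weight vector a *)
getitLinear[n_, k_, a_, cmax_] := Module[{all, e, hterm, H, exact, upper},
   all = Select[Tuples[Range[0, n], k], Total[#] == n &];
   e = Select[all, a.# <= cmax &];
   hterm[x_] := If[0 < x < 1, x Log[x], 0];
   H[i_] := -Total[hterm /@ (i/n)];
   exact = Total[Multinomial @@@ e];
   upper = N[Exp[n Max[H /@ e]]];
   exact <= upper];

(* example: k = 3, n = 10, constraint 2 i1 + i2 <= 7 *)
getitLinear[10, 3, {2, 1, 0}, 7]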

Best Answer

Hi Yaroslav, with the additional hypothesis you are considering, I guess that the inequality can be proved following the text that you linked.

Fix any probability vector $(p_1,\ldots,p_k)$ and consider $E\subset \{0,1,\ldots, n\}^k$ such that $$ E\subset\left\{(i_1,\ldots,i_k):\ \sum_{j=1}^k i_j=n\ \text{ and }\ p_j\geq\frac{i_j}{n}\ \text{ for all } j \right\}. $$

For any element of $E$, we have

$$ \sum_{j=1}^k p_j\log p_j\leq \sum_{j=1}^k \frac{i_j}{n}\log p_j. $$ Therefore $$ -n\left(\sum_{j=1}^k-p_j\log p_j \right)\leq \sum_{j=1}^k i_j\log p_j. $$ Exponentiating both sides we get $$ \exp\left(-n\left(\sum_{j=1}^k-p_j\log p_j \right)\right)\leq p_1^{i_1}\ldots p_k^{i_k}. $$ On the other hand, we have that $$ 1=(p_1+\ldots+p_k)^n=\sum_{i_1,\ldots,i_k}\frac{n!}{i_1!\ldots i_k!}p_1^{i_1}\ldots p_k^{i_k}\geq \sum_{(i_1,\ldots,i_k)\in E}\frac{n!}{i_1!\ldots i_k!}\exp\left(-n\left(\sum_{j=1}^k-p_j\log p_j \right)\right), $$ where the first summation is taken over all sequences of nonnegative integer indices $i_1$ through $i_k$ such that the sum of all $i_j$ is $n$.
So we get that $$ \sum_{(i_1,\ldots,i_k)\in E}\frac{n!}{i_1!\ldots i_k!}\leq\exp\left(n\left(\sum_{j=1}^k-p_j\log p_j \right)\right)\leq e^{n H^*}. $$
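
A quick numerical illustration of the concluding bound on the question's $n=10$, $k=3$ example (a sketch, not part of the original answer; $p=(0.2,0.3,0.5)$, the empirical distribution of the entropy-maximizing tuple in $E$, is used as the illustrative choice):

(* E = {i1 + i2 <= 5, i1 + i2 + i3 = 10}: compare the exact sum with e^(n H*) *)
e = Select[Tuples[Range[0, 10], 3], Total[#] == 10 && #[[1]] + #[[2]] <= 5 &];
p = {1/5, 3/10, 1/2};
{Total[Multinomial @@@ e], N[Exp[-10 Total[p Log[p]]]]}
(* gives {12585, 29629.6}, matching the numbers quoted in the question *)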