[Math] How big is the sum of the smallest multinomial coefficients

binomial-coefficients, co.combinatorics, pr.probability

Given positive integers $n$ and $d$, let $S$ denote the list of all $d$-tuples of non-negative integers $(c_1,\ldots,c_d)$ such that $c_1+\cdots+c_d=n$. Let $v_i$ be the value of the multinomial coefficient corresponding to the $i$-th tuple in $S$, i.e.

$$v_i=\frac{n!}{c_1!\cdots c_d!}$$

What can we say about the sum of the smallest coefficients, i.e., the value of the following?

$$s(B)=\sum_{v_i < B} v_i$$
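
For small $n$ and $d$, $s(B)$ can be computed exactly by brute-force enumeration of $S$; here is a minimal Python sketch of that computation (the function names and example values are ad hoc, not taken from the notebook linked below):

```python
from math import factorial

def tuples_summing_to(n, d):
    """Yield all d-tuples of non-negative integers summing to n."""
    if d == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in tuples_summing_to(n - first, d - 1):
            yield (first,) + rest

def multinomial(c):
    """Multinomial coefficient n!/(c_1! ... c_d!) for the tuple c."""
    v = factorial(sum(c))
    for ci in c:
        v //= factorial(ci)
    return v

def s_exact(n, d, B):
    """s(B): the sum of all multinomial coefficients strictly smaller than B."""
    total = 0
    for c in tuples_summing_to(n, d):
        v = multinomial(c)
        if v < B:
            total += v
    return total

# Example: n = 10, d = 3; the sum over *all* coefficients is d**n = 59049.
print(s_exact(10, 3, 1000))
```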

Motivation: upper bounds on multinomial tails would make it possible to give non-asymptotic error bounds for various learning algorithms.

Update: 09/03
Here are all the relevant theoretical results I have found so far. Let $B=\frac{n!}{c_1!\cdots c_d!}$, $C=\max_i v_i$, $k=\min_i c_i$. Then for even $n$ and $d=2$ the following are known to hold:

$$s(B)<\frac{B}{C} 2^n$$
Proof under Lemma 3.8.2 of Lovász et al., "Discrete Mathematics" (2003).

$$s(B)\le 2^n \exp(-\frac{(n/d-k)^2}{n-k})$$
Proof under Theorem 5.3.2 of Lovász et al., "Discrete Mathematics" (2003).

$$s(B)\le 2B(\frac{n-(k-1)}{n-(2k-1)}-1)$$
Michael Lugo gives an outline of the proof in another MO post.

$$s(B)<2\left(\exp\left(n \log n - \sum_i c_i \log c_i\right)-B\right)$$
Proof under Lemma 16.19 of Flum et al., "Parameterized Complexity Theory" (2006).
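
For $d=2$ and even $n$, these four bounds are easy to evaluate side by side against the exact value: with $B=\binom{n}{k}$ and $1\le k<n/2$, the exact tail is $s(B)=2\sum_{j<k}\binom{n}{j}$. Here is a rough Python sketch of such a comparison (my own transcription of the formulas above, reading $\log$ as the natural logarithm; the test values $n=30$, $k=1,\dots,9$ are arbitrary):

```python
from math import comb, exp, log

def exact_tail(n, k):
    """Exact s(B) for d = 2 and B = C(n, k), where 1 <= k < n/2."""
    return 2 * sum(comb(n, j) for j in range(k))

def bounds(n, k):
    """The four bounds above, specialised to d = 2, even n, B = C(n, k)."""
    B = comb(n, k)
    C = comb(n, n // 2)
    b1 = B / C * 2 ** n                                    # Lovász, Lemma 3.8.2
    b2 = 2 ** n * exp(-(n / 2 - k) ** 2 / (n - k))         # Lovász, Theorem 5.3.2
    b3 = 2 * B * ((n - (k - 1)) / (n - (2 * k - 1)) - 1)   # Michael Lugo's bound
    b4 = 2 * (exp(n * log(n) - k * log(k) - (n - k) * log(n - k)) - B)  # Flum et al.
    return b1, b2, b3, b4

n = 30
for k in range(1, 10):
    exact = exact_tail(n, k)
    ratios = [round(b / exact, 2) for b in bounds(n, k)]
    print(f"k={k:2d}  exact={exact:10d}  bound/exact ratios: {ratios}")
```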

To be practically useful for my application, these bounds need to be tight for tails, i.e., for sums that are less than $d^n/10$. Here's a plot of the logarithm of the bound/exact ratio for such sums; the x-axis is monotonically related to $B$.

(plot: log of the bound/exact ratio for each of the bounds above)

You can see that Michael Lugo's bound is by far the most accurate in that range.

Out of curiosity, I "plugged in" bounds above for sums of higher dimensional coefficients.


(plots: the analogous bound/exact ratios in higher dimensions)


Mathematica notebook.

Best Answer

If I'm not mistaken, you at least have the following bound:

$$ s(B) \ \; \leq\ \; d^n \,-\, \exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right) \,d^{n} . $$

You may want to do a little fiddling around with Stirling's approximation to reduce the denominator in the fraction a bit further, but I didn't want to weaken the bound more than necessary.
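
For what it's worth, here is one way that Stirling computation can go (a sketch of my own, using $\ln m! = m\ln m - m + \tfrac{1}{2}\ln(2\pi m) + O(1/m)$ for both $n!$ and $(n/d)!$):

$$ \ln ( n! ) \,-\, d \ln ( (n/d)! ) \;=\; n \ln d \;-\; \frac{d-1}{2}\,\ln (2\pi n) \;+\; \frac{d}{2}\,\ln d \;+\; O\!\left(\frac{d^2}{n}\right). $$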

I found this by upper-bounding the probability that a multinomial distribution with uniform bin probabilities yields an "atypical" observation, using Hoeffding's inequality. "Atypical" is here defined in the information-theoretic sense: An atypical observation is one whose logarithmic probability falls short of the expected logarithmic probability over the whole distribution (see Cover and Thomas' Elements of Information Theory, particularly chapter 11).

Some more details:

Let $p$ be point probabilities for a multinomial distribution with uniform bin probabilities:

$$ p(c) \ =\ \left( \frac{n!}{c_1!c_2!\cdots c_d!} \right) d^{-n}. $$
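
For example, with $n=2$ and $d=2$ this gives

$$ p(2,0) \;=\; \frac{2!}{2!\,0!}\, 2^{-2} \;=\; \tfrac{1}{4}, \qquad p(1,1) \;=\; \frac{2!}{1!\,1!}\, 2^{-2} \;=\; \tfrac{1}{2}, \qquad p(0,2) \;=\; \tfrac{1}{4}, $$

which indeed sum to $1$.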

Notice that

$$ \ln p(c) \,+\, n\ln d \,\ =\ \ln \frac{n!}{c_1!c_2!\cdots c_d!}, $$

and thus

$$ \ln p(c) \,+\, n\ln d \;\leq\; \ln B \;\quad \iff\quad \frac{n!}{c_1!c_2!\cdots c_d!} \;\leq\; B. $$

Further, the entropy of the flat multinomial distribution is at most the entropy of $n$ samples from the flat categorical distribution with $d$ bins: the categorical samples carry order information as well as frequency information, while the multinomial carries only frequency information.

We thus have the following bound on the expected value of $-\ln p(c)$:

$$ E \left[\, -\ln p(c)\, \right] \ \leq\ n \cdot H\left( \frac{1}{d}, \frac{1}{d}, \ldots, \frac{1}{d} \right) \ =\ n\ln d, $$

or in other words, $-n\ln d \leq E[\ln p(c)]$.
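
Spelled out: if $X_1,\ldots,X_n$ are the $n$ i.i.d. uniform categorical samples and $c=(c_1,\ldots,c_d)$ is their vector of bin counts, then $c$ is a deterministic function of $(X_1,\ldots,X_n)$, so

$$ E\left[\, -\ln p(c)\, \right] \;=\; H(c) \;\leq\; H(X_1,\ldots,X_n) \;=\; n\, H(X_1) \;=\; n \ln d. $$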

Further, assuming $d$ divides $n$, the minimum and maximum values of the random variable $\ln p(c)$ are

$$ a \; =\; \min_c \ln p(c) \; =\; \ln \frac{n!}{n!\, 0!\, 0!\, \cdots\, 0!} d^{-n} \; =\; -n\ln d; $$

$$ b \; =\; \max_c \ln p(c) \; =\; \ln \frac{n!}{(n/d)!\, (n/d)!\, \cdots\, (n/d)!} d^{-n} \; =\; \ln ( n! ) - d\ln ( (n/d)! ) - n\ln d. $$

The squared distance between these two extremes is consequently

$$ (a - b)^2 \ =\ \left( \ln ( n! ) - d\ln ( (n/d)! ) \right)^2. $$
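
As a quick sanity check, take $n=4$ and $d=2$: then $a = -4\ln 2 \approx -2.77$, $b = \ln 24 - 2\ln 2 - 4\ln 2 = \ln\tfrac{6}{16} \approx -0.98$ (the most likely count vector $(2,2)$ indeed has probability $6/16$), and $(a-b)^2 = (\ln 24 - 2\ln 2)^2 = (\ln 6)^2 \approx 3.21$.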

We can now make the following comparisons:

\begin{eqnarray} \Pr\left\{ \frac{n!}{c_1!c_2!\cdots c_d!} < B \right\} & = & \Pr\left\{ \ln p(c) \,+\, n\ln d < \ln B \right\} \\ & \leq & \Pr\left\{ \ln p(c) \,-\, E[\ln p(c)] < \ln B \right\} \\ & = & 1 \,-\, \Pr\left\{ \ln p(c) \,-\, E[\ln p(c)] \geq \ln B \right\} \\ & \leq & 1 \,-\, \exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right). \end{eqnarray}

The first inequality follows from the fact that a proposition always has at most the probability of its logical consequences; the second is the application of Hoeffding's inequality.
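
For reference, the form of Hoeffding's inequality used here is the single-variable one: if $a \leq X \leq b$ almost surely, then for $t > 0$

$$ \Pr\left\{ X - E[X] \geq t \right\} \;\leq\; \exp\!\left( -\frac{2t^2}{(b-a)^2} \right), $$

invoked with $X = \ln p(c)$ and $t = \ln B$.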

We thus have

$$ \Pr\left\{ \frac{n!}{c_1!c_2!\cdots c_d!} < B \right\} \ \leq \ 1\;-\;\exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right). $$

By multiplying by $d^n$, we obtain the inequality stated above, since the probability of a sample from a flat multinomial distribution is equal to the corresponding multinomial coefficient divided by $d^n$.
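
To make the bound concrete, here is a minimal Python sketch of its right-hand side (my own helper, not part of the argument above; it assumes $d$ divides $n$, as the derivation does), whose output can be compared against the exact $s(B)$ computed by brute force, e.g. with `s_exact` from the sketch in the question:

```python
from math import exp, lgamma, log

def answer_bound(n, d, B):
    """Right-hand side of the bound above:
    d^n - exp(-2 * (ln B / (ln n! - d ln((n/d)!)))^2) * d^n.
    Assumes d divides n, as in the derivation."""
    denom = lgamma(n + 1) - d * lgamma(n // d + 1)   # ln(n!) - d*ln((n/d)!)
    return d ** n * (1.0 - exp(-2.0 * (log(B) / denom) ** 2))

# Illustrative values only; compare the output with the exact s(B)
# computed by brute force for the same n, d, B.
print(answer_bound(12, 3, 10_000))
```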