Hi Yaroslav, with the additional hypothesis you are considering, I believe the inequality can be proved by following the text that you linked.
Fix any probability vector $(p_1,\ldots,p_k)$ and consider $E\subset \{0,1,\ldots, n\}^k$ such that
$$
E\subset\left\{(i_1,\ldots,i_k);\ \sum_{j=1}^k i_j=n\ \text{and }\ p_j\geq\frac{i_j}{n}\ \text{for all } j \right\}.
$$
For any element of $E$, since $\log p_j\leq 0$ and $i_j/n\leq p_j$ for every $j$, we have
$$
\sum_{j=1}^k p_j\log p_j\leq \sum_{j=1}^k \frac{i_j}{n}\log p_j.
$$
Therefore
$$
-n\left(\sum_{j=1}^k-p_j\log p_j \right)\leq \sum_{j=1}^k i_j\log p_j.
$$
Exponentiating both sides we get
$$
\exp\left(-n\left(\sum_{j=1}^k-p_j\log p_j \right)\right)\leq p_1^{i_1}\ldots p_k^{i_k}.
$$
On the other hand, we have that
$$
1=(p_1+\ldots+p_k)^n=\sum_{i_1,\ldots,i_k}\frac{n!}{i_1!\ldots i_k!}p_1^{i_1}\ldots p_k^{i_k}\geq \sum_{(i_1,\ldots,i_k)\in E}\frac{n!}{i_1!\ldots i_k!}\exp\left(-n\left(\sum_{j=1}^k-p_j\log p_j \right)\right),
$$
where the first summation is taken over all sequences of nonnegative integer indices $i_1$ through $i_k$ such that the sum of all $i_j$ is $n$.
So we get that
$$
\sum_{(i_1,\ldots,i_k)\in E}\frac{n!}{i_1!\ldots i_k!}\leq\exp\left(n\left(\sum_{j=1}^k-p_j\log p_j \right)\right)\leq e^{n H^*}.
$$
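The bound just derived can be checked numerically on a small example. A minimal sketch (the helper names and the particular $p$, $n$ are mine, not from the text); exact rationals are used so that membership in $E$ is tested without floating-point error:

```python
import itertools
import math
from fractions import Fraction

def multinomial(counts):
    """n! / (i_1! ... i_k!) for a tuple of counts summing to n."""
    out = math.factorial(sum(counts))
    for c in counts:
        out //= math.factorial(c)
    return out

# Probability vector as exact rationals, so the membership test for E is exact.
p = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
n, k = 10, 3

# E = { (i_1,...,i_k) : sum_j i_j = n and p_j >= i_j / n for all j }
E = [i for i in itertools.product(range(n + 1), repeat=k)
     if sum(i) == n and all(Fraction(ij, n) <= pj for ij, pj in zip(i, p))]

lhs = sum(multinomial(i) for i in E)            # sum of coefficients over E
H = -sum(float(pj) * math.log(pj) for pj in p)  # Shannon entropy in nats
rhs = math.exp(n * H)                           # e^{nH}
assert lhs <= rhs
```

With this $p$ and $n$ the constraints force $E=\{(5,3,2)\}$, so the left-hand side is a single multinomial coefficient, comfortably below $e^{nH}$.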
If I'm not mistaken, you at least have the following bound:
$$
s(B) \ \; \leq\ \; d^n \,-\, \exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right) \,d^{n} .
$$
You may want to do a little fiddling around with Stirling's approximation to reduce the denominator in the fraction a bit further, but I didn't want to weaken the bound more than necessary.
I found this by upper-bounding the probability that a multinomial distribution with uniform bin probabilities yields an "atypical" observation, using Hoeffding's inequality. "Atypical" is here defined in the information-theoretic sense: An atypical observation is one whose logarithmic probability falls short of the expected logarithmic probability over the whole distribution (see Cover and Thomas' Elements of Information Theory, particularly chapter 11).
Some more details:
Let $p$ be point probabilities for a multinomial distribution with uniform bin probabilities:
$$
p(c) \ =\ \left( \frac{n!}{c_1!c_2!\cdots c_d!} \right) d^{-n}.
$$
Notice that
$$
\ln p(c) \,+\, n\ln d \,\ =\ \ln \frac{n!}{c_1!c_2!\cdots c_d!},
$$
and thus
$$
\ln p(c) \,+\, n\ln d \;\leq\; \ln B \;\quad \iff\quad \frac{n!}{c_1!c_2!\cdots c_d!} \;\leq\; B.
$$
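This equivalence, and the fact that $p(c)$ really is a probability distribution, can be confirmed by brute force on a small case. In the sketch below, $n$, $d$, and the threshold $B=61$ are arbitrary choices of mine ($B$ is picked so that no coefficient equals it, avoiding floating-point edge cases at equality):

```python
import itertools
import math

n, d, B = 6, 3, 61  # B chosen so that no multinomial coefficient equals B

def multinomial(counts):
    """n! / (c_1! ... c_d!) for a tuple of counts summing to n."""
    out = math.factorial(sum(counts))
    for c in counts:
        out //= math.factorial(c)
    return out

counts_list = [c for c in itertools.product(range(n + 1), repeat=d)
               if sum(c) == n]

# p(c) = multinomial(c) * d^{-n} sums to 1 over all count vectors:
assert sum(multinomial(c) for c in counts_list) == d ** n

# ln p(c) + n ln d <= ln B  <=>  multinomial(c) <= B, for every c:
for c in counts_list:
    ln_p = math.log(multinomial(c)) - n * math.log(d)
    assert (ln_p + n * math.log(d) <= math.log(B)) == (multinomial(c) <= B)
```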
Further, the entropy of the flat multinomial distribution is less than the entropy of $n$ samples from the flat categorical distribution with $d$ bins: The categorical includes order information as well as frequency information, while the multinomial only includes frequency information.
We thus have the following bound on the expected value of $-\ln p(c)$:
$$
E \left[\, -\ln p(c)\, \right] \ \leq\
n \cdot H\left( \frac{1}{d}, \frac{1}{d}, \ldots, \frac{1}{d} \right)
\ =\ n\ln d,
$$
or in other words, $-n\ln d \leq E[\ln p(c)]$.
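A quick exact computation confirms this expectation bound on a small case (the function name and parameters are mine):

```python
import itertools
import math

def multinomial_entropy(n, d):
    """Exact E[-ln p(c)] for the multinomial with uniform bin probabilities."""
    coefs = []
    for c in itertools.product(range(n + 1), repeat=d):
        if sum(c) == n:
            m = math.factorial(n)
            for ci in c:
                m //= math.factorial(ci)
            coefs.append(m)
    total = d ** n
    # -sum_c p(c) ln p(c), with p(c) = m / d^n
    return -sum((m / total) * (math.log(m) - n * math.log(d)) for m in coefs)

n, d = 6, 3
H_mult = multinomial_entropy(n, d)
assert H_mult <= n * math.log(d)  # entropy bound from the text
```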
Further, assuming for simplicity that $d$ divides $n$, the minimum and maximum values of the random variable $\ln p(c)$ are
$$
a
\; =\;
\min_c \ln p(c)
\; =\;
\ln \left( \frac{n!}{n!\, 0!\, 0!\, \cdots\, 0!}\, d^{-n} \right)
\; =\;
-n\ln d;
$$
$$
b
\; =\;
\max_c \ln p(c)
\; =\;
\ln \left( \frac{n!}{(n/d)!\, (n/d)!\, \cdots\, (n/d)!}\, d^{-n} \right)
\; =\;
\ln ( n! ) - d\ln ( (n/d)! ) - n\ln d.
$$
The squared distance between these two extremes is consequently
$$
(a - b)^2 \ =\ \left( \ln ( n! ) - d\ln ( (n/d)! ) \right)^2.
$$
We can now make the following comparisons:
\begin{eqnarray}
\Pr\left\{ \frac{n!}{c_1!c_2!\cdots c_d!} < B \right\}
& = &
\Pr\left\{ \ln p(c) \,+\, n\ln d < \ln B \right\}
\\
& \leq &
\Pr\left\{ \ln p(c) \,-\, E[\ln p(c)] < \ln B \right\}
\\
& = &
1 \,-\, \Pr\left\{ \ln p(c) \,-\, E[\ln p(c)] \geq \ln B \right\}
\\
& \leq &
1 \,-\, \exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right).
\end{eqnarray}
The first inequality follows from the fact that a proposition has at most the probability of any of its logical consequences; the second is an application of Hoeffding's inequality.
We thus have
$$
\Pr\left\{ \frac{n!}{c_1!c_2!\cdots c_d!} < B \right\}
\ \leq \
1\;-\;\exp \left( -2 \left( \frac{\ln B}{\ln ( n! ) - d\ln ( (n/d)! )} \right)^2 \right).
$$
By multiplying by $d^n$, we obtain the inequality stated above, since the probability of a sample from a flat multinomial distribution is equal to the corresponding multinomial coefficient divided by $d^n$.
Best Answer
I add two hopefully useful remarks.
I consider the general situation. In the sequel $k\geq 1$ and $\lambda=(\lambda_1,\ldots,\lambda_{k+1})$ are fixed, $s:=\sum_{i=1}^{k+1}\lambda_i$ and $n\geq s$.
Let $A_n^{(k+1)}(\lambda):=\sum\limits_{\stackrel{a_i\geq \lambda_i, i=1,\ldots,k+1}{a_1+\ldots +a_{k+1}=n}} \prod_{i=1}^{k+1} a_i!$ and let $\mathcal{S}_{k}:=\{ (x_1,\ldots,x_{k+1})\;:\;x_i\geq 0, \sum_{i=1}^{k+1} x_i=1\}$ denote the $k$-dimensional standard simplex.
(i) $A_n^{(k+1)}(\lambda)$ has a nice geometric representation:
$$ A_n^{(k+1)}(\lambda)=\frac{(n-s+k)!\,(n+k)!}{(n-s)!} \int_{\mathcal{S}_{k}} \int_{\mathcal{S}_{k}} \left(x_1y_1+\ldots+x_{k+1}y_{k+1}\right)^{n-s}\,\prod_{i=1}^{k+1} x_i^{\lambda_i}d\mathbf{x}\,\,d\mathbf{y} $$
Proof: recall Dirichlet's integral: $$((\sum_{i=1}^{k+1}a_i)+k)!\, \int_{\mathcal{S}_{k}} \prod_{i=1}^{k+1} x_i^{a_i}\,d\mathbf{x}=\prod_{i=1}^{k+1} a_i!$$ Now expand the integrand using the multinomial theorem and use Dirichlet's integral repeatedly.
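As a sanity check (not part of the proof), the representation can be verified exactly for small parameters: expand $(x\cdot y)^{n-s}$ with the multinomial theorem and evaluate each term with Dirichlet's integral in rational arithmetic. All function names below are mine:

```python
import itertools
import math
from fractions import Fraction

def dirichlet(exponents):
    """Exact integral of prod x_i^{c_i} over the standard simplex:
    prod c_i! / ((sum c_i) + k)!, where k = len(exponents) - 1."""
    num = math.prod(math.factorial(c) for c in exponents)
    return Fraction(num, math.factorial(sum(exponents) + len(exponents) - 1))

def A_direct(lam, n):
    """A_n^{(k+1)}(lambda) by direct enumeration of its defining sum."""
    k1 = len(lam)
    return sum(math.prod(math.factorial(ai) for ai in a)
               for a in itertools.product(range(n + 1), repeat=k1)
               if sum(a) == n and all(ai >= li for ai, li in zip(a, lam)))

def A_integral(lam, n):
    """The right-hand side, with the double integral evaluated exactly by
    expanding (x.y)^{n-s} and applying Dirichlet's integral termwise."""
    k1 = len(lam)
    k, s = k1 - 1, sum(lam)
    I = Fraction(0)
    for b in itertools.product(range(n - s + 1), repeat=k1):
        if sum(b) != n - s:
            continue
        coef = math.factorial(n - s) // math.prod(math.factorial(bi) for bi in b)
        I += coef * dirichlet([li + bi for li, bi in zip(lam, b)]) * dirichlet(b)
    pref = Fraction(math.factorial(n - s + k) * math.factorial(n + k),
                    math.factorial(n - s))
    return pref * I

assert A_integral((1, 0, 2), 6) == A_direct((1, 0, 2), 6)
```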
EDIT: I have replaced the previous (unnecessarily complicated) derivation and again corrected the factor (apologies). (The conclusions remain unchanged).
(ii) Thus the large $n$ asymptotic of $A_n^{(k+1)}(\lambda)$ can be investigated along the lines of the Laplace method (explained e.g. in chapter 4 of de Bruijn's book).
Remark: I have done that, and (unfortunately?) the result confirmed your original conjecture. Thus either my analysis or the accepted answer (or both) are incorrect (no offence).
EDIT:
The Laplace analysis follows the usual scheme: (1) identify the maxima; (2) cut off the tails and approximate the integrand; (3) complete the tails.
I sketch the main steps. Let $I$ denote the integral above.
Basic intuition:
(1) Consider the scalar product $\langle \mathbf{x},\mathbf{y}\rangle=\sum_{i=1}^{k+1}x_iy_i\leq 1$. On the domain of integration $\langle\mathbf{x},\mathbf{y}\rangle$ can equal $1$ only in a "corner", i.e. when $x_i=y_i=1$ for some $i$.
(2) For any $0<\epsilon<1$ the part of $I$ where $\langle\mathbf{x},\mathbf{y}\rangle<1-\epsilon$ is asymptotically negligible (compared to its complement). Asymptotically thus only the parts around the corners (i.e. $x_iy_i\approx 1$) need to be considered.
With this in mind: (3) choose $\epsilon$ so small that the corners $C_i:=\{ x\in \mathcal{S}_k\;:\; x_i\geq 1-\epsilon, y_i\geq 1-\epsilon\}$ are disjoint, and discuss their influence individually.
E.g. for corner $k+1$ choose $x_1,\ldots,x_k$ resp. $y_1,\ldots,y_k$ as independent coordinates. Then $x_{k+1}=1-\sum_{i=1}^k x_i=:1-u$, $y_{k+1}=1-\sum_{i=1}^k y_i=:1-t$.
We have $\langle\mathbf{x},\mathbf{y}\rangle=(1-u)(1-t)+\sum_{i=1}^k x_iy_i\leq (1-u)(1-t)+ut$, thus $\langle\mathbf{x},\mathbf{y}\rangle^2\leq (1-2u(1-u))(1-2t(1-t))$. The parts of the integral where $u\geq n^{-2/3}$ or $t\geq n^{-2/3}$ are negligible, because $\langle\mathbf{x},\mathbf{y}\rangle^n=\mathcal{O}(\exp(-n^{1/3}))$ there. On the remaining part, $\langle\mathbf{x},\mathbf{y}\rangle^n=\exp(-n(u+t))(1+o(1))$ uniformly. (Working with the exponent $n$ instead of the $n-s$ of the original integrand changes nothing to leading order: $\langle\mathbf{x},\mathbf{y}\rangle\to 1$ on the retained region, so $\langle\mathbf{x},\mathbf{y}\rangle^{-s}=1+o(1)$ there.) Therefore introduce new coordinates $x_1,\ldots,x_{k-1},u$ and $y_1,\ldots,y_{k-1},t$; then \begin{align*} &\int_{{C}_{k+1}}\int \left(x_1y_1+\ldots+x_{k+1}y_{k+1}\right)^n\,\prod_{i=1}^{k+1} x_i^{\lambda_i}\,d\mathbf{x}\,d\mathbf{y} \\ &=\int\limits_{{x_i\geq 0,\; x_1+\ldots +x_k=u}\atop{{y_i\geq 0,\; y_1+\ldots +y_k=t}\atop {0\leq u,t\leq n^{-2/3}}}} e^{-n(u+t)}\prod_{i=1}^k x_i^{\lambda_i}\,(1-u)^{\lambda_{k+1}}\, dx_1\ldots dx_k\, dy_1\ldots dy_k\,du\,dt\,\big(1+o(1)\big)\\ &= \int_0^{n^{-2/3}}\int_0^{n^{-2/3}} e^{-n(u+t)}\, u^{s-\lambda_{k+1}+k-1}\, \frac{\prod_{i=1}^k\lambda_i!}{(s-\lambda_{k+1}+k-1)!}\, (1-u)^{\lambda_{k+1}}\, t^{k-1}\,\frac{1}{(k-1)!}\,du\,dt\,\big(1+o(1)\big)\\ &=\frac{\prod_{i=1}^k \lambda_i!}{n^{2k+s-\lambda_{k+1}}}\left(1+o(1)\right) \end{align*} where the last line follows after substituting $z=nu$, $w=nt$ and completing the tails. Summing over all corners one now gets your formula (after applying Stirling's formula to it, and up to the additional factors).
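The corner asymptotics can be sanity-checked numerically. The sketch below is my own reading of the analysis in the simplest case $k=1$, $\lambda=(0,0)$, $s=0$, where the prefactor is $(n+1)!\,(n+1)!/n!$ and each of the two corners contributes $1/n^2$:

```python
import math

def A_exact(n):
    """A_n for k = 1, lambda = (0, 0): sum over a + b = n of a! * b!."""
    return sum(math.factorial(a) * math.factorial(n - a) for a in range(n + 1))

def A_corner(n):
    """Leading-order prediction: prefactor (n+1)!*(n+1)!/n! times the
    total corner mass 2/n^2 (two symmetric corners, k = 1, s = 0)."""
    return 2 * (n + 1) * math.factorial(n + 1) / n ** 2

# The relative error of the prediction should shrink as n grows.
errs = [abs(A_corner(n) / A_exact(n) - 1) for n in (20, 40, 80)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 0.02
```

The observed errors decay roughly like $1/n$, consistent with the $1+o(1)$ factors above.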