I can give you a simplification of the problem and a precise answer in the limit as $\epsilon \to 0$. This precise answer also yields a good upper bound in the upward direction $H > H_0$, using the fact that entropy is a concave function. In the downward direction $H < H_0$ things are more annoying, and a good answer depends on the size of $\epsilon$ and what type of estimate you want.
There is a convex body $B$ of density matrices $\rho$ and a map from that to the much simpler convex body $S$ of unordered spectra $\vec{\lambda}$. The simplification of your problem is that the Hilbert-Schmidt metric on density matrices descends to the Euclidean metric (or $\ell^2$ metric or Hilbert-Schmidt metric) on spectra. $S$ is a simplex, in fact the quotient of a regular simplex $T$ by its isometries. Since $H(\rho)$ only depends on its spectrum, you might as well work in $S$ or $T$ rather than in $B$. In fact $T$ is exactly the convex body of classical states on $D$ configurations rather than quantum states. So the question is not really quantum at all, it is a question about the Shannon entropy $H(\vec{\lambda})$ of distributions $\vec{\lambda}$ on a set with $D$ elements.
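To make the reduction concrete, here is a small numerical sketch (the spectrum and the random unitary are arbitrary choices for illustration): conjugating a diagonal matrix by a random unitary changes the density matrix but not its von Neumann entropy $H(\rho) = -\operatorname{Tr}\rho\ln\rho$, which therefore matches the Shannon entropy of the point of $S$.

```python
import numpy as np

rng = np.random.default_rng(1)

def vn_entropy(rho):
    """Von Neumann entropy, computed from the spectrum of rho."""
    lam = np.linalg.eigvalsh(rho)
    return -np.sum(lam * np.log(lam))

D = 4
spec = np.array([0.4, 0.3, 0.2, 0.1])     # a point of the simplex S

# Random unitary Q via QR of a complex Gaussian matrix.
Z = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
Q, _ = np.linalg.qr(Z)

# Density matrix with spectrum `spec` but generic eigenbasis.
rho = Q @ np.diag(spec) @ Q.conj().T

# H(rho) agrees with the Shannon entropy of the spectrum.
H_shannon = -np.sum(spec * np.log(spec))
assert abs(vn_entropy(rho) - H_shannon) < 1e-10
```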
It is easy to take the gradient of $H(\vec{\lambda})$. The differential is just
$$\sum_k -(\ln(\lambda_k) + 1)d\lambda_k.$$
Since the sum of the coordinates is constant, you can drop the second term of the summand. So to maximize the first-order change of $H(\vec{\lambda})$, you should push $\vec{\lambda}$ in the direction
$$\delta \lambda_k = -\ln(\lambda_k)+\frac1D \sum_j \ln(\lambda_j).$$
The norm of this gradient is then an upper bound on $|H(\vec{\lambda}') - H(\vec{\lambda})|/\epsilon$ that is sharp as $\epsilon \to 0$. As mentioned at the beginning, if the entropy is increasing, this derivative bound holds for every $\epsilon$, because entropy is concave. In the other direction you are "falling off a cliff" rather than "climbing to the top of the dome", and I would have to understand more about what sort of bound you want. (Maybe only because I haven't grasped all of the later details of your question.)
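Both claims are easy to test numerically. The following sketch (with an arbitrary example spectrum) checks that moving along the stated direction changes $H$ by $\epsilon\,\|g\|$ to first order, and that, by concavity, no direction of any step size beats $\epsilon\,\|g\|$.

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    return -np.sum(p * np.log(p))

D = 4
lam = np.array([0.4, 0.3, 0.2, 0.1])   # example spectrum, sums to 1

# Projected gradient: delta_k = -ln(lam_k) + (1/D) sum_j ln(lam_j)
g = -np.log(lam) + np.log(lam).mean()
gnorm = np.linalg.norm(g)
assert abs(g.sum()) < 1e-12            # tangent to the simplex

# 1) Small-step check: a unit step of length eps along g raises H
#    by eps * |g|, up to O(eps^2).
eps = 1e-6
assert abs((H(lam + eps * g / gnorm) - H(lam)) - eps * gnorm) < 1e-9

# 2) Concavity check: for ANY tangent direction and step size, the
#    increase of H is at most (step size) * |g|.
for _ in range(1000):
    d = rng.standard_normal(D)
    d -= d.mean()                      # stay on the simplex (sum = 0)
    d /= np.linalg.norm(d)
    s = 0.05 * rng.random()            # small enough to keep lam + s*d > 0
    assert H(lam + s * d) - H(lam) <= s * gnorm + 1e-12
```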
For the uniform state (the maximally mixed state) I can conjecture a precise answer in the downward direction. There you are at the center of the simplex $T$. I conjecture that the way to decrease entropy as much as possible is to run straight for a corner, i.e., to increase one $\lambda_k$ and keep the others equal.
This is not a reference, but a short proof.
We use the following lemma (probably known, but see the comment at the end) on representing a symmetric tensor as a linear combination of rank-one symmetric tensors.
Lemma. Let $A$ be a finite set, $K$ an infinite field. Denote by $\mathcal S$ the set of symmetric functions $p:A^n\to K$. Then $\mathcal S$ is the $K$-span of rank-one functions, that is, the functions of the type $h(x_1)h(x_2)\ldots h(x_n)$, where $h:A\to K$.
Proof. Note that the product of two rank-one functions is a rank-one function. Thus the linear space $\mathcal T$, generated by rank-one functions, coincides with the $K$-algebra generated by them.
We may suppose that $A\subset K$. For $k=0,1,\ldots,n$ denote $e_k(x_1,\ldots,x_n)$ the elementary symmetric polynomial, that is, $\varphi_t(x_1,\ldots,x_n):=\prod(1+tx_i)=\sum_{k=0}^n t^ke_k$. We identify $e_k$ and the corresponding element of $\mathcal S$. Choosing $n+1$ distinct values $t_1,\ldots,t_{n+1}\in K$ and solving the corresponding (Vandermonde's) linear system of equations we represent each $e_k$ as a linear combinations of $\varphi_{t_i}\in \mathcal T$. Thus $e_k\in \mathcal S$ for all $k=0,1,\ldots,n$. It is well known that $e_k$'s generate the algebra of symmetric polynomials (over any field). Thus any symmetric polynomial function belongs to $\mathcal T$. It remains to note that any symmetric function $f\in \mathcal S$ may be represented by a symmetric polynomial. Indeed, a symmetric function $f$ may be represented as $F(e_1,e_2,\ldots,e_n)$ for certain function function $F$ defined on the corresponding finite set (because the values of $e_1,\ldots,e_n$ determine the values of $x_1,\ldots,x_n$ up to permutation). $F$ in turn coincides with a polynomial function on this finite set. $\square$
Now we may prove your theorem for finitely supported functions $i\mapsto p_i$. By the Lemma, $p$ may be assumed to have the form $p_i=\prod_{k=1}^n H(i_k)$ for a certain finitely supported function $H$ on $\mathbb{N}$ (like the OP, I write $\mathbb{N}=\{0,1,\ldots\}$ here). In this case both sides of your identity are equal to $\det\bigl(\sum_m H(m)A^m\bigr)$.
Comment. The Lemma does not hold over finite fields. For example, let $A=K=\mathbb{F}_2=\{0,1\}$. Then the function $x+y+z$ is not a linear combination of the rank-one functions $1$, $xyz$, $(x+1)(y+1)(z+1)$: if $x+y+z=a+bxyz+c(x+1)(y+1)(z+1)$, then substituting $y=0$, $z=1$, $x=a$ gives $a+1=a$, a contradiction. A word of warning: in the related paper "Symmetric tensors and symmetric tensor rank" by Pierre Comon, Gene Golub, Lek-Heng Lim, and Bernard Mourrain (SIAM Journal on Matrix Analysis and Applications, 2008, 30 (3), pp. 1254-1279), this statement (after equation (1.1)) is asserted for an arbitrary field, although it is proved only for the complex numbers, and the proof uses the fact that a non-zero polynomial has a non-zero value.
In any case, you may always enlarge the ground field and safely think that it is infinite.
Best Answer
I'm not sure if this qualifies as simple (or if this is helpful at all), but writing $\phi_M$ for the characteristic polynomial of $M$, whose roots are the eigenvalues $\lambda_n$, we have $$ \frac{\phi'_M(t)}{\phi_M(t)}=\sum_n\frac{1}{t-\lambda_n}. $$
Using the residue theorem, we can then write $$ H[M]=\frac{-1}{2\pi i}\oint\frac{\phi'_M(z)}{\phi_M(z)}\,z\log(z)\,dz, $$ where the integral is taken counterclockwise over a closed contour enclosing all of the eigenvalues of $M$ (I guess we are either working on the Riemann surface of $\log(z)$, or we have chosen a branch of $\log(z)$ that is holomorphic inside the contour).
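Here is a quick numerical check of this contour formula, a sketch with a hypothetical spectrum $\{0.2, 0.3, 0.5\}$ and a circular contour in the right half-plane, so the principal branch of $\log$ is holomorphic on and inside the contour.

```python
import numpy as np

# Hypothetical spectrum of a density matrix M (positive, sums to 1).
lam = np.array([0.2, 0.3, 0.5])

# phi'_M(z) / phi_M(z) = sum_n 1 / (z - lambda_n)
def log_deriv(z):
    return np.sum(1.0 / (z[None, :] - lam[:, None]), axis=0)

# Circle of radius 0.45 around 0.5: it encloses every lambda_n, and its
# interior has Re(z) > 0.05, avoiding the branch cut of the principal log.
N = 4096
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
z = 0.5 + 0.45 * np.exp(1j * theta)
dz = 1j * 0.45 * np.exp(1j * theta) * (2.0 * np.pi / N)

# (-1 / 2 pi i) * contour integral of (phi'/phi) z log z, via the
# trapezoid rule (spectrally accurate for periodic integrands).
integral = (-1.0 / (2j * np.pi)) * np.sum(log_deriv(z) * z * np.log(z) * dz)

# Compare against H[M] = -sum_n lambda_n log lambda_n.
H_direct = -np.sum(lam * np.log(lam))
assert abs(integral.real - H_direct) < 1e-8
assert abs(integral.imag) < 1e-8
```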