The second part below is rather inelegant, and I think it can probably be improved; suggestions are welcome.
Note that the LHS is the Jensen-Shannon divergence ($\mathrm{JSD}$) between $P_1$ and $P_2$, and that $\mathrm{JSD}$ is an $f$-divergence. For a pair of $f$-divergences $D_f, D_g$ generated by convex functions $f$ and $g$, the joint ranges are defined as
\begin{align}
\mathbb{R}^2 \supset \mathcal{R} :=& \{ (D_f(P\|Q), D_g(P\|Q)): P, Q \textrm{ are distributions on some measurable space} \} \\
\mathcal{R}_k :=& \{ (D_f(P\|Q), D_g(P\|Q)): P, Q \textrm{ are distributions on } ([1:k], 2^{[1:k]} ) \}\end{align}
A remarkable theorem of Harremoës & Vajda (see also these notes by Wu) states that for any pair of $f$-divergences, $$\mathcal{R} = \mathrm{co}(\mathcal{R}_2),$$ where $\mathrm{co}$ is the convex hull operator.
Now, we want to show the relation $\mathrm{JSD} \le h\left(\tfrac{1}{2} d_{TV}\right).$ Since both $\mathrm{JSD}$ and $d_{TV}$ are $f$-divergences, and since the set $\mathcal{S} := \{ (x,y) \in \mathbb{R}^2 : y \le h(x/2) \}$ is convex (it is the hypograph of a concave function), it suffices to show this inequality for distributions on $2$ symbols: the convex hull of a set is the intersection of all convex sets containing it, so $\mathcal{R}_2 \subset \mathcal{S} \implies \mathrm{co}(\mathcal{R}_2) \subset \mathcal{S}.$ The remainder of this answer will thus concentrate on showing $\mathcal{R}_2 \subset \mathcal{S}$.
Let $p := \pi + \delta, q := \pi - \delta,$ where $\delta \in [0,1/2]$ and $\pi \in [\delta, 1- \delta].$ Since $d_{TV}(\mathrm{Bern}(p)\|\mathrm{Bern}(q)) = |p - q| = 2\delta,$ we will show that $$ \mathrm{JSD}(\mathrm{Bern}(p)\|\mathrm{Bern}(q) ) \le h\left(\frac{1}{2}d_{TV}(\mathrm{Bern}(p)\|\mathrm{Bern}(q) )\right) = h(\delta), \tag{1}$$ which is exactly the claim $\mathcal{R}_2 \subset \mathcal{S}$. Note that above $p\ge q$ always, but this doesn't matter, since both $\mathrm{JSD}$ and $d_{TV}$ are symmetric in their arguments.
For conciseness I'll denote the $\mathrm{JSD}$ above by $J$. All '$\log$'s in the following are natural, and we will make use of the simple identities, valid for $p \in (0,1)$: $$ \frac{\mathrm{d}}{\mathrm{d}p} h(p) = \log \frac{1-p}{p}, \qquad \frac{\mathrm{d}^2}{\mathrm{d}p^2} h(p) = -\frac{1}{p} - \frac{1}{1-p}. $$
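These identities are easy to sanity-check; here is a small sympy sketch (not part of the argument) that evaluates both at a few points:

```python
# Spot-check the two derivative identities for h (sympy sketch).
import sympy as sp
from math import log, isclose

p = sp.symbols('p', positive=True)
h = -p*sp.log(p) - (1 - p)*sp.log(1 - p)
d1, d2 = sp.diff(h, p), sp.diff(h, p, 2)

for v in (0.1, 0.37, 0.9):
    assert isclose(float(d1.subs(p, v)), log((1 - v) / v))
    assert isclose(float(d2.subs(p, v)), -1/v - 1/(1 - v))
```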
By the expansion in the question, $$J(\pi, \delta) = h( \pi) - \frac{1}{2} h(\pi + \delta) - \frac{1}{2} h(\pi - \delta).$$
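If you'd like to double-check that expansion numerically, here is a minimal sketch (with $\mathrm{Bern}(p)$ the distribution $(p, 1-p)$ and natural logs):

```python
# Verify J(pi, delta) = JSD(Bern(pi+delta) || Bern(pi-delta)) at one point.
from math import log, isclose

def h(x):  # binary entropy in nats
    return 0.0 if x in (0.0, 1.0) else -x*log(x) - (1 - x)*log(1 - x)

def kl2(p, q):  # KL divergence between Bern(p) and Bern(q)
    return p*log(p/q) + (1 - p)*log((1 - p)/(1 - q))

pi_, d = 0.4, 0.25
p, q = pi_ + d, pi_ - d
m = (p + q) / 2  # the mixture midpoint equals pi_
assert isclose(0.5*kl2(p, m) + 0.5*kl2(q, m), h(pi_) - 0.5*h(p) - 0.5*h(q))
```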
It is trivial to see that the relation $(1)$ holds if $\delta = 0$. Let us thus assume that $\delta > 0.$ For $\pi \in (\delta, 1-\delta),$ we have
\begin{align} \frac{\partial}{\partial \pi} J &= \log \frac{1-\pi}{\pi} - \frac{1}{2} \left( \log \frac{1 - \pi - \delta}{\pi + \delta} + \log \frac{1 - \pi +\delta}{\pi - \delta}\right) \end{align} and \begin{align} \frac{\partial^2}{\partial \pi^2} J &= \frac{1}{2} \left( \frac{1}{\pi + \delta} + \frac{1}{\pi - \delta} + \frac{1}{1 - \pi - \delta} + \frac{1}{1 - \pi + \delta} \right) - \frac{1}{\pi} - \frac{1}{1-\pi} \\
&= \frac{\pi}{\pi^2 - \delta^2} - \frac{1}{\pi} + \frac{1 - \pi}{( 1-\pi)^2 - \delta^2} - \frac{1}{1-\pi} \\
&= \frac{\delta^2}{\pi(\pi^2 - \delta^2)} + \frac{\delta^2}{(1-\pi)( (1-\pi)^2 - \delta^2)} > 0,
\end{align}
where the final inequality uses $\delta > 0,$ and that $ \pi \in (\delta, 1-\delta).$
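The algebraic simplification of the second derivative can be confirmed symbolically; a small sympy sketch:

```python
# Confirm the simplification of the second derivative of J in pi.
import sympy as sp

pi, d = sp.symbols('pi delta', positive=True)
expr = (sp.Rational(1, 2)*(1/(pi + d) + 1/(pi - d)
                           + 1/(1 - pi - d) + 1/(1 - pi + d))
        - 1/pi - 1/(1 - pi))
target = d**2/(pi*(pi**2 - d**2)) + d**2/((1 - pi)*((1 - pi)**2 - d**2))
assert sp.simplify(expr - target) == 0
```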
As a consequence, for every fixed $\delta > 0,$ $J$ is strictly convex in $\pi$ on $(\delta, 1-\delta).$ Since the maximum of a convex function on an interval is attained at an endpoint, we have $$ J(\pi ,\delta) \le \max( J(\delta, \delta), J(1- \delta, \delta) ).$$
But $$J(\delta, \delta) = h(\delta) - \frac{1}{2} (h(2\delta) + h(0) ) = h(\delta) - \frac{1}{2} h(2\delta),$$ and similarly $$J(1-\delta, \delta) = h(\delta) - \frac{1}{2} h(1-2\delta) = h(\delta) - \frac{1}{2} h(2\delta),$$ by the symmetry of $h$. We immediately get that for every $\delta \in [0,1/2], \pi \in [\delta, 1-\delta],$ $$J(\pi, \delta) \le h(\delta) - \frac{1}{2} h(2\delta) \le h(\delta),$$ finishing the argument.
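A quick grid check of this two-letter bound (just a numerical sketch, not part of the proof):

```python
# Check J(pi, delta) <= h(delta) - h(2*delta)/2 on a grid.
import numpy as np

def h(x):  # binary entropy in nats, clipped away from {0, 1}
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x*np.log(x) - (1 - x)*np.log(1 - x)

for d in np.linspace(0.001, 0.499, 200):
    pis = np.linspace(d, 1 - d, 200)
    J = h(pis) - 0.5*h(pis + d) - 0.5*h(pis - d)
    assert np.all(J <= h(d) - 0.5*h(2*d) + 1e-9)
```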
Note that the argument above shows something stronger for $2$-symbol distributions: $J(\pi, \delta) \le h(\delta) - h(2\delta)/2$. Unfortunately the RHS is a convex function of $\delta$, so its hypograph is not convex and the joint-range argument does not directly extend this stronger bound to all alphabets. It would be interesting to know whether a bound with such an improvement can be shown in general.
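For what it's worth, the final bound $\mathrm{JSD} \le h(\tfrac{1}{2} d_{TV})$ can also be checked numerically on larger alphabets; a sketch, using the convention $d_{TV}(P,Q) = \tfrac{1}{2}\|P-Q\|_1$:

```python
# Sample random distribution pairs and check (d_TV, JSD) lands in S.
import numpy as np

rng = np.random.default_rng(0)

def h(x):  # binary entropy in nats
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x*np.log(x) - (1 - x)*np.log(1 - x)

def jsd(p, q):
    m = (p + q) / 2
    kl = lambda a, b: np.sum(a * np.log(a / b))  # all entries positive here
    return 0.5*kl(p, m) + 0.5*kl(q, m)

for _ in range(10000):
    k = rng.integers(2, 10)
    p, q = rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k))
    tv = 0.5 * np.abs(p - q).sum()
    assert jsd(p, q) <= h(tv / 2) + 1e-9
```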
In case this helps:
Let $X_i$ be independent Bernoulli random variables with parameter $q_i = P(X_i =1)$.
Let $W$ be a random variable (independent of the $X_i$) taking values in $\{1, 2, \dots, n\}$ with $P(W=i)=p_i$.
Let's define a new Bernoulli variable $Y$, which corresponds to picking one of the $X_i$ with probability $p_i$; i.e., $Y = X_W$.
Then $$\begin{align}
A &= h(\sum_{i=1}^{n}p_iq_i)-\sum_{i=1}^{n} p_i h(q_i) \\
&= H(Y) - H(Y|W) \\
&= I(Y;W) \\
&= H(W) - H(W|Y) \\
&= H(\{p_i\}) - b \, H\left(\frac{\{p_i q_i\}}{b}\right) -(1-b)H\left(\frac{\{p_i (1-q_i)\}}{1-b}\right)
\end{align}
$$
where $b = \sum_i p_i q_i$.
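Here is a small numerical check of this decomposition with random $p_i, q_i$ (a sketch; $H(\{\cdot\})$ denotes the entropy of the bracketed probability vector):

```python
# Verify h(sum p_i q_i) - sum p_i h(q_i) = H(W) - H(W | Y) numerically.
import numpy as np

rng = np.random.default_rng(1)

def H(dist):  # entropy (nats) of a probability vector
    dist = np.asarray(dist)
    return -np.sum(dist * np.log(dist))

def h(x):  # binary entropy, vectorized
    return -x*np.log(x) - (1 - x)*np.log(1 - x)

n = 5
p = rng.dirichlet(np.ones(n))    # P(W = i)
q = rng.uniform(0.05, 0.95, n)   # q_i = P(X_i = 1)
b = np.sum(p * q)                # P(Y = 1)

A = h(b) - np.sum(p * h(q))                               # I(Y; W)
rhs = H(p) - b*H(p*q/b) - (1 - b)*H(p*(1 - q)/(1 - b))    # H(W) - H(W | Y)
assert np.isclose(A, rhs)
```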
Added: In this related question I develop an approximation for nearly constant $q_i$.
Best Answer
If $k=b/a$ is not very small or very big then I'd parametrize in this way:
$$\begin{align} p_1 &= p^{1+x} \\ p_2 &= p^{1-x} \\ x &= \frac{\log(p_1/p_2)}{\log(p_1 p_2)} = \frac{b-a }{b+a} =\frac{k-1}{k+1} \\ p &= \sqrt{p_1 p_2}=\exp\left(- \frac{a+b}{2ab}\right)= \exp\left(- \frac{1}{a} \frac{1}{1+x}\right) \end{align} $$
Notice that $p$ is somewhere between $p_1$ and $p_2$, and if (say) $\frac{1}{9}<k<9$ then $-0.8<x<0.8$.
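These identities are easy to check numerically; a small sketch (assuming, consistently with the formulas above, that $p_1 = e^{-1/a}$ and $p_2 = e^{-1/b}$):

```python
# Check the parametrization identities for an arbitrary (a, b).
from math import exp, log, sqrt, isclose

a, b = 2.0, 5.0
p1, p2 = exp(-1/a), exp(-1/b)

x = log(p1/p2) / log(p1*p2)
p = sqrt(p1*p2)

assert isclose(x, (b - a) / (b + a))
assert isclose(p, exp(-(a + b) / (2*a*b)))
assert isclose(p, exp(-1 / (a*(1 + x))))
assert isclose(p**(1 + x), p1) and isclose(p**(1 - x), p2)
```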
Doing a Taylor expansion of $f$ around $x=0$, we get the approximation
$$ f \approx -\log{\left( 1-p\right) }+\frac{p \left( 2 {{p}}-1\right) {{\log^{2}{(p)}}} }{6 (1-p)^2}{{x}^{2}} $$
(I'm using natural logarithms here, i.e. entropy in nats; to get it in bits, just divide everything by $\log 2$.)
The approximation seems quite good, assuming $|x|$ is not very near $1$ and $p$ is not too small. Here's a graph for three values of $p$ and $x \in [-0.8, 0.8]$; the dashed lines correspond to the approximation.
If you need more precision you can add the next term, which is
$$\frac{p\, \left(7-28p +47 p^2 - 8 p^3 \right) {{\log^4{(p)}}}}{360 {{\left(1-p\right) }^{4}}} x^4$$
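For convenience, here is the approximation assembled as a function of $(p, x)$, including the optional fourth-order term (a sketch; `f_approx` is just an illustrative name, and $f$ itself is the quantity defined in the question):

```python
# Evaluate the series approximation above at (p, x), in nats.
from math import log, sqrt

def f_approx(p, x, fourth_order=True):
    L = log(p)
    val = -log(1 - p) + p*(2*p - 1)*L**2 / (6*(1 - p)**2) * x**2
    if fourth_order:
        val += p*(7 - 28*p + 47*p**2 - 8*p**3)*L**4 / (360*(1 - p)**4) * x**4
    return val

# Example: recover (p, x) from (p1, p2) and evaluate.
p1, p2 = 0.3, 0.6
p, x = sqrt(p1*p2), log(p1/p2) / log(p1*p2)
print(f_approx(p, x))
```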