First, let's state the claim.
Claim: $I(X;Y)=D(p(x,y)||p(x)p(y))$ is concave in $p(x)$ for fixed $p(y|x)$ and convex in $p(y|x)$ for fixed $p(x)$.
One can write the following:
$I(X;Y)=H(Y)-H(Y|X)$ where $H$ is the entropy function. The entropy functions can further be written as
$$H(Y)=-\sum_y p(y)\log(p(y)) \ \text{ and } \ H(Y|X)=-\sum_x p(x)\sum_y p(y|x)\log (p(y|x))$$
We know that $H(Y)$ is a concave function of $p(y)$, because $x\log x$ is convex in $x$. We can write $p(y)=\sum_x p(y|x)p(x)$, so when the channel $p(y|x)$ is fixed, $p(y)$ is a linear function of $p(x)$; a concave function composed with a linear map remains concave, so $H(Y)$ is concave in $p(x)$. For fixed $p(y|x)$, $H(Y|X)=-\sum_x p(x)\sum_y p(y|x)\log p(y|x)$ is linear in $p(x)$. Finally, the sum of a concave function and a linear function, $f=c+l$, is again concave, since adding a linear term does not change curvature. Therefore, $I(X;Y)=H(Y)-H(Y|X)$ is a concave function of $p(x)$.
Now let's turn to the second part. When $p(x)$ is fixed, $H(Y|X)=-\sum_x p(x)\sum_y p(y|x)\log p(y|x)$ is a nonnegative combination of entropies, hence concave in $p(y|x)$; likewise $p(y)=\sum_x p(y|x)p(x)$ is linear in $p(y|x)$, so $H(Y)$ is also concave in $p(y|x)$. A difference of two concave functions can be convex, concave, or neither, so this decomposition alone does not settle the question. Instead, go back to the definition $I(X;Y)=D(p(x,y)||p(x)p(y))$. KL divergence is jointly convex in its pair of arguments, and for fixed $p(x)$ both arguments are linear in $p(y|x)$: $p(x,y)=p(x)p(y|x)$ and $p(x)p(y)=p(x)\sum_{x'}p(y|x')p(x')$. A jointly convex function composed with linear maps remains convex, so $I(X;Y)$ is convex in $p(y|x)$.
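Both claims are easy to spot-check numerically. Here is a minimal NumPy sketch; the alphabet sizes, mixing weight, and random distributions are arbitrary choices of mine, not anything from the question:

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / np.outer(px, py)[m])).sum())

rng = np.random.default_rng(0)
p1, p2 = rng.dirichlet(np.ones(4), size=2)       # two input distributions p(x)
Q1, Q2 = rng.dirichlet(np.ones(3), size=(2, 4))  # two channels p(y|x), rows sum to 1
lam = 0.3

# Concavity in p(x) for the fixed channel Q1: I(mixture) >= mixture of I's
I_mix = mutual_info((lam * p1 + (1 - lam) * p2)[:, None] * Q1)
assert I_mix >= lam * mutual_info(p1[:, None] * Q1) \
             + (1 - lam) * mutual_info(p2[:, None] * Q1) - 1e-12

# Convexity in p(y|x) for the fixed input p1: I(mixture) <= mixture of I's
I_mixQ = mutual_info(p1[:, None] * (lam * Q1 + (1 - lam) * Q2))
assert I_mixQ <= lam * mutual_info(p1[:, None] * Q1) \
              + (1 - lam) * mutual_info(p1[:, None] * Q2) + 1e-12
```

Of course, passing on random samples is not a proof, but it is a quick way to catch a sign error in the argument above.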
Another idea would be to show that $I(X;Y)$ attains no maximum over $p(y|x)$ except at the boundary; I don't know, however, how this could be shown. What I do know is that $I(X;Y)$ is always upper bounded by $\min[H(X),H(Y)]$, which follows from the symmetry $I(X;Y)=H(Y)-H(Y|X)=H(X)-H(X|Y)$ together with the nonnegativity of conditional entropy.
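The bound $I(X;Y)\le\min[H(X),H(Y)]$ can also be sanity-checked on a random joint distribution (a quick sketch; the alphabet sizes are arbitrary):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / np.outer(px, py)[m])).sum())

rng = np.random.default_rng(1)
pxy = rng.dirichlet(np.ones(12)).reshape(3, 4)   # a random joint distribution
I = mutual_info(pxy)
assert 0 <= I <= min(entropy(pxy.sum(axis=1)), entropy(pxy.sum(axis=0))) + 1e-12
```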
First, for convenience rewrite $J(p,p')$ as
\begin{align}
J(p,p') &= -D(p\Vert p')+\sum_{i}p_{i}\sum_{j}Q_{ij}\log\frac{Q_{ij}}{q_{j}}+\sum_{j}q_{j}\log\frac{q_{j}}{q'_{j}} \\
&= I({p})-\left[D(p\Vert p')-D(q\Vert q')\right]\\
& = I(p) - \Delta D(p\Vert p')
\end{align}
where I have used your definition of $I(p)$, and defined
\begin{align}
\Delta D(p\Vert p')= D(p\Vert p')-D(q\Vert q')
\end{align}
$\Delta D(p\Vert p')$ reflects the "contraction" of Kullback-Leibler divergence between $p$ and $p'$ under the stochastic matrix $Q$. $\Delta D$ is convex in the first argument. To see why, consider any two distributions $p$ and $\bar{p}$, and define the convex mixture $p^\alpha = (1-\alpha) p + \alpha \bar{p}$. We will show convexity by demonstrating that the second derivative with respect to $\alpha$ is non-negative.
First, compute the first derivative w.r.t. $\alpha$ as
\begin{align*}
& {\textstyle \frac{d}{d\alpha}}\Delta D(p^{\alpha}\Vert p') ={\textstyle \frac{d}{d\alpha}}\left[\sum_{i}p_{i}^{\alpha}\log p_{i}^{\alpha}-\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log\frac{q_{j}'}{p'_{i}}\right]\\
& =\sum_{i}(\bar{p}_{i}-p_{i})\log p_{i}^{\alpha}-\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log\frac{q_{j}'}{p'_{i}}
\end{align*}
Then, we compute the second derivative at $\alpha=0$ as
\begin{align}
{\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0}=\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}-\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}}
\end{align}
$\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}$ is the so-called $\chi^2$ divergence from $p$ to $\bar{p}$, and $\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}}$ is the $\chi^2$ divergence between the corresponding output distributions, i.e. after $Q$ is applied to $p$ and $\bar{p}$.
Note that the $\chi^2$ divergence is a special case of an $f$-divergence, and therefore obeys a data-processing inequality (see e.g. Liese and Vajda, IEEE Trans. on Info. Theory, 2006, Thm. 14). In particular, this means that ${\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0} \ge 0$.
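This data-processing step can itself be checked numerically. The sketch below draws a random row-stochastic $Q$ and two random input distributions (all hypothetical choices) and verifies that the $\chi^2$ divergence contracts under $Q$:

```python
import numpy as np

def chi2(pbar, p):
    """Chi-squared divergence: sum_i (pbar_i - p_i)^2 / p_i."""
    return float(((pbar - p) ** 2 / p).sum())

rng = np.random.default_rng(3)
Q = rng.dirichlet(np.ones(3), size=5)   # row-stochastic: 5 inputs -> 3 outputs
p, pbar = rng.dirichlet(np.ones(5), size=2)
q, qbar = p @ Q, pbar @ Q               # pushforwards under Q
assert chi2(pbar, p) >= chi2(qbar, q) - 1e-12   # data-processing inequality
```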
At the same time, $\Delta D(p\Vert p')$ is not convex in the second argument. Consider $p = (0.5,0.5,0)$, $q=(0.5,0.25,0.25)$, and $Q = \left( \begin{smallmatrix} 0.95 & 0.05 \\ 1 & 0 \\ 0 & 1\end{smallmatrix} \right)$. Here is a plot of $\Delta D$ where the first argument is $p$ and the second argument is the convex mixture $\alpha p + (1-\alpha) q$ of $p$ and $q$, for different $\alpha$:
(see code at https://pastebin.com/q8XLnGK8)
By visual inspection, one can verify that $\Delta D(p\Vert p')$ is neither convex nor concave in the second argument.
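If you prefer not to rely on the plot, the same conclusion can be reached numerically. The sketch below (my own re-implementation, not the pastebin code) evaluates $\Delta D$ along the mixture path and checks that its discrete second differences take both signs:

```python
import numpy as np

def kl(a, b):
    """D(a || b) in nats; 0 log 0 terms are dropped."""
    m = a > 0
    return float((a[m] * np.log(a[m] / b[m])).sum())

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.25, 0.25])
Q = np.array([[0.95, 0.05],
              [1.0,  0.0 ],
              [0.0,  1.0 ]])           # row-stochastic: 3 inputs -> 2 outputs

def delta_D(a, r):
    """Delta D(a || r) = D(a || r) - D(aQ || rQ)."""
    return kl(a, r) - kl(a @ Q, r @ Q)

alphas = np.linspace(0.0, 1.0, 101)
f = np.array([delta_D(p, a * p + (1 - a) * q) for a in alphas])
d2 = np.diff(f, 2)                     # discrete second differences
print((d2 > 0).any(), (d2 < 0).any())  # True True: neither convex nor concave
```

The curve is convex near $\alpha=0$ and concave near $\alpha=1$, matching the plot.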
Regarding your questions:
(1) $I({p})$ is known to be concave in $p$ (Theorem 2.7.4 in Cover and Thomas, 2006). As we've shown $\Delta D(p\Vert p')$ is convex in $p$, so $-\Delta D(p\Vert p')$ is concave in $p$. Since the sum of concave functions is concave, $J(p,p') = I(p) - \Delta D(p\Vert p')$ is concave in $p$.
At the same time, as a function of the second argument $p'$,
$J(p,p') = \mathrm{const} - \Delta D(p\Vert p')$, and we've shown above that $\Delta D(p\Vert p')$ is neither convex nor concave in the second argument. Thus, $J(p,p')$ is not concave in the second argument.
(2) $\Delta D(p\Vert p') \ge 0$ by the data processing inequality for KL divergence (Csiszar and Körner, 2011, Lemma 3.11). That means that
\begin{align}
J(p,p') = I(p) - \Delta D(p\Vert p') \le I(p) ,
\end{align}
from which $J(p,p') \le \max_s I(s)$ follows immediately.
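Both facts are easy to verify numerically; the sketch below uses a random channel and random $p$, $p'$ (arbitrary choices of mine, using the definitions above):

```python
import numpy as np

def kl(a, b):
    """D(a || b) in nats; 0 log 0 terms are dropped."""
    m = a > 0
    return float((a[m] * np.log(a[m] / b[m])).sum())

rng = np.random.default_rng(4)
Q = rng.dirichlet(np.ones(3), size=4)       # row-stochastic channel
p, pp = rng.dirichlet(np.ones(4), size=2)   # p and p'

def I(p):
    """I(p) = sum_i p_i D(Q_i || q) with q = pQ."""
    q = p @ Q
    return float(sum(p[i] * kl(Q[i], q) for i in range(len(p))))

delta = kl(p, pp) - kl(p @ Q, pp @ Q)       # Delta D(p || p')
assert delta >= -1e-12                      # KL data-processing inequality
assert I(p) - delta <= I(p) + 1e-12         # hence J(p, p') <= I(p)
```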
It is neither convex nor concave as a function of the joint distribution $p(x,y)$. You can work it out using Bernoulli random variables.
Not convex:
1) Let $X$ and $Y$ be i.i.d. Bernoulli with $Pr[X=1]=1/2$. Then $I(X,Y)=0$.
2) Let $A=B=1$ (constants). Then $I(A,B)=0$.
3) Let $(W,Z) = \left\{\begin{array}{ll} (X,Y) & \mbox{with prob 1/2} \\ (A,B) & \mbox{with prob 1/2} \end{array}\right. $
The joint probability distribution for $(W,Z)$ is a convex combination of those for $(X,Y)$ and $(A,B)$. Yet $I(W,Z)>0$, since $W$ and $Z$ are now correlated: $Pr[(W,Z)=(1,1)]=5/8$ while $Pr[W=1]Pr[Z=1]=9/16$. This violates convexity.
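This counterexample can be checked numerically, writing the joint distributions as $2\times 2$ arrays:

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / np.outer(px, py)[m])).sum())

P_XY = np.full((2, 2), 0.25)           # X, Y i.i.d. Bernoulli(1/2): I = 0
P_AB = np.array([[0.0, 0.0],
                 [0.0, 1.0]])          # A = B = 1 constant: I = 0
P_mix = 0.5 * P_XY + 0.5 * P_AB
print(mutual_info(P_mix))              # ~0.074 bits > 0 = the mixture of I's
```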
Not concave:
1) Let $X$ and $Y$ be i.i.d. Bernoulli with $Pr[X=1]=1/2$. Then $I(X,Y)=0$.
2) Let $A=B$ where $A$ is Bernoulli with $Pr[A=1]=1/2$. Then $I(A,B)=1$.
3) Let $(W,Z) = \left\{\begin{array}{ll} (X,Y) & \mbox{with prob 1/2} \\ (A,B) & \mbox{with prob 1/2} \end{array}\right. $
Then $Pr[W=1]=Pr[Z=1]=1/2$ and: \begin{align} Pr[(W,Z)=(0,0)] &= 3/8\\ Pr[(W,Z)=(0,1)] &= 1/8\\ Pr[(W,Z)=(1,0)] &= 1/8\\ Pr[(W,Z)=(1,1)] &= 3/8 \end{align} and you can show that $I(W,Z) < 1/2 = (1/2)I(X,Y)+(1/2)I(A,B)$.
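The second counterexample can be checked numerically in the same way, writing the joint distributions as $2\times 2$ arrays:

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / np.outer(px, py)[m])).sum())

P_XY = np.full((2, 2), 0.25)            # X, Y i.i.d. Bernoulli(1/2): I = 0
P_AB = np.array([[0.5, 0.0],
                 [0.0, 0.5]])           # A = B fair coin: I = 1 bit
P_mix = 0.5 * P_XY + 0.5 * P_AB         # matches the joint table above
print(mutual_info(P_mix))               # ~0.189 < 1/2: violates concavity
```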