Index notation illustrates the problem quite well.
Denoting $\frac{\partial}{\partial x_n}$ by $\partial_n$, we have
$$\eqalign{
C_{ik} &= A_{ij}B_{jk} \cr
\partial_nC_{ik} &= \big(\partial_n A_{ij}\big)B_{jk} + A_{ij}\big({\partial_n B_{jk}}\big) \cr
}$$
The first term is fine for the contraction over the $j$-index, but in the second term
the index is sandwiched between the $n$ and $k$ indices. Since the term is a third-order tensor, there is no way to fix it. This is unlike the following case.
In the case where $(A,B)$ are vectors, simply omit the $(i,k)$ indices to obtain
$$\eqalign{
C &= A_{j}B_{j} \cr
\partial_nC &= \big(\partial_n A_{j}\big)B_{j} + A_{j}\big({\partial_n B_{j}}\big) \cr
}$$
This expression can be fixed by transposing the second term.
Generalizing in the other direction, if $X$ is a matrix, $\frac{\partial}{\partial X_{nm}}=\partial_{nm}$, and appending the $m$-index results in
$$\eqalign{
\partial_{nm}C_{ik} &= \big(\partial_{nm} A_{ij}\big)B_{jk} + A_{ij}\big({\partial_{nm} B_{jk}}\big) \cr
}$$
Now the gradients in parentheses are fourth-order tensors, and once again there is no simple operation that will re-order the indices.
The simplest way to avoid the problem is to use differentials instead of gradients.
$$\eqalign{
C &= A\star B \cr
dC &= dA\star B + A\star dB \cr
}$$
where $(A,B,C)$ can be scalars, vectors, or tensors, and $(\star)$ can be any kind of product (Kronecker, Hadamard, Frobenius, tensor, matrix) which is compatible with their given dimensions.
For convenience, define the ${\tt1}\in{\mathbb R}^B$ all-ones vector
and the following ${\mathbb R}^N$ vectors
$$\eqalign{
a &= M^T{\tt1},\quad b = M_0^T{\tt1},\quad
c = \frac{a-b}{a}= ({\tt1}-b\oslash a) \\
w &= 4\,c\odot c\odot c\odot b\oslash a\oslash a \\
}$$
and the associated diagonal matrices
$$\eqalign{
A &= {\rm Diag}(a),\quad B= {\rm Diag}(b),\quad C= {\rm Diag}(c)= (I-BA^{-1}) \\
W &= 4BA^{-2}C^3 \\
dC &= -B\,dA^{-1}= BA^{-2}dA \\
}$$
Then the function of interest can be written as
$$\eqalign{
\psi &= \|C\|_4^4 \\&= I:C^4 \\
d\psi
&= I:4C^3dC \\
&= 4C^3:BA^{-2}dA \\
&= W:dA \\
&= w:da \\
&= w : dM^T{\tt1} \\
&= {\tt1}w^T : dM \\
\frac{\partial\psi}{\partial M} &= {\tt1}w^T \\ \\
}$$
In the above, the symbol $(\odot)$ denotes elementwise multiplication,
$(\oslash)$ denotes elementwise division,
and $(:)$ represents the trace/Frobenius product, i.e.
$$A:B = {\rm Tr}(A^TB)$$
Note that the $\{A,B,C,W\}$ matrices are diagonal and therefore they commute with each other, while the $M$ matrix is rectangular and does not commute with anything.
Best Answer
For convenience, define the ${\tt1}\in{\mathbb R}^B$ all-ones vector and the following ${\mathbb R}^N$ vectors $$\eqalign{ a &= M^T{\tt1},\quad b = M_0^T{\tt1},\quad c = \frac{a-b}{a}= ({\tt1}-b\oslash a) \\ w &= 2\,c\odot b\oslash a\oslash a \\ }$$ and the associated diagonal matrices $$\eqalign{ A &= {\rm Diag}(a),\quad B= {\rm Diag}(b),\quad C= {\rm Diag}(c)= (I-BA^{-1}) \\ W &= 2BCA^{-2}\\ dC &= -B\,dA^{-1}= BA^{-2}dA \\ }$$
Then the function of interest can be written as $$\eqalign{ \phi &= \|c\|^2 \\&= C:C \\ d\phi &= 2C:dC \\ &= 2C:BA^{-2}dA \\ &= W:dA \\ &= w:da \\ &= w : dM^T{\tt1} \\ &= {\tt1}w^T : dM \\ \frac{\partial\phi}{\partial M} &= {\tt1}w^T \\ \\ }$$ In the above, the symbol $(\odot)$ denotes elementwise multiplication, $(\oslash)$ denotes elementwise division, and $(:)$ represents the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$ Note that the $\{A,B,C,W\}$ matrices are diagonal and therefore they commute with each other, while the $M$ matrix is rectangular and does not commute with anything.