Letting primes denote derivatives with respect to $\theta$, and writing $A=C+C^T$, I concur with your calculations:
$$
\eqalign{
&&\text{reasons:}\\
Q''
&=\frac{d^2Q}{d\theta^2}
=\frac{d}{d\theta} \left(F^T\cdot A\cdot F\,'\right)\qquad
&
\href{http://en.wikipedia.org/wiki/Matrix_calculus#Identities}{\text{chain rule, }}
\href{http://en.wikipedia.org/wiki/Matrix_calculus#Derivative_of_quadratic_functions}{\frac{\partial{\mathbf{x}^T\mathbf{C}\mathbf{x}}}{\partial\mathbf{x}}=\mathbf{x}^T\left(\mathbf{C}+\mathbf{C}^T\right)}
\\
&=\left(\frac{d}{d\theta} F^T\cdot A\right)\cdot F\,'
&\href{http://en.wikipedia.org/wiki/Matrix_calculus#Identities}{\text{product rule}}\text{, assumption }F\,''=0
\\
&=\left(F\,'\right)^T\cdot A \cdot F\,'
&\href{http://en.wikipedia.org/wiki/Matrix_calculus#Identities}{\text{product rule}}\text{, assumption }A\,'=0
\\&&\text{or }
\href{http://en.wikipedia.org/wiki/Matrix_calculus#Identities}{\text{chain rule, }}
\href{http://en.wikipedia.org/wiki/Matrix_calculus#Derivative_of_linear_functions}{\frac{\partial{\mathbf{x}^T\mathbf{A}}}{\partial\mathbf{x}}=\mathbf{A}}
}
$$
The logic all comes down to summation; the transposes are just what the definition of the matrix product requires.
For vectors $\vec{x},\vec{y}\in\mathbb{R}^n\cong\mathbb{R}^{n\times1}$
identified with $n\times1$ column matrices,
$$
\eqalign{
\vec{x}^T\cdot\vec{y}
&=\left[\matrix{x_1\\\vdots\\x_n}\right]^T \cdot
\left[\matrix{y_1\\\vdots\\y_n}\right]
\\
&=\left[\matrix{x_1&\cdots&x_n}\right] \cdot
\left[\matrix{y_1\\\vdots\\y_n}\right]
\\
&=\sum_{i=1}^n x_i y_i
}
$$
Tensor notation goes further and dispenses with the summation signs, introducing instead the convention that a paired superscript and subscript index is summed over: $x_iy^i$ (or $x^iy_i$) for the sum above. But since this is a single real quantity, the ordinary product rule applies to its derivative, and the matrix product rules then let us write the result the way it appears in so many sources. Any good book on real analysis, vector calculus/analysis, tensor calculus/analysis or differential geometry covers this; for a (real analysis) example, Rudin's Principles of Mathematical Analysis, but there are many other good recommendations.
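For what it's worth, here is a tiny numerical illustration of that correspondence (my own sketch; NumPy's `einsum` implements exactly this index-pairing convention):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

matrix_form = x @ y                        # row-times-column matrix product x^T . y
index_form = np.einsum('i,i->', x, y)      # x_i y^i, summation implied by the paired index

assert np.isclose(matrix_form, index_form)  # both equal 32.0
```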
So the second derivative of a quadratic form in $\mathbf{x}(\theta)$ with constant matrix $C$, where $\mathbf{x}''=\mathbf{0}$, is another quadratic form, in $\mathbf{x}'$, with matrix $A=C+C^T$. Good to know.
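If you want to sanity-check that numerically, here is a sketch (my own construction: an affine $F(\theta)=F_0+\theta F_1$, so that $F''=0$, and a fixed random $C$), comparing a finite-difference second derivative of $Q$ with $(F')^TA\,F'$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((n, n))     # constant matrix of the quadratic form
A = C + C.T
F0 = rng.standard_normal(n)
F1 = rng.standard_normal(n)

F = lambda t: F0 + t * F1           # affine in theta, so F'' = 0
Q = lambda t: F(t) @ C @ F(t)       # Q(theta) = F^T C F

theta, h = 0.7, 1e-4
Q2_fd = (Q(theta + h) - 2 * Q(theta) + Q(theta - h)) / h**2   # central difference
Q2_formula = F1 @ A @ F1                                      # (F')^T A F'

assert np.isclose(Q2_fd, Q2_formula, rtol=1e-5)
```

Since $Q$ is exactly quadratic in $\theta$, the central second difference here has no truncation error, only rounding noise.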
As to the convergence, are the conditions for Newton's method met?
Is $Q'\ne0$ on a suitable interval $I$ of $\theta$? Do you know the value of
$$M=\sup_{\theta\in I}\frac12\left|\frac{Q''}{Q'}\right|,$$
your constant for quadratic convergence
(i.e. so that $|\epsilon_{n+1}|\le M\epsilon_n^2$)? And, lastly, do you have good initial estimates?
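To make those conditions concrete, here is a toy sketch (my own stand-in example, $Q(\theta)=\theta^2-2$ on $I=[1,2]$, not your actual $Q$) that estimates $M$ on the interval and checks the quadratic-convergence bound along the iteration:

```python
import numpy as np

# Toy stand-in for Q (my own choice): Q(theta) = theta^2 - 2, root at sqrt(2)
Q  = lambda t: t**2 - 2
Q1 = lambda t: 2 * t            # Q'
Q2 = lambda t: 2.0              # Q''

# M = sup over I of (1/2)|Q''/Q'|, here on I = [1, 2]
grid = np.linspace(1, 2, 1001)
M = 0.5 * np.max(np.abs(Q2(grid) / Q1(grid)))    # = 0.5 here

root, t = np.sqrt(2), 1.5       # a good initial estimate inside I
for _ in range(5):
    e_n = abs(t - root)
    t -= Q(t) / Q1(t)           # Newton step
    assert abs(t - root) <= M * e_n**2 + 1e-15   # quadratic convergence bound
```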
Also, what is the geometry (e.g. eigenvalues/eigenvectors, Witt index) of $C$ and $A$?
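A quick way to inspect that geometry numerically (a sketch with a random $C$; over $\mathbb R$, the Witt index of a nondegenerate symmetric form of signature $(p,q)$ is $\min(p,q)$):

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.standard_normal((4, 4))
A = C + C.T                         # symmetric, so eigvalsh applies

eigvals = np.linalg.eigvalsh(A)
p = int(np.sum(eigvals > 0))        # positive inertia
q = int(np.sum(eigvals < 0))        # negative inertia
witt_index = min(p, q)              # over R, for a nondegenerate symmetric form

print(eigvals, (p, q), witt_index)
```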
The differential is correct:
$$\eqalign{
dF &= dX\,AA^T \cr
&= I\,dX\,AA^T\cr
}$$
What I normally do at this point is to follow the Magnus-Neudecker convention and apply vec() to both sides:
$$\eqalign{
{\rm vec}(dF) &= (AA^T\otimes I)\,{\rm vec}(dX) \cr
d{\rm vec}(F) &= (AA^T\otimes I)\,\,d{\rm vec}(X) \cr\cr
\frac {\partial\,{\rm vec}(F)} {\partial\,{\rm vec}(X)^T} &= AA^T\otimes I \cr
}$$
If you don't use vectorization, then you have to deal with $\frac{\partial F}{\partial X}$ as a full-blown fourth-order tensor, in which case index notation is the best way to proceed.
In any case, the derivative is definitely not $AA^T$, which is just a matrix, i.e. a second-order tensor.
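A quick numerical confirmation of the Kronecker form (a sketch with random matrices, taking vec() to be column-stacking, i.e. NumPy's Fortran-order flatten):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A  = rng.standard_normal((n, n))
dX = rng.standard_normal((m, n))

vec = lambda Z: Z.flatten(order='F')          # column-stacking vec()

dF  = dX @ A @ A.T                            # dF = dX AA^T
lhs = vec(dF)
rhs = np.kron(A @ A.T, np.eye(m)) @ vec(dX)   # (AA^T kron I) vec(dX)

assert np.allclose(lhs, rhs)
```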
Best Answer
For convenience, define the all-ones vector ${\tt1}\in{\mathbb R}^B$ and the following ${\mathbb R}^N$ vectors
$$\eqalign{
a &= M^T{\tt1},\quad b = M_0^T{\tt1},\quad c = \frac{a-b}{a} = ({\tt1}-b\oslash a) \\
w &= 4\,c\odot c\odot c\odot b\oslash a\oslash a \\
}$$
and the associated diagonal matrices
$$\eqalign{
A &= {\rm Diag}(a),\quad B = {\rm Diag}(b),\quad C = {\rm Diag}(c) = (I-BA^{-1}) \\
W &= {\rm Diag}(w) = 4BA^{-2}C^3 \\
dC &= -B\,dA^{-1} = BA^{-2}\,dA \\
}$$
Then the function of interest can be written as
$$\eqalign{
\psi &= \|C\|_4^4 = I:C^4 \\
d\psi &= I:4C^3dC \\
&= 4C^3:BA^{-2}dA \\
&= W:dA \\
&= w:da \\
&= w:dM^T{\tt1} \\
&= {\tt1}w^T:dM \\
\frac{\partial\psi}{\partial M} &= {\tt1}w^T \\
}$$
In the above, the symbol $(\odot)$ denotes elementwise multiplication, $(\oslash)$ denotes elementwise division, and $(:)$ represents the trace/Frobenius product, i.e.
$$A:B = {\rm Tr}(A^TB)$$
Note that the $\{A,B,C,W\}$ matrices are diagonal and therefore commute with each other, while the $M$ matrix is rectangular and does not commute with anything.
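For anyone who wants to double-check the final result, here is a finite-difference sketch (my own construction: random positive $M,M_0$ so that $a>0$; dimensions $B=5$, $N=3$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
Bdim, N = 5, 3
M  = rng.uniform(1, 2, (Bdim, N))    # positive entries keep a > 0
M0 = rng.uniform(0, 1, (Bdim, N))
ones = np.ones(Bdim)                 # the all-ones vector

def psi(M):
    a = M.T @ ones                   # a = M^T 1
    b = M0.T @ ones                  # b = M0^T 1
    c = (a - b) / a
    return np.sum(c**4)              # psi = ||C||_4^4

# Analytic gradient: dpsi/dM = 1 w^T, with w = 4 c^3 (.) b (/) a^2 elementwise
a = M.T @ ones
b = M0.T @ ones
c = (a - b) / a
w = 4 * c**3 * b / a**2
grad = np.outer(ones, w)             # 1 w^T

# Finite-difference check of a single entry
h, (i, j) = 1e-6, (2, 1)
E = np.zeros_like(M); E[i, j] = h
fd = (psi(M + E) - psi(M - E)) / (2 * h)
assert np.isclose(fd, grad[i, j], rtol=1e-4)
```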