You made a mistake when taking the logarithm of the multivariate likelihood. $\log(\det(M))$ should go with a minus sign; thus everything is convex. The derivative of $\log(\det(M))$ with respect to $M$ is $M^{-1}$. Additionally, there must be a multiplier $N$ here.
The derivative of $\sum_i x_i^T M x_i$ with respect to $M$ is $\sum_i x_i x_i^T$. So your final optimal solution will be $M = \left(\sum_i x_i x_i^T / N\right)^{-1}$, the inverse of the covariance matrix, as it should be.
EDIT. I made a mistake because I forgot that $M$ is the inverse of the covariance matrix, so $\log(\det(M))$ comes with the correct sign. However, the likelihood is a product of exponentials, and the log-likelihood is the logarithm of a product of exponentials rather than the logarithm of a sum of exponentials. So finally you have a concave plus a linear function, which has a well-defined maximum.
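As a quick numerical sanity check (a minimal sketch of my own, not from the original post; it assumes zero-mean data and uses NumPy), one can verify that the gradient of $\ell(M)=\frac{N}{2}\log\det M-\frac{1}{2}\sum_i x_i^T M x_i$ vanishes at $M=\left(\sum_i x_i x_i^T/N\right)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1000, 3
Sigma_true = np.eye(k) + 0.5 * np.ones((k, k))        # any SPD matrix
xs = rng.multivariate_normal(np.zeros(k), Sigma_true, size=N)

S = xs.T @ xs / N              # empirical second-moment matrix (zero mean)
M_hat = np.linalg.inv(S)       # claimed optimizer for the precision matrix

# Gradient of l(M) = (N/2) log det M - (1/2) sum_i x_i^T M x_i
# is (N/2) M^{-1} - (1/2) sum_i x_i x_i^T, which should vanish at M_hat.
grad = 0.5 * N * np.linalg.inv(M_hat) - 0.5 * xs.T @ xs
print(np.abs(grad).max())      # ~1e-9, i.e. zero up to round-off
```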
It is a wrong idea to take the logarithm of a sum of distributions. We have to take the logarithm of the product, which leads to a sum of logarithms.
More precisely, the correct way to look for the optimal covariance matrix is to maximize over $\Sigma$ the following multivariate likelihood:
$$
\prod_i \frac{1}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}{\mathbf x_i}^T{\boldsymbol\Sigma}^{-1}{\mathbf x_i} \right),
$$
If you have weights $w_i$, then
$$
\prod_i \frac{1}{\sqrt{(2\pi)^k|\boldsymbol\Sigma|^{w_i}}} \exp\left(-w_i\frac{1}{2}{\mathbf x_i}^T{\boldsymbol\Sigma}^{-1}{\mathbf x_i} \right),
$$
Taking the logarithm, you will get the weighted log-likelihood.
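If helpful, here is a minimal sketch (my own, with synthetic placeholder data and weights; it assumes zero-mean samples, as in the formulas above). Setting the $\Sigma$-gradient of the weighted log-likelihood to zero gives $\hat\Sigma=\dfrac{\sum_i w_i\,\mathbf x_i\mathbf x_i^T}{\sum_i w_i}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 2000, 3
xs = rng.standard_normal((N, k))       # placeholder zero-mean samples x_i
w = rng.uniform(0.5, 2.0, N)           # placeholder weights w_i

# Weighted MLE: Sigma_hat = (sum_i w_i x_i x_i^T) / (sum_i w_i)
Sigma_hat = (w[:, None] * xs).T @ xs / w.sum()
print(Sigma_hat.round(3))              # close to the identity for this data
```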
$\textbf{1. The theoretical method}$. Let $A$ be symmetric $\geq 0$ and $f:X\in \Delta\mapsto \log(\det(I+XAX))$ where $\Delta=\{X=diag(x_1,\cdots,x_n);x_i\in [0,1],\sum_i x_i=1\}$.
$\textbf{Proposition 1}$. If $f$ admits an extremum at $X=diag(x_i)$ where $x_i\in (0,1)$, then the entries of the diagonal of the matrix $AXAdjoint(I+XAX)$ are equal.
$(*)$ Then, with the relation $\sum_i x_i=1$, one has $n$ relations linking the $n$ unknowns $x_i$.
$\textbf{Proof}$. We use the Lagrange method. There is $\lambda$ s.t., for every diagonal $H$,
$Df_X(H)+\lambda tr(H)=0$, that is,
$0=tr((HAX+XAH)(I+XAX)^{-1}+\lambda H)=$
$tr(H(AX(I+XAX)^{-1}+(I+XAX)^{-1}XA+\lambda I))$.
That implies that the entries of the diagonal of the symmetric matrix
$U+U^T=\dfrac{1}{\det(I+XAX)}(AXAdjoint(I+XAX)+Adjoint(I+XAX)XA)$
are equal to $-\lambda$. $\square$
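To see the differential used in the proof in action, here is a small numerical check (my own sketch, with a random PSD $A$ and a random interior $X$; not part of the original proof) that the analytic partials $\dfrac{\partial f}{\partial x_i}(X)=(AX(I+XAX)^{-1})_{ii}+((I+XAX)^{-1}XA)_{ii}$, obtained by taking $H=E_{ii}$ in $Df_X(H)$, match finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T                      # random symmetric PSD matrix
x = rng.uniform(0.1, 0.9, n)
x /= x.sum()                     # an interior point of the simplex

def f(x):
    X = np.diag(x)
    return np.linalg.slogdet(np.eye(n) + X @ A @ X)[1]

# Analytic partials from the proof, with H = E_ii in Df_X(H):
X = np.diag(x)
Minv = np.linalg.inv(np.eye(n) + X @ A @ X)
grad = np.diag(A @ X @ Minv + Minv @ X @ A)

# Compare against central finite differences.
eps = 1e-6
num = np.array([(f(x + eps * np.eye(n)[i]) - f(x - eps * np.eye(n)[i])) / (2 * eps)
                for i in range(n)])
print(np.abs(grad - num).max())  # ~1e-8: the formula in the proof checks out
```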
EDIT. Unfortunately,
i) the system $(*)$ has many solutions (up to $7$ real ones when $n=3$) and it's hard to get them all (when $n$ is large).
ii) There are instances where the required maximum is reached at a point $X$ s.t. some $x_i$'s are $0$.
$\textbf{2. Using software}$.
$\textbf{Proposition 2}$. $f$ is increasing wrt each $x_i$.
$\textbf{Sketch of the proof}$. Since $I+XAX$ is symmetric $>0$, the sign of $\dfrac{\partial f}{\partial x_1}(X)=Df_X(diag(1,0,\cdots,0))$ (cf. above) is the same as the sign of $(AXAdjoint(I+XAX))[1,1]$... $\square$
Thus we can replace $\Delta$ with $Z=\{X=diag(x_1,\cdots,x_n);x_i\in [0,1],\sum_i x_i\leq 1\}$. This allows more directions for the gradient method.
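A numerical spot check of Proposition 2 (my own sketch; it evaluates the analytic partials from above at many random points rather than proving anything):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
worst = np.inf
for _ in range(1000):
    B = rng.standard_normal((n, n))
    A = B @ B.T                       # fresh random symmetric PSD matrix
    X = np.diag(rng.uniform(0, 1, n))
    Minv = np.linalg.inv(np.eye(n) + X @ A @ X)
    worst = min(worst, np.diag(A @ X @ Minv + Minv @ X @ A).min())
print(worst)  # stays >= 0 over all draws, consistent with Proposition 2
```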
I use the software NLPSolve (by Maple); the NLPSolve command uses various methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG).
I propose two methods (some tests seem to show that the first one is better):
i) Over $\Delta$, with initial point $\dfrac{1}{n}I_n$.
ii) Over $Z$, with initial point $\dfrac{1}{kn}I_n$ where $k>1$.
$\textbf{Remark}$. The solutions are (very often) of the following form:
there are $p<n$ indices $i$ s.t. $x_i=0$, and the other $n-p$ entries $x_i$ are close to $\dfrac{1}{n-p}$.
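For readers without Maple, here is a rough open-source analogue of the two methods (my own sketch using SciPy's SLSQP, not the NAG routines behind NLPSolve):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T                                  # random symmetric PSD matrix

def neg_f(x):
    X = np.diag(x)
    return -np.linalg.slogdet(np.eye(n) + X @ A @ X)[1]

bounds = [(0.0, 1.0)] * n

# Method i): over Delta (sum x_i = 1), initial point (1/n) I_n
res1 = minimize(neg_f, np.full(n, 1.0 / n), method="SLSQP", bounds=bounds,
                constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])

# Method ii): over Z (sum x_i <= 1), initial point 1/(kn) I_n with k > 1
k = 2.0
res2 = minimize(neg_f, np.full(n, 1.0 / (k * n)), method="SLSQP", bounds=bounds,
                constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x.sum()}])

# The maximizers often match the Remark above: p entries at 0 and the
# remaining n - p entries near 1/(n - p).
print(res1.x.round(4), -res1.fun)
print(res2.x.round(4), -res2.fun)
```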
Make the substitution $Z = (X+Y)^{-1} \Leftrightarrow X = Z^{-1} - Y$, then the problem becomes convex:
$$\text{maximize } \log \det(Z) - a^T Z a \\ \text{subject to } 0 \prec Z \preceq (Y+W)^{-1} $$
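A hedged sketch of this convex program with CVXPY (the data $Y$, $W$, $a$ below are made-up placeholders; the strict constraint $0\prec Z$ is only enforced implicitly, since $\log\det Z=-\infty$ on singular $Z$):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 4
def rand_spd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)

Y, W = rand_spd(n), rand_spd(n)      # placeholder problem data
a = rng.standard_normal(n)
U = np.linalg.inv(Y + W)             # the upper bound (Y + W)^{-1}

Z = cp.Variable((n, n), PSD=True)
# log det Z is concave and a^T Z a is linear in Z, so this is convex.
prob = cp.Problem(cp.Maximize(cp.log_det(Z) - a @ Z @ a), [U - Z >> 0])
prob.solve(solver=cp.SCS)

X_opt = np.linalg.inv(Z.value) - Y   # undo the substitution Z = (X + Y)^{-1}
print(prob.value)
```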