Apart from the fact that it should be $m-1$ instead of $n-1$ in the right-hand denominator, your estimator for $\sigma^2$ looks fine. You can do slightly better on the variance of $\hat\mu$ (though the question didn't ask to optimize it): Consider a general convex combination
$$
\alpha\frac{X_1+\dotso+X_n}n+(1-\alpha)\frac{Y_1+\dotso+Y_m}{2m}
$$
of the individual estimators for $\mu$. The variance of this combined estimator is
$$
n\left(\frac\alpha n\right)^2\sigma^2+m\left(\frac{1-\alpha}{2m}\right)^2\sigma^2=\left(\frac{\alpha^2}n+\frac{(1-\alpha)^2}{4m}\right)\sigma^2\;,
$$
and minimizing this by setting the derivative with respect to $\alpha$ to zero leads to $\alpha=n/(n+4m)$, yielding the variance $\sigma^2/(n+4m)$. For $n=m$ the variance is $\frac15\sigma^2/n=0.2\sigma^2/n$, compared to $\frac5{16}\sigma^2/n\approx0.3\sigma^2/n$ for your estimator, and for $n$ fixed and $m\to\infty$ or vice versa, the variance of this estimator tends to zero whereas the variance of your estimator tends to a non-zero value.
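As a quick numerical sanity check, here is a minimal Monte Carlo sketch. It assumes, as the variance computation above implies, that the $Y_i$ have mean $2\mu$ (hence the $2m$ denominator) and the same variance $\sigma^2$ as the $X_i$, and it takes the estimator from the question to be the equal-weight ($\alpha=\tfrac12$) combination, consistent with the $\tfrac5{16}\sigma^2/n$ figure quoted above; the normal distribution and the specific values of $\mu$, $\sigma$, $n$ and $m$ are arbitrary choices for the demo.

```python
import numpy as np

# Monte Carlo check of the variances of the equal-weight and the
# optimal-alpha combinations (assumed model: X_i ~ N(mu, sigma^2),
# Y_i ~ N(2*mu, sigma^2); all concrete numbers are arbitrary).
rng = np.random.default_rng(0)
mu, sigma, n, m, reps = 3.0, 2.0, 10, 10, 200_000

X = rng.normal(mu, sigma, size=(reps, n))
Y = rng.normal(2 * mu, sigma, size=(reps, m))

est_x = X.mean(axis=1)             # (X_1 + ... + X_n) / n
est_y = Y.sum(axis=1) / (2 * m)    # (Y_1 + ... + Y_m) / (2m)

alpha = n / (n + 4 * m)            # optimal weight derived above
equal = 0.5 * est_x + 0.5 * est_y  # the alpha = 1/2 combination
best = alpha * est_x + (1 - alpha) * est_y

print(equal.var(), 5 / 16 * sigma**2 / n)   # ~ 0.3125 * sigma^2 / n for n = m
print(best.var(), sigma**2 / (n + 4 * m))   # ~ sigma^2 / (n + 4m)
```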
You could optimize the variance of your unbiased variance estimator in a similar way, though the calculation would be a bit more involved.
Let's assume you have a population of size $N$ with values $x_1,\ldots,x_N$, mean $\bar x=\frac{1}{N}\sum_{i=1}^N x_i$ and variance $\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\bar x)^2$. (Note that I use lower case $x_i$ to indicate these are not random, but fixed values.)
Now, let's take a random sample $Y_1,\ldots,Y_n$ of $n$ elements (without replacement), with all such subsets equally likely. (Now I use capital $Y$ to indicate these are random.)
Let $\bar Y=\frac{1}{n}\sum_{i=1}^n Y_i$ and $V=\sum_{i=1}^n (Y_i-\bar Y)^2$, so that the sample variance would be $V/n$ (matching the expression for $\sigma^2$). If we write $V$ out in terms of $(Y_i-\bar x)^2$ and $(Y_i-\bar x)(Y_j-\bar x)$, we get
$$
\begin{split}
V
=& \sum_{i=1}^n (Y_i-\bar Y)^2
= \sum_{i=1}^n \left[(Y_i-\bar x)-(\bar Y-\bar x)\right]^2 \\
=& \sum_{i=1}^n \left[(Y_i-\bar x)^2-2(Y_i-\bar x)(\bar Y-\bar x)+(\bar Y-\bar x)^2 \right] \\
=& \sum_{i=1}^n (Y_i-\bar x)^2 - n(\bar Y-\bar x)^2 \\
=& \left(1-\frac{1}{n} \right) \sum_{i=1}^n (Y_i-\bar x)^2
-\frac{2}{n}\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x)
\end{split}
$$
where in the last step we use that
$$
\left(\sum_{i=1}^n (Y_i-\bar x)\right)^2
= \sum_{i=1}^n (Y_i-\bar x)^2 + 2\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x).
$$
We know that $\text{E}[(Y_i-\bar x)^2]=\sigma^2$: since $Y_i$ is equally likely to be any of $x_1,\ldots,x_N$, this is just the average of $(x_i-\bar x)^2$ over the population.
For $i<j$, we can compute $\text{E}[(Y_i-\bar x)(Y_j-\bar x)]$ by noting that it equals the average of $(x_i-\bar x)(x_j-\bar x)$ over all pairs $1\le i<j\le N$, since every pair of distinct population values is equally likely to be drawn. Since $\sum_{i=1}^N (x_i-\bar x)=0$, we get
$$
0 = \sum_{1\le i,j\le N} (x_i-\bar x)(x_j-\bar x)
= \sum_{i=1}^N (x_i-\bar x)^2 + 2\sum_{1\le i<j\le N} (x_i-\bar x)(x_j-\bar x)
$$
so $\sum_{1\le i<j\le N}(x_i-\bar x)(x_j-\bar x)=-\tfrac12 N\sigma^2$. Dividing by the $\binom{N}{2}=\tfrac12 N(N-1)$ equally likely pairs then gives, for $i<j$,
$$
\text{E}\left[(Y_i-\bar x)(Y_j-\bar x)\right]
= -\frac{\sigma^2}{N-1}.
$$
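As a quick sanity check of this pair expectation, here is a minimal sketch on a small, arbitrary population (the values below are just an example, not taken from the question):

```python
import itertools
import numpy as np

# Check that the average of (x_i - xbar)(x_j - xbar) over all pairs i < j
# equals -sigma^2 / (N - 1); the population values are arbitrary.
x = np.array([1.0, 4.0, 4.5, 7.0, 10.0])
N = len(x)
xbar = x.mean()
sigma2 = ((x - xbar) ** 2).mean()          # population variance (divide by N)

pairs = [(x[i] - xbar) * (x[j] - xbar)
         for i, j in itertools.combinations(range(N), 2)]
print(np.mean(pairs), -sigma2 / (N - 1))   # the two numbers agree
```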
Combining these results, we get
$$
\text{E}[V] = (n-1)\sigma^2 + \frac{n-1}{N-1}\sigma^2
= \frac{(n-1)N}{N-1}\sigma^2
$$
giving an unbiased estimator
$$
\hat\sigma^2 = \frac{N-1}{N(n-1)}V
= \frac{N-1}{N(n-1)} \sum_{i=1}^n (Y_i-\bar Y)^2.
$$
As $N\rightarrow\infty$, the factor $\frac{N-1}{N}$ tends to $1$ and you recover the familiar $s^2=\frac{1}{n-1}\sum_{i=1}^n(Y_i-\bar Y)^2$, which corresponds to independent sampling from a distribution, while $n=N$ gives exactly $\sigma^2$, as it should when the $x_i$ are known for the whole population.
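If it helps to see this numerically, here is a minimal sketch that enumerates every possible sample of size $n$ from a small population and averages the estimator over them; the population values and the choices of $N$ and $n$ are arbitrary.

```python
import itertools
import numpy as np

# Exhaustive check that sigma2_hat = (N-1) / (N*(n-1)) * sum (Y_i - Ybar)^2
# is unbiased under simple random sampling without replacement.
x = np.array([2.0, 3.0, 5.0, 8.0, 9.0, 13.0])  # arbitrary toy population
N, n = len(x), 3
sigma2 = x.var()                               # population variance (divide by N)

estimates = []
for subset in itertools.combinations(x, n):    # every subset is equally likely
    y = np.array(subset)
    V = ((y - y.mean()) ** 2).sum()
    estimates.append((N - 1) / (N * (n - 1)) * V)

print(np.mean(estimates), sigma2)              # the two numbers agree
```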
Best Answer
First ask yourself, what does it mean for a statistic to be an estimator? Do all estimators have to be "good" ones?
Next, the MLE is "best" in the sense that it maximizes the likelihood function for the observed sample, but that doesn't necessarily mean it is the only suitable choice for an estimator. It is in some sense the most likely choice for the parameter given the data we observed, but from the point of view of bias it tends to underestimate the true variance. That is to say, for a fixed sample size the MLE for $\sigma^2$ will, on average, give an estimate that is too small, and the shortfall is most pronounced when the sample is small, whereas $s^2$ is unbiased for every sample size.
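As a quick illustration of that downward bias, here is a minimal sketch for the normal model, where the MLE of $\sigma^2$ divides the sum of squared deviations by $n$; the values of $\mu$, $\sigma$ and $n$ are arbitrary choices for the demo.

```python
import numpy as np

# Compare the average of the MLE (divide by n) and of s^2 (divide by n - 1)
# over many normal samples of a small, fixed size.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000
X = rng.normal(mu, sigma, size=(reps, n))

mle = X.var(axis=1, ddof=0)   # divide by n      -> E[mle] = (n-1)/n * sigma^2
s2 = X.var(axis=1, ddof=1)    # divide by n - 1  -> E[s2]  = sigma^2

print(mle.mean(), (n - 1) / n * sigma**2)  # ~ 3.2 for sigma = 2, n = 5
print(s2.mean(), sigma**2)                 # ~ 4.0
```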
The quality of an estimator can also be judged by other desirable properties: consistency, asymptotic unbiasedness, minimum mean squared error, or being the UMVUE. Maximum likelihood is just one possible criterion.