This may be considered... cheating, but the OLS estimator is a MoM estimator. Consider a standard linear regression specification (with $K$ stochastic regressors, so magnitudes are conditional on the regressor matrix), and a sample of size $n$. Denote $s^2$ the OLS estimator of the variance $\sigma^2$ of the error term. It is unbiased so
$$ MSE(s^2) = \operatorname {Var}(s^2) = \frac {2\sigma^4}{n-K} $$
Consider now the MLE of $\sigma^2$. It is
$$\hat \sigma^2_{ML} = \frac {n-K}{n}s^2$$
It is biased. Its MSE is
$$MSE (\hat \sigma^2_{ML}) = \operatorname {Var}(\hat \sigma^2_{ML}) + \Big[E(\hat \sigma^2_{ML})-\sigma^2\Big]^2$$
Expressing the MLE in terms of the OLS and using the expression for the OLS estimator variance we obtain
$$MSE (\hat \sigma^2_{ML}) = \left(\frac {n-K}{n}\right)^2\frac {2\sigma^4}{n-K} + \left(\frac {K}{n}\right)^2\sigma^4$$
$$\Rightarrow MSE (\hat \sigma^2_{ML}) = \frac {2(n-K)+K^2}{n^2}\sigma^4$$
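These two formulas are easy to verify numerically; below is a minimal Monte Carlo sketch (assuming Gaussian errors and one fixed draw of the regressor matrix; `mse_check` is just an illustrative name):

```python
import numpy as np

def mse_check(n=30, K=4, sigma2=2.0, reps=100_000, seed=0):
    """Monte Carlo MSE of s^2 and of the ML variance estimator in a Gaussian linear model."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, K))                     # one fixed draw of the regressors
    M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T    # residual-maker matrix
    eps = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    resid = eps @ M                                     # residuals do not depend on beta
    s2 = (resid ** 2).sum(axis=1) / (n - K)             # unbiased OLS estimator
    ml = (n - K) / n * s2                               # ML estimator
    return ((s2 - sigma2) ** 2).mean(), ((ml - sigma2) ** 2).mean()

mse_s2, mse_ml = mse_check()
# theory: MSE(s^2) = 2*sigma^4/(n-K),  MSE(ML) = (2(n-K)+K^2)/n^2 * sigma^4
```

With the defaults both simulated MSEs should land close to the theoretical values.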
We want the conditions (if they exist) under which
$$MSE (\hat \sigma^2_{ML}) > MSE (s^2) \Rightarrow \frac {2(n-K)+K^2}{n^2} > \frac {2}{n-K}$$
$$\Rightarrow 2(n-K)^2+K^2(n-K)> 2n^2$$
$$ 2n^2 -4nK + 2K^2 +nK^2 - K^3 > 2n^2 $$
Simplifying we obtain
$$ -4n + 2K +nK - K^2 > 0 \Rightarrow K^2 - (n+2)K + 4n < 0 $$
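The algebra is easy to slip on, so here is a brute-force sketch that checks the equivalence exactly (integer arithmetic, cross-multiplied to avoid division):

```python
# Check that MSE(sigma^2_ML) > MSE(s^2), i.e. (2(n-K)+K^2)/n^2 > 2/(n-K),
# holds exactly when K^2 - (n+2)K + 4n < 0, for every feasible (n, K).
for n in range(2, 300):
    for K in range(1, n):                                         # need n - K >= 1
        mle_worse = (2 * (n - K) + K * K) * (n - K) > 2 * n * n   # cross-multiplied
        assert mle_worse == (K * K - (n + 2) * K + 4 * n < 0)
```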
Is it feasible for this quadratic in $K$ to obtain negative values? We need its discriminant to be positive. We have
$$\Delta_K = (n+2)^2 -16n = n^2 + 4n + 4 - 16n = n^2 -12n + 4$$
which is another quadratic, in $n$ this time. This discriminant is
$$\Delta_n = 12^2 - 4\cdot 4 = 128$$
so
$$n_1,n_2 = \frac {12\pm \sqrt{128}}{2} = 6 \pm 4\sqrt2 \approx \{0.34,\; 11.66\}$$
Since $n$ is an integer, for $1\le n\le 11$ we have $\Delta_K <0$ and the quadratic in $K$ takes only positive values, so we cannot obtain the required inequality. So: we need a sample size of at least $12$.
Given this the roots for $K$-quadratic are
$$K_1, K_2 = \frac {(n+2)\pm \sqrt{n^2 -12n + 4}}{2} = \frac n2 +1 \pm \sqrt{\left(\frac n2\right)^2 +1 -3n}$$
Overall : for sample size $n\ge 12$ and an (integer) number of regressors $K$ such that $K_1 < K < K_2$
we have
$$MSE (\hat \sigma^2_{ML}) > MSE (s^2)$$
For example, if $n=50$ then $K_1\approx 4.18$ and $K_2\approx 47.82$, so the inequality holds for $5\le K\le 47$ regressors. It is interesting that for small numbers of regressors the MLE is better in the MSE sense.
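A quick numerical sketch for $n=50$ (the names are only illustrative):

```python
import math

n = 50
a = n / 2 + 1
K1 = a - math.sqrt(a * a - 4 * n)      # lower root
K2 = a + math.sqrt(a * a - 4 * n)      # upper root
# the integer K for which the MLE has the larger MSE
bad_K = [K for K in range(1, n) if K * K - (n + 2) * K + 4 * n < 0]
```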
ADDENDUM
The equation for the roots of the $K$-quadratic can be written
$$K_1, K_2 = \left(\frac n2 +1\right) \pm \sqrt{\left(\frac n2 +1\right)^2 -4n}$$
which shows that the lower root decreases towards $4$ as $n$ grows: at $K=4$ the quadratic equals $16-4(n+2)+4n = 8 > 0$ for every $n$. So the MLE will be MSE-efficient whenever the regressors number up to $4$, for any (finite) sample size, and $K=5$ is the smallest count for which it can lose (which first happens at $n = 16$).
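A scan over sample sizes (a sketch; `smallest_bad_K` is a hypothetical helper) makes the lower root's behaviour concrete: no $K\le 4$ ever satisfies the inequality, and $K=5$ is the smallest count that ever does:

```python
def smallest_bad_K(n):
    """Smallest K for which the MLE has strictly larger MSE than s^2, else None."""
    for K in range(1, n):
        if K * K - (n + 2) * K + 4 * n < 0:
            return K
    return None

# At K = 4 the quadratic is 16 - 4(n+2) + 4n = 8 > 0 for every n,
# so the MLE never loses with four or fewer regressors.
scan = {n: smallest_bad_K(n) for n in range(2, 1000)}
```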
The most common justification of the method of moments is simply the law of large numbers, which would seem to make your suggestion of estimating $\mu_3$ by $\hat{\mu}_3$ "method of moments" (and I'd be inclined to call it MoM in any case).
However, a number of books and documents, such as this one for example (and to some extent the Wikipedia page on the method of moments), imply that you take the lowest $k$ moments* and estimate the required quantities for the given probability model from those, as you imply by estimating $\mu_3$ from the first two moments.
*(where you need to estimate $k$ parameters to obtain the required quantity)
--
Ultimately, I guess it comes down to "who defines what counts as method of moments?"
Do we look to Pearson? Do we look to the most common conventions? Do we accept any convenient definition? --- Any of those choices has problems, and benefits.
The interesting bit, to me, is whether one can always, or almost always, reparameterize a parametric family so as to characterize an estimating-equation (EE) problem as the solution to the moment equations of some (possibly bizarre) distribution function.
Clearly there are large classes of distribution for which method of moments would be useless.
For an obvious example, the mean of the Cauchy distribution is undefined.
Even when moments exist and are finite, there can be many situations where the set of equations $f(\mathbf{\theta},\mathbf{y})=0$ has no solutions (think of a curve that never crosses the x-axis) or multiple solutions (one that crosses the axis repeatedly -- though multiple solutions aren't necessarily an insurmountable problem if you have a way to choose between them).
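A contrived sketch showing both failure modes at once: suppose one (unwisely) tried to recover a Bernoulli $p$ by matching the variance alone. The equation $p(1-p)=v$ has two solutions when $v<1/4$ and none when $v>1/4$ (and the unbiased sample variance of binary data can exceed $1/4$ in small samples, e.g. the sample $\{0,1\}$ gives $s^2=1/2$):

```python
import math

def bernoulli_p_from_variance(v):
    """Solve p(1-p) = v: two roots when v < 1/4, none when v > 1/4."""
    disc = 1 - 4 * v
    if disc < 0:
        return []                        # the moment equation never crosses zero
    r = math.sqrt(disc)
    return [(1 - r) / 2, (1 + r) / 2]    # both p and 1-p match the variance

two = bernoulli_p_from_variance(0.16)    # two candidates, near 0.2 and 0.8
none = bernoulli_p_from_variance(0.30)   # no solution
```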
Of course, we also commonly see situations where a solution exists but doesn't lie in the parameter space (there may even be cases where there's never a solution in the parameter space, but I don't know of any -- it would be an interesting question to discover if some such cases exist).
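Relatedly, even a solution that sits formally inside the parameter space can contradict the sample. A classic sketch (assumed setup: $U(0,\theta)$ with first-moment matching): $\hat\theta_{MoM} = 2\bar y$ can fall below the sample maximum, assigning probability zero to observed data points; the loop below just hunts for such a sample:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 1.0
for _ in range(1000):
    y = rng.uniform(0, theta, size=10)
    theta_mom = 2 * y.mean()          # matches E[Y] = theta / 2
    if theta_mom < y.max():           # estimate puts observed points outside the support
        break
```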
I imagine there can be more complicated situations still, though I don't have any in mind at the moment.
Best Answer
A general answer is that an estimator based on a method of moments is not invariant by a bijective change of parameterisation, while a maximum likelihood estimator is invariant. Therefore, they almost never coincide. (Almost never across all possible transforms.)
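A concrete sketch of the non-coincidence (the lognormal is chosen only for convenience): matching the first two raw moments gives a MoM estimator of $(\mu,\sigma^2)$ that differs from the MLE, which simply uses the log-scale sample mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, s2 = 1.0, 0.25
y = rng.lognormal(mean=mu, sigma=np.sqrt(s2), size=100_000)

# MoM: match the raw moments E[Y] = e^{mu + s2/2} and E[Y^2] = e^{2mu + 2 s2}
m1, m2 = y.mean(), (y ** 2).mean()
s2_mom = np.log(m2 / m1 ** 2)
mu_mom = np.log(m1) - s2_mom / 2

# MLE: log(Y) is N(mu, s2), so use the sample mean/variance of the logs
logy = np.log(y)
mu_mle, s2_mle = logy.mean(), logy.var()
```

Both pairs land near the true $(\mu,\sigma^2)=(1,\, 0.25)$, but they are different functions of the data, so they essentially never agree exactly.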
Furthermore, as stated in the question, there are many MoM estimators. An infinity of them, actually. But they are all based on the empirical distribution, $\hat{F}$, which may be seen as a non-parametric MLE of $F$, although this does not relate to the question.
Actually, a more appropriate way to frame the question would be to ask when a moment estimator is sufficient, but this forces the distribution of the data to be from an exponential family, by the Pitman-Koopman lemma, a case when the answer is already known.