This may be considered... cheating, but the OLS estimator is a MoM estimator. Consider a standard linear regression specification with normal errors (with $K$ stochastic regressors, so magnitudes are conditional on the regressor matrix) and a sample of size $n$. Denote by $s^2$ the OLS estimator of the variance $\sigma^2$ of the error term. It is unbiased, so
$$ MSE(s^2) = \operatorname {Var}(s^2) = \frac {2\sigma^4}{n-K} $$
Consider now the MLE of $\sigma^2$. It is
$$\hat \sigma^2_{ML} = \frac {n-K}{n}s^2$$
It is biased. Its MSE is
$$MSE (\hat \sigma^2_{ML}) = \operatorname {Var}(\hat \sigma^2_{ML}) + \Big[E(\hat \sigma^2_{ML})-\sigma^2\Big]^2$$
Expressing the MLE in terms of the OLS estimator and using the expression for the variance of the OLS estimator, we obtain
$$MSE (\hat \sigma^2_{ML}) = \left(\frac {n-K}{n}\right)^2\frac {2\sigma^4}{n-K} + \left(\frac {K}{n}\right)^2\sigma^4$$
$$\Rightarrow MSE (\hat \sigma^2_{ML}) = \frac {2(n-K)+K^2}{n^2}\sigma^4$$
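Both closed-form MSE expressions can be checked by simulation. Here is a minimal Monte Carlo sketch (my illustration, not part of the derivation), using normal errors on a fixed design; the values $n=50$, $K=7$, $\sigma^2=1$ and the number of replications are arbitrary example choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma2, n_rep = 50, 7, 1.0, 100_000          # illustrative choices

# Fixed design with an intercept, so all results are conditional on X.
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker matrix

eps = rng.normal(scale=np.sqrt(sigma2), size=(n_rep, n))
rss = np.sum((eps @ M) * eps, axis=1)              # eps' M eps per replication
s2 = rss / (n - K)                                 # unbiased OLS estimator
s2_ml = rss / n                                    # ML estimator

print("MSE(s^2)      : sim %.5f, theory %.5f"
      % (np.mean((s2 - sigma2) ** 2), 2 * sigma2**2 / (n - K)))
print("MSE(sigma2_ML): sim %.5f, theory %.5f"
      % (np.mean((s2_ml - sigma2) ** 2),
         (2 * (n - K) + K**2) * sigma2**2 / n**2))
```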
We want the conditions (if they exist) under which
$$MSE (\hat \sigma^2_{ML}) > MSE (s^2) \Rightarrow \frac {2(n-K)+K^2}{n^2} > \frac {2}{n-K}$$
$$\Rightarrow 2(n-K)^2+K^2(n-K)> 2n^2$$
$$ 2n^2 -4nK + 2K^2 +nK^2 - K^3 > 2n^2 $$
Cancelling $2n^2$ and dividing through by $K>0$, we obtain
$$ -4n + 2K +nK - K^2 > 0 \Rightarrow K^2 - (n+2)K + 4n < 0 $$
Is it feasible for this quadratic in $K$ to take negative values? Since it opens upwards, we need its discriminant to be positive. We have
$$\Delta_K = (n+2)^2 -16n = n^2 + 4n + 4 - 16n = n^2 -12n + 4$$
which is another quadratic, in $n$ this time. This discriminant is
$$\Delta_n = 12^2 - 4\cdot 4 = 128$$
so
$$n_1,n_2 = \frac {12\pm \sqrt{128}}{2} = 6 \pm 4\sqrt2 \approx \{0.34,\; 11.66\}$$
Since $n$ is an integer, $\Delta_K <0$ for every $n$ from $1$ to $11$: in that range the quadratic in $K$ takes only positive values, so we cannot obtain the required inequality. So: we need a sample size of at least $12$.
Given this, the roots of the $K$-quadratic are
$$K_1, K_2 = \frac {(n+2)\pm \sqrt{n^2 -12n + 4}}{2} = \frac n2 +1 \pm \sqrt{\left(\frac n2\right)^2 +1 -3n}$$
Overall: for sample size $n\geq 12$ and a number of regressors $K$ strictly between the two roots, $K_1 < K < K_2$,
we have
$$MSE (\hat \sigma^2_{ML}) > MSE (s^2)$$
For example, if $n=50$ the roots are $K_1 \approx 4.18$ and $K_2 \approx 47.82$, so the inequality holds for $5\leq K\leq 47$. It is interesting that for small numbers of regressors (here $K\leq 4$) the MLE is better in the MSE sense.
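For a quick check, the snippet below (an illustrative sketch of mine, not part of the original answer; $n=50$ and $\sigma^4$ normalized to $1$ are just example choices) evaluates the two closed-form MSEs over $K$:

```python
import numpy as np

n, sigma4 = 50, 1.0                      # sigma4 = sigma^4, normalized to 1
K = np.arange(1, n)                      # number of regressors, K < n
mse_ols = 2 * sigma4 / (n - K)
mse_ml = (2 * (n - K) + K**2) * sigma4 / n**2

worse = K[mse_ml > mse_ols]
print("MSE(ML) > MSE(s^2) for K =", worse.min(), "...", worse.max())
# -> MSE(ML) > MSE(s^2) for K = 5 ... 47
```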
ADDENDUM
The equation for the roots of the $K$-quadratic can be written
$$K_1, K_2 = \left(\frac n2 +1\right) \pm \sqrt{\left(\frac n2 +1\right)^2 -4n}$$
which implies that the lower root always stays above $4$ (it decreases from $6$ at $n=12$ towards $4$ as $n$ grows, but never reaches it), so, taking the "integer-value" restriction into account, the MLE will be MSE-efficient for any (finite) sample size whenever the number of regressors is at most $4$; at $K=5$ it already loses once $n\geq 16$.
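A quick numerical check (again an illustrative snippet, not part of the original argument) computes the lower root $K_1$ for a few sample sizes:

```python
import numpy as np

# Lower root K1 of K^2 - (n+2)K + 4n = 0, for a few sample sizes.
for n in (12, 13, 15, 16, 50, 1_000, 1_000_000):
    k1 = (n / 2 + 1) - np.sqrt((n / 2 + 1) ** 2 - 4 * n)
    print(f"n = {n:>9}: K1 = {k1:.3f}")
# K1 falls from 6 (n = 12) towards 4 as n grows but never reaches 4,
# so for K <= 4 the MLE always has the smaller MSE.
```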
A sample consisting of $n$ realizations of independently and identically distributed random variables is ergodic. In such a case, "sample moments" are consistent estimators of the theoretical moments of the common distribution, provided the theoretical moments exist and are finite.
This means that, writing $\hat \mu_k(n)$ for the $k$-th sample moment and $\mu_k(\theta)$ for the corresponding theoretical moment of the distribution indexed by the parameter $\theta$,
$$\hat \mu_k(n) = \mu_k(\theta) + e_k(n), \;\;\; e_k(n) \xrightarrow{p} 0 \tag{1}$$
So by equating the theoretical moment with the corresponding sample moment we have
$$\hat \mu_k(n) = \mu_k(\theta) \Rightarrow \hat \theta(n) = \mu_k^{-1}(\hat \mu_k(n)) = \mu_k^{-1}[\mu_k(\theta) + e_k(n)]$$
So, since $\mu_k$ does not depend on $n$ and assuming the inverse mapping $\mu_k^{-1}$ is continuous (so that, by the continuous mapping theorem, the probability limit can be taken inside),
$$\text{plim} \hat \theta(n) = \text{plim}\big[\mu_k^{-1}(\mu_k(\theta) + e_k)\big] = \mu_k^{-1}\big(\mu_k(\theta) + \text{plim}e_k(n)\big)$$
$$=\mu_k^{-1}\big(\mu_k(\theta) + 0\big) = \mu_k^{-1}\mu_k(\theta) = \theta$$
So we do this because it gives us consistent estimators of the unknown parameters.
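As a concrete toy illustration of this (my own example, with an arbitrarily chosen Exponential distribution and $\theta=2$): the first theoretical moment is $\mu_1(\theta) = 1/\theta$, so the MoM estimator is $\hat\theta = 1/\bar x$, and it converges to $\theta$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                                   # true rate (illustrative value)

for n in (10, 100, 10_000, 1_000_000):
    x = rng.exponential(scale=1 / theta, size=n)
    theta_hat = 1 / x.mean()                  # mu_1^{-1} applied to the sample moment
    print(f"n = {n:>9}: theta_hat = {theta_hat:.4f}")
# theta_hat approaches theta = 2.0, i.e. plim theta_hat = theta.
```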
There is a nice article about this on Wikipedia.
https://en.m.wikipedia.org/wiki/Method_of_moments_(statistics)
It means that you estimate the population parameters by choosing the parameter values such that the population distribution has moments equal to the moments observed in the sample.
The maximum likelihood estimate maximizes the likelihood function. In some cases this maximum can be expressed by setting the population parameters equal to the corresponding sample quantities.
E.g. when estimating the mean parameter of a distribution with MLE, we often end up using $\mu = \bar{x} $. However, this need not always be the case (related: https://stats.stackexchange.com/a/317631/164061 , although in the case of the example there, the Poisson distribution, the MLE and the MoM estimate coincide, and the same is true for many others). For example, the MLE solution for the estimate of $\mu $ in a log-normal distribution is:
$$\mu = \frac{1}{n} \sum \ln (x_i) = \overline {\ln (x)}$$
Whereas the MoM solution solves
$$\exp \left(\mu + \frac {1}{2}\sigma^2\right) = \bar {x}$$ leading to $$\mu = \ln (\bar {x}) - \frac {1}{2} \sigma^2$$
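Here is a small numeric sketch of the two estimates (my illustration, not from the answer text). The MoM equation above still contains $\sigma^2$; one standard way to complete it, used below as an assumption, is to also match the second moment, which gives $\hat\sigma^2 = \ln(m_2/m_1^2)$ with $m_1 = \bar x$ and $m_2 = \overline{x^2}$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.8                                  # illustrative true values
x = rng.lognormal(mean=mu, sigma=sigma, size=5_000)

# MLE of mu: the mean of log(x)
mu_mle = np.log(x).mean()

# MoM: match E[X] = exp(mu + sigma^2/2) and E[X^2] = exp(2*mu + 2*sigma^2)
m1, m2 = x.mean(), (x**2).mean()
sigma2_mom = np.log(m2 / m1**2)
mu_mom = np.log(m1) - 0.5 * sigma2_mom

print(f"MLE: mu_hat = {mu_mle:.4f}")
print(f"MoM: mu_hat = {mu_mom:.4f}   (true mu = {mu})")
```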
So the MoM is a practical way to estimate the parameters, and it often leads to exactly the same result as the MLE (since the moments of the sample often coincide with the moments of the population; e.g. a sample mean is distributed around the population mean, and up to some factor/bias this works out very well). The MLE has a stronger theoretical foundation: it allows, for instance, estimation of the errors using the Fisher information matrix (or estimates of it), and it is a much more natural approach in the case of regression problems. (I haven't tried it, but I guess that MoM estimation of the parameters in a simple linear regression does not work easily and may give bad results. In the answer by superpronker it seems like this is done by minimizing some function. For MLE this minimization expresses higher probability, but I wonder whether it represents a similar thing for MoM.)