The Wikipedia page for the generalized Pareto describes the same distribution as yours; you have $\mu=0$ and $\sigma=\beta$.
That page gives the mean and variance, as well as the skewness and excess kurtosis, from which you can back out the quantities you need:
1) $E(Y)= \frac{\beta}{1-\xi}\, \; (\xi < 1)$
2) $\text{Var}(Y)=\frac{\beta^2}{(1-\xi)^2(1-2\xi)}\, \; (\xi < 1/2)$
$\quad\quad\quad\quad\;=E(Y)^2\frac{1}{(1-2\xi)}\, \; (\xi < 1/2)$
3) $\text{skewness}(Y) = \mu_3/\text{Var}(Y)^{3/2} = \frac{2(1+\xi)\sqrt{1-2\xi}}{(1-3\xi)}\,\;(\xi<1/3)$
Hence $\mu_3 = \frac{\beta^3}{[(1-\xi)^2(1-2\xi)]^{3/2}} \frac{2(1+\xi)\sqrt{1-2\xi}}{(1-3\xi)}\,\;(\xi<1/3)$
$\quad\quad\quad\quad= \frac{\beta^3}{(1-\xi)^3} \frac{2(1+\xi)}{(1-2\xi)(1-3\xi)}\,\;(\xi<1/3)$
$\quad\quad\quad\quad= E(Y)^3 \frac{2(1+\xi)}{(1-2\xi)(1-3\xi)}\,\;(\xi<1/3)$
4) Excess kurtosis = $\frac{3(1-2\xi)(2\xi^2+\xi+3)}{(1-3\xi)(1-4\xi)}-3\,\;(\xi<1/4)$
Hence $\text{kurtosis}(Y) = \mu_4/\text{Var}(Y)^{2} = \frac{3(1-2\xi)(2\xi^2+\xi+3)}{(1-3\xi)(1-4\xi)}\,\;(\xi<1/4)$
so $\mu_4 = \frac{\beta^4}{[(1-\xi)^2(1-2\xi)]^2}\frac{3(1-2\xi)(2\xi^2+\xi+3)}{(1-3\xi)(1-4\xi)}\,\;(\xi<1/4)$
$\quad\quad\;\, = E(Y)^4\frac{3(2\xi^2+\xi+3)}{(1-2\xi)(1-3\xi)(1-4\xi)}\,\;(\xi<1/4)$
If you want the raw moments rather than the central moments, they can readily be obtained from these (e.g. $E(Y^2) = \text{Var}(Y) + E(Y)^2$).
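As a sanity check, here is a minimal Python sketch that verifies the central-moment formulas above against scipy's `genpareto` (whose shape parameter `c` plays the role of $\xi$ here); the parameter values are arbitrary, chosen with $\xi < 1/4$ so that all four moments exist:

```python
import numpy as np
from scipy.stats import genpareto

beta, xi = 2.0, 0.1   # arbitrary values with xi < 1/4, so all four moments exist

# Formulas from above
mean = beta / (1 - xi)
var = beta**2 / ((1 - xi)**2 * (1 - 2*xi))
mu3 = mean**3 * 2 * (1 + xi) / ((1 - 2*xi) * (1 - 3*xi))
mu4 = mean**4 * 3 * (2*xi**2 + xi + 3) / ((1 - 2*xi) * (1 - 3*xi) * (1 - 4*xi))

# scipy's own mean, variance, skewness, excess kurtosis
m, v, s, k = genpareto.stats(xi, scale=beta, moments='mvsk')
print(np.isclose(mean, m), np.isclose(var, v))
print(np.isclose(mu3 / var**1.5, s))        # skewness
print(np.isclose(mu4 / var**2 - 3, k))      # excess kurtosis
```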
The most common justification of the method of moments is simply the law of large numbers, which would seem to make your suggestion of estimating $\mu_3$ by $\hat{\mu}_3$ "method of moments" (and I'd be inclined to call it MoM in any case).
However, a number of books and documents, such as this one for example (and to some extent the Wikipedia page on method of moments), imply that you take the lowest $k$ moments* and estimate the required quantities for the given probability model from those, as you imply by estimating $\mu_3$ from the first two moments.
*(where you need to estimate $k$ parameters to obtain the required quantity)
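To make the distinction concrete, here is a small Python sketch (simulated data, arbitrary parameter values) contrasting the two readings: estimating $\mu_3$ directly by its sample analogue $\hat{\mu}_3$, versus fitting $(\beta, \xi)$ from the first two moments and plugging into the $\mu_3$ formula above. The closed-form $\hat{\xi}$ used here is the one derived in the answer below.

```python
import numpy as np
from scipy.stats import genpareto

beta, xi = 2.0, 0.1                      # arbitrary "true" values, xi < 1/3
rng = np.random.default_rng(0)
y = genpareto.rvs(xi, scale=beta, size=200_000, random_state=rng)

# Reading 1: estimate mu_3 directly by its sample analogue
mu3_direct = np.mean((y - y.mean())**3)

# Reading 2: fit (beta, xi) from the first two moments, then plug in
m1, m2 = y.mean(), np.mean(y**2)
xi_hat = 0.5 - m1**2 / (2 * (m2 - m1**2))
mean_hat = m1                            # MoM matches the first moment exactly
mu3_plugin = mean_hat**3 * 2*(1 + xi_hat) / ((1 - 2*xi_hat)*(1 - 3*xi_hat))

mu3_true = (beta/(1 - xi))**3 * 2*(1 + xi) / ((1 - 2*xi)*(1 - 3*xi))
print(mu3_true, mu3_direct, mu3_plugin)  # both estimates land near the truth
```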
--
Ultimately, I guess it comes down to "who defines what counts as method of moments?"
Do we look to Pearson? Do we look to the most common conventions? Do we accept any convenient definition? Each of those choices has both problems and benefits.
The interesting bit, to me, is whether one can always, or almost always, reparameterize a parametric family so as to characterize an estimation problem in EE as the solution to moment conditions for some (possibly bizarre) distribution function.
Clearly there are large classes of distributions for which the method of moments would be useless.
For an obvious example, the mean of the Cauchy distribution is undefined.
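A quick simulation illustrates the point: running sample means of Cauchy draws never settle down as $n$ grows, so there is no population mean for a moment equation to target.

```python
import numpy as np

# Sample means of Cauchy draws do not converge as n grows,
# because the population mean does not exist.
rng = np.random.default_rng(1)
x = rng.standard_cauchy(10_000_000)
for n in (10**3, 10**5, 10**7):
    print(n, x[:n].mean())   # typically wildly different values
```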
Even when moments exist and are finite, there could be many situations where the set of equations $f(\boldsymbol{\theta},\mathbf{y})=0$ has no solutions (think of some curve that never crosses the x-axis) or multiple solutions (one that crosses the axis repeatedly -- though multiple solutions aren't necessarily an insurmountable problem if you have a way to choose between them).
Of course, we also commonly see situations where a solution exists but doesn't lie in the parameter space (there may even be cases where there's never a solution in the parameter space, but I don't know of any -- it would be an interesting question to discover if some such cases exist).
I imagine there can be more complicated situations still, though I don't have any in mind at the moment.
Best Answer
Since the case $\xi = 0$ simply corresponds to an exponential distribution with scale parameter $\beta$, the method of moments estimator there is trivial given $\overline y$, the first sample raw moment (the second is not needed, since there is only one parameter to estimate). For the case $\xi \ne 0$ with $\xi < 1/2$, we can easily calculate $${\rm E}[Y] = \frac{\beta}{1-\xi}, \quad {\rm E}[Y^2] = \frac{2\beta^2}{(1-\xi)(1-2\xi)},$$ where the second raw moment is finite only if $\xi < 1/2$ (the mean requires only $\xi < 1$). Setting these equal to their respective sample moments and solving for the parameters yields the closed-form solution $$\widehat{\beta} = \frac{\overline y\, \overline{y^2}}{2(\overline{y^2} - (\overline y)^2)}, \quad \widehat{\xi} = \frac{1}{2} - \frac{(\overline y)^2}{2(\overline{y^2} - (\overline y)^2)},$$ where $\overline{y^2} = \frac{1}{n} \sum_{i=1}^n y_i^2$ is the second sample raw moment.
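For completeness, a minimal Python implementation of this closed-form estimator (the function name `gpd_mom` and the simulation check are illustrative; it assumes $\xi < 1/2$ so that both moments are finite):

```python
import numpy as np
from scipy.stats import genpareto

def gpd_mom(y):
    """Closed-form method-of-moments estimates (beta_hat, xi_hat) for the GPD."""
    m1 = np.mean(y)                     # first sample raw moment
    m2 = np.mean(np.square(y))          # second sample raw moment
    denom = 2.0 * (m2 - m1**2)          # 2 * (biased sample variance)
    return m1 * m2 / denom, 0.5 - m1**2 / denom

# Quick check on simulated data with known parameters
rng = np.random.default_rng(42)
y = genpareto.rvs(0.2, scale=1.5, size=50_000, random_state=rng)
print(gpd_mom(y))   # should land near (1.5, 0.2)
```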