As the case $\xi = 0$ simply corresponds to an exponential distribution with scale parameter $\beta$, it is trivial to compute the method of moments estimator given $\overline y$, the first sample raw moment (the second is not needed, since there is only one parameter to estimate in that case). For the case $\xi < 1/2$ with $\xi \ne 0$, we can easily calculate $${\rm E}[Y] = \frac{\beta}{1-\xi}, \quad {\rm E}[Y^2] = \frac{2\beta^2}{(1-\xi)(1-2\xi)},$$ where the first raw moment of $Y$ is finite only if $\xi < 1$ and the second only if $\xi < 1/2$. Setting these equal to their respective sample moments and solving for the parameters yields the closed-form solution $$\widehat{\beta} = \frac{\overline y \overline{y^2}}{2(\overline{y^2} - (\overline y)^2)}, \quad \widehat{\xi} = \frac{1}{2} - \frac{(\overline y)^2}{2(\overline{y^2} - (\overline y)^2)},$$ where $\overline{y^2} = \frac{1}{n} \sum_{i=1}^n y_i^2$ is the second sample raw moment.
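As a quick check, the closed-form estimators can be verified by simulation. A minimal sketch (the true values $\xi = 0.1$, $\beta = 2$, the seed, and the sample size are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative true parameters; xi < 1/4 keeps the fourth moment finite,
# so the sample moments themselves have finite variance.
xi_true, beta_true = 0.1, 2.0

# Simulate from the GPD via inverse CDF: Y = beta * ((1 - U)^(-xi) - 1) / xi.
u = rng.uniform(size=200_000)
y = beta_true * ((1.0 - u) ** (-xi_true) - 1.0) / xi_true

# First and second sample raw moments.
m1 = y.mean()
m2 = (y ** 2).mean()

# Closed-form method-of-moments estimators from the text.
beta_hat = m1 * m2 / (2.0 * (m2 - m1 ** 2))
xi_hat = 0.5 - m1 ** 2 / (2.0 * (m2 - m1 ** 2))

print(beta_hat, xi_hat)  # should be close to (2.0, 0.1)
```
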
(I am going to answer this home-study question in full, because the OP appears to be too far off any path that simple hints could steer in the right direction.)
It can be shown that the MLE for $\theta$ is the minimum-order statistic,
$$\hat \theta_n = X_{1:n}$$
In fact, one can derive that $\hat \theta_n$ is itself Pareto-distributed with scale parameter $\theta$ and, since the original shape parameter here is $1$, shape parameter $n$, i.e. with distribution function
$$F(\hat \theta_n) = 1- \left(\frac {\hat \theta_n}{\theta}\right)^{-n}$$
So in this case we have available the finite sample distribution of the ML estimator.
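This finite-sample distribution is easy to verify by simulation. A sketch, with illustrative choices of $\theta$, $n$, and the number of replications:

```python
import numpy as np

rng = np.random.default_rng(0)

theta, n, reps = 2.0, 50, 100_000  # illustrative values

# Pareto(scale=theta, shape=1) via inverse CDF: X = theta / U, U uniform on (0, 1].
x = theta / (1.0 - rng.uniform(size=(reps, n)))

# The MLE is the sample minimum.
theta_hat = x.min(axis=1)

# Compare the empirical CDF of theta_hat at a test point against
# F(t) = 1 - (t / theta)^(-n), the Pareto(theta, n) distribution function.
t = 1.02 * theta
empirical = (theta_hat <= t).mean()
exact = 1.0 - (t / theta) ** (-n)
print(empirical, exact)  # the two should agree closely
```
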
Still, the OP is asked to consider asymptotics for the function $\sqrt {n} (\hat \theta _n - \theta)$.
Subtracting $\theta$ does not make the distribution centered at zero or anything like that, since we always have
$$\hat \theta_n = X_{1:n} \geq \theta,$$ since the support of each $X_i$ is $[\theta, \infty)$.
By this observation alone we know that even if $\sqrt {n} (\hat \theta _n - \theta)$ has a limiting distribution, it won't be the normal.
Now, $(\hat \theta_n - \theta)$ has an exact Lomax distribution, with the same parameters as the distribution of $\hat \theta_n$. The variance of a Lomax distribution equals the variance of the corresponding Pareto distribution (shifting does not change the variance), so in our case
$$\text{Var}[(\hat \theta_n - \theta)] = \frac {\theta^2 n}{(n-1)^2(n-2)}$$
Note that the denominator has leading term $n^3$. So to stabilize this variance as $n$ goes to infinity, so that it neither explodes nor goes to zero, one needs to multiply the variable by $n$, not $\sqrt {n}$.
$$\text{Var}[n(\hat \theta_n - \theta)] = \frac {\theta^2 n^3}{(n-1)^2(n-2)} \to \theta^2$$
So $\sqrt {n} (\hat \theta _n - \theta)$ converges to the constant zero: the estimator converges faster than the usual rate, and multiplying by $\sqrt {n}$ "is not enough" to maintain a non-degenerate distribution; we need to multiply by $n$. Standardized by its variance (which goes to zero), it explodes instead.
Can we obtain the limiting distribution of $Z_n = n(\hat \theta_n - \theta)$?
We have for the distribution function
$$F_{Z_n}(z) = \text{Prob}(Z_n \leq z) = \text{Prob}(n(\hat \theta_n - \theta) \leq z) = \text{Prob}\left(\hat \theta_n \leq \theta + \frac {z}{n} \right)$$
$$ \implies F_{Z_n}(z) = 1- \left(\frac {\theta + \frac {z}{n}}{\theta}\right)^{-n} = 1- \left(1+ \frac {(z/\theta)}{n}\right)^{-n}$$
$$\implies \lim_{n \to \infty}F_{Z_n}(z) = 1 - \exp\{-z/\theta\}$$
which is the CDF of the Exponential distribution, with expected value $\theta$ and variance $\theta^2$.
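The limiting exponential behavior also shows up clearly in simulation. A sketch with illustrative values of $\theta$, $n$, and the number of replications:

```python
import numpy as np

rng = np.random.default_rng(1)

theta, n, reps = 3.0, 200, 20_000  # illustrative values

# Pareto(scale=theta, shape=1) samples via inverse CDF: X = theta / U, U on (0, 1].
x = theta / (1.0 - rng.uniform(size=(reps, n)))

# MLE is the sample minimum; form Z_n = n * (theta_hat - theta).
z = n * (x.min(axis=1) - theta)

# The limit is Exponential with mean theta and variance theta^2.
print(z.mean(), z.var())  # both should be close to theta and theta**2
```
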
The distribution of a sum of Pareto variates is not especially simple, but it has been worked out [1] [2].
Without loss of generality, we can take $m=1$; we can simply divide through by $m$ to work with $X^*=X/m$ and the lower limit for $X^*$ is then $1$. Since $m$ is then just a scale factor applied to the data we can translate any results back to the original data scale.
In this answer I haven't attempted to compute the exact bias from those published results. Faced with this exercise, I would probably first use simulation to get a clear understanding of how the bias relates to the $a$ parameter and to the sample size (though I think we can say something about how it should behave as a function of sample size).
However, we can do the last part easily enough.
$\bar{x}$ will be unbiased for $E(X) = \frac{a}{a-1}$, but we have
$$\hat{a}=\frac{\bar{x}}{\bar{x}-1}=\frac{1}{{1-\frac{1}{\bar{x}}}}$$
Taking $Y=\bar{X}$, we can show that $\varphi(Y)=\frac{1}{1-\frac{1}{Y}} = 1 + \frac{1}{Y-1}$ is convex for $Y > 1$: its second derivative, $\varphi''(y) = \frac{2}{(y-1)^3}$, is positive there.
From Jensen's inequality, we can then show that the estimator is biased and in which direction. Jensen's inequality says that for $\varphi$ convex:
$$\varphi \left(\mathbb {E} [Y]\right)\leq \mathbb {E} \left[\varphi (Y)\right]$$
(The conditions under which equality will hold don't apply here; the inequality will be strict.)
Here $\varphi(\mathbb{E}[Y]) = \varphi\!\left(\frac{a}{a-1}\right) = a$, so $\mathbb{E}[\hat{a}] \geq a$: the estimator will be too high on average.
Here are some results of a simulation with $a=2$
(10000 samples each with n=100, so here we have 10000 $\hat{a}$ values. The blue line is the mean estimate from those 10000 samples.)
This simulation doesn't prove anything, but it is consistent with the derived result; it appears the conclusion that the bias is upward was correct.
Such simulations allow us to see how the bias changes with $a$ -- by doing the same thing across a variety of $a$ values -- and with sample size (again, by using different $n$).
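Such a simulation might be sketched as follows (the seed is arbitrary; the setup $a=2$, $n=100$, 10000 replications mirrors the one described above):

```python
import numpy as np

rng = np.random.default_rng(7)

a, n, reps = 2.0, 100, 10_000  # same setup as described in the text

# Pareto with m = 1 and shape a via inverse CDF: X = (1 - U)^(-1/a).
x = (1.0 - rng.uniform(size=(reps, n))) ** (-1.0 / a)

# Method-of-moments estimator from the text: a_hat = xbar / (xbar - 1).
xbar = x.mean(axis=1)
a_hat = xbar / (xbar - 1.0)

print(a_hat.mean())  # averages above the true a, as Jensen's inequality predicts
```

Varying `a` and `n` in this sketch shows how the bias changes with each.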
If this is not for a class exercise, I'm very curious why you wouldn't use maximum likelihood in this case:
it's very simple: for $m=1$ it is the reciprocal of the mean of the logs, i.e. $\hat{a} = n\big/\sum_{i=1}^n \log x_i$; if $m$ is not $1$, you subtract $\log(m)$ from the mean of the logs before taking the reciprocal.
For this parameterization, it's not unbiased either, but it makes better use of the data, and it will have lower variance (and lower bias, by the look of some simulations).
The bias is also easy to compute!
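Indeed: for $m=1$, each $\log X_i$ is exponential with rate $a$, so $\sum_i \log X_i \sim \text{Gamma}(n, 1/a)$, which gives $E[\hat{a}] = \frac{n}{n-1}a$ and hence a bias of $a/(n-1)$. A quick numerical check (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)

a, n, reps = 2.0, 100, 10_000  # illustrative values, with m = 1
x = (1.0 - rng.uniform(size=(reps, n))) ** (-1.0 / a)  # Pareto(m=1, shape=a)

# MLE for m = 1: reciprocal of the mean of the logs.
a_mle = 1.0 / np.log(x).mean(axis=1)

# Since sum(log X_i) ~ Gamma(n, 1/a), E[a_mle] = n * a / (n - 1),
# so the bias is a / (n - 1) -- here 2/99, about 0.02.
print(a_mle.mean())  # close to a * n / (n - 1), i.e. about 2.02
```
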
[1] Blum, M. (1970), "On the Sums of Independently Distributed Pareto Variates", SIAM Journal on Applied Mathematics, 19:1 (Jul.), pp. 191-198.
[2] Ramsay, Colin M. (2008), "The Distribution of Sums of I.I.D. Pareto Random Variables with Arbitrary Shape Parameter", Communications in Statistics - Theory and Methods, 37:14, pp. 2177-2184.