The method of moments estimator is $\hat \theta_n = 2\bar X_n,$ and it is unbiased.
Its variance is finite and decreases to $0$ as $n$ increases,
and so it is also consistent; that is, it converges in probability to $\theta.$
I have not checked your proof of consistency in detail, but it seems inelegant and possibly incorrect (for one thing, the $\epsilon$ disappears in the second line).
You should be able to use a straightforward application of Chebyshev's inequality to show that
$\lim_{n \rightarrow \infty}P(|\hat \theta_n - \theta| <\epsilon) = 1.$
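For example (assuming, as in the simulation below, that the sample is Uniform$(0,\theta)$, so that $\operatorname{Var}(X_i) = \theta^2/12$), Chebyshev's inequality gives $$P(|\hat \theta_n - \theta| \ge \epsilon) \le \frac{\operatorname{Var}(\hat\theta_n)}{\epsilon^2} = \frac{4\operatorname{Var}(\bar X_n)}{\epsilon^2} = \frac{\theta^2}{3 n \epsilon^2} \rightarrow 0 \quad \text{as } n \rightarrow \infty.$$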
However, $\hat \theta_n$ does not have the minimum variance among unbiased
estimators. The maximum likelihood estimator is the maximum of the $n$
values $X_i$ (often denoted $X_{(n)}$). The estimator $T = cX_{(n)},$ where $c = (n+1)/n,$ is unbiased and has minimum variance among
unbiased estimators; it is the UMVUE.
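A one-line check that this choice of $c$ makes $T$ unbiased (using the density $n x^{n-1}/\theta^n$ of the maximum of a Uniform$(0,\theta)$ sample): $$\operatorname{E}[X_{(n)}] = \int_0^\theta x \,\frac{n x^{n-1}}{\theta^n}\, dx = \frac{n}{n+1}\,\theta, \qquad \text{so} \quad \operatorname{E}\left[\frac{n+1}{n} X_{(n)}\right] = \theta.$$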
Both estimators are illustrated below for $n = 10$ and $\theta = 5$ by
simulation in R statistical software. With 100,000 iterations, the
means and variances should be accurate to about two decimal places. They
are not difficult to find analytically.
m = 10^5; n = 10; th = 5       # number of samples, sample size, true theta
x = runif(m*n, 0, th)          # m*n observations from Uniform(0, theta)
DTA = matrix(x, nrow=m)        # m x n matrix, each row a sample of 10
a = rowMeans(DTA)              # vector of m sample means
w = apply(DTA, 1, max)         # vector of m sample maximums
MM = 2*a; UMVUE = ((n+1)/n)*w  # method-of-moments and UMVUE estimates
mean(MM); var(MM)
## 5.003658 # consistent with unbiasedness of MM
## 0.8341769 # relatively large variance
mean(UMVUE); var(UMVUE)
## 5.002337 # consistent with unbiasedness of UMVUE
## 0.207824 # relatively small variance
The histograms below illustrate the larger variance of the method of
moments estimator.
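One way to draw such histograms (a sketch continuing the simulation above, with the true $\theta$ marked in red):

par(mfrow = c(1, 2))           # two panels side by side
hist(MM, prob = TRUE, col = "skyblue2", main = "Method of moments", xlab = "2 * sample mean")
abline(v = th, col = "red", lwd = 2)    # true theta
hist(UMVUE, prob = TRUE, col = "skyblue2", main = "UMVUE", xlab = "(n+1)/n * sample maximum")
abline(v = th, col = "red", lwd = 2)
par(mfrow = c(1, 1))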
Your calculation of $\hat \theta_2$ is still not right. You should have $$\mathcal L(\theta \mid \boldsymbol x) = 2^n \theta^{-n} \left( \prod_{i=1}^n x_i \right) \exp \left( -\frac{1}{\theta} \sum_{i=1}^n x_i^2 \right) \propto \theta^{-n} \exp \left( - \frac{n \overline{x^2}}{\theta} \right),$$ thus your log-likelihood is $$\ell (\theta \mid \boldsymbol x) = -n \log \theta - \frac{n \overline{x^2}}{\theta},$$ and locating the critical points gives $$0 = \frac{\partial \ell}{\partial \theta} = -\frac{n}{\theta} + \frac{n \overline{x^2}}{\theta^2} = n \left( \frac{\overline{x^2} - \theta}{\theta^2} \right),$$ hence $\hat \theta_2 = \overline{x^2} = \frac{1}{n} \sum_{i=1}^n x_i^2.$ There is no additional factor of $2$.

Notice how I remove all factors of $\mathcal L$ that are not functions of $\theta$, which simplifies all subsequent calculations (and avoids the computational error you made with the additional factor of $2$).
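If you want a quick numerical sanity check of $\hat\theta_2$ (a sketch, not part of your problem: it assumes the density above, under which $X^2$ is exponential with mean $\theta$, so we can simulate with rexp):

set.seed(1)
th = 5; n = 50                                 # true theta and sample size
x = sqrt(rexp(n, rate = 1/th))                 # one simulated sample from the density above
loglik = function(theta) -n*log(theta) - sum(x^2)/theta
optimize(loglik, interval = c(0.01, 50), maximum = TRUE)$maximum   # numerical maximizer
mean(x^2)                                      # closed-form MLE; should agree with the line above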
To compute the bias of these estimators, it suffices to use basic properties of expectation. First, observe $$\operatorname{E}[X] = \frac{\sqrt{\pi \theta}}{2}$$ as you wrote. You may verify that $$\operatorname{E}[X^2] = \theta.$$

Now we see that $$\operatorname{E}[\hat\theta_2] = \operatorname{E}\left[\frac{1}{n} \sum_{i=1}^n X_i^2\right] = \frac{1}{n} \sum_{i=1}^n \operatorname{E}[X_i^2] = \frac{1}{n} \cdot n \theta = \theta,$$ so $\hat\theta_2$ is unbiased. This is the most immediate calculation, which is why we started with it.

As for $\hat \theta_1$, we must be careful to write $$\operatorname{E}[\hat\theta_1] = \frac{4}{\pi} \operatorname{E}\left[\left(\frac{1}{n} \sum_{i=1}^n X_i\right)^2\right] = \frac{4}{\pi n^2} \sum_{i=1}^n \sum_{j=1}^n \operatorname{E}[X_i X_j].$$ Note that we have not yet used the independence of the sample, only the linearity of expectation. When $i \ne j$, $X_i$ and $X_j$ are independent, and $$\operatorname{E}[X_i X_j] = \operatorname{E}[X_i]\operatorname{E}[X_j] = \frac{\pi}{4}\theta.$$ But when $i = j$, this is not the case and we have $$\operatorname{E}[X_i X_j] = \operatorname{E}[X_i^2] = \theta.$$ Since the first case occurs $n(n-1)$ times in the double sum, and the second occurs $n$ times, we get $$\operatorname{E}[\hat\theta_1] = \frac{4}{\pi n^2} \left( n(n-1) \frac{\pi}{4}\theta + n \theta \right) = \left( 1 + \frac{4 - \pi}{\pi n} \right) \theta.$$ This proves that $\hat\theta_1$ is biased. Is it asymptotically biased or unbiased?
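A hedged simulation check of the two expectations just derived (same sampling assumption as the sketch above):

set.seed(2)
m = 10^5; n = 10; th = 5
DTA = matrix(sqrt(rexp(m*n, rate = 1/th)), nrow = m)   # m samples of size n
th1 = (4/pi)*rowMeans(DTA)^2                           # method-of-moments estimates
th2 = rowMeans(DTA^2)                                  # maximum likelihood estimates
mean(th1); (1 + (4 - pi)/(pi*n))*th                    # simulated vs. theoretical E[th1]
mean(th2)                                              # should be close to th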
As for the consistency of these estimators, you must show, using similar methods, that their variances decrease to $0$ as the sample size $n$ increases. To do this, you must compute $\operatorname{E}[\hat\theta_1^2]$ and $\operatorname{E}[\hat\theta_2^2]$. This is left as an exercise.
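As a hint for $\hat\theta_2$ (using the fact, under the density above, that $X_i^2$ is exponential with mean $\theta$): $$\operatorname{Var}(\hat\theta_2) = \frac{\operatorname{Var}(X_1^2)}{n} = \frac{\theta^2}{n} \rightarrow 0,$$ so Chebyshev's inequality, together with unbiasedness, gives consistency of $\hat\theta_2$; the corresponding calculation for $\hat\theta_1$ is the more involved one.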
The second moment of $\hat \theta_1$ is $$\operatorname{E}[\hat\theta_1^2] = \frac{4^2}{\pi^2} \operatorname{E}\,\left[\left(\frac{1}{n}\sum_{i=1}^n X_i\right)^4\right].$$ Use the same technique as for the first moment: $$\left(\sum_{i=1}^n X_i\right)^4 = \sum_{g=1}^n \sum_{h=1}^n \sum_{i=1}^n \sum_{j=1}^n X_g X_h X_i X_j.$$ How many of these summands correspond to all distinct indices? How many correspond to exactly two equal? How many correspond to two equal pairs? How many correspond to exactly three equal? How many correspond to all four equal? What is the expectation of a general term in each case?
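If you want to check your counts before computing the expectations, a small enumeration in R (a sketch for a toy value of $n$) tabulates how often each index pattern occurs:

n = 5
idx = expand.grid(g = 1:n, h = 1:n, i = 1:n, j = 1:n)   # all n^4 index tuples
pattern = apply(idx, 1, function(r) paste(sort(table(r)), collapse = "+"))
table(pattern)   # e.g. "1+1+1+1" = all distinct, "2+2" = two equal pairs, "4" = all four equal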
Best Answer
By Jensen's inequality, applied with the convex function $\varphi(x)=\frac1x$ (for $x>1$), we have $$ \mathbb E[\varphi(\bar x)]=\mathbb E\left[\frac{1}{\bar x}\right] > \varphi(\mathbb E[\bar x])=\varphi(\mathbb E[X_1])=\frac{1}{\mathbb E[X_1]}=1-\theta. $$ The inequality is strict since $\bar x$ is not a degenerate random variable and the function $\varphi$ is strictly convex on $(1,\infty)$.
Then $$\mathbb E[\hat \theta] = 1- \mathbb E\left[\frac{1}{\bar x}\right] < 1-(1-\theta) = \theta.$$
This strict inequality shows that the estimator is biased (downward).
To prove consistency, use Khintchine's law of large numbers: $\bar x \rightarrow \mathbb E[X_1] = \frac{1}{1-\theta}$ in probability, so by the continuous mapping theorem $\hat\theta = 1 - \frac{1}{\bar x} \rightarrow \theta$ in probability.
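If, as the form $\hat\theta = 1 - 1/\bar x$ suggests, the sample is geometric on $\{1, 2, \dots\}$ with success probability $1-\theta$ (an assumption here; adapt it to your actual model), a quick simulation shows both the downward bias and the convergence:

set.seed(3)
th = 0.3; m = 10^4                               # true theta, number of simulated samples
for (n in c(10, 100, 1000)) {                    # assumed geometric model with P(success) = 1 - th
  xbar = rowMeans(matrix(rgeom(m*n, prob = 1 - th) + 1, nrow = m))
  cat("n =", n, " mean of 1 - 1/xbar:", mean(1 - 1/xbar), "\n")   # below th, approaching th
}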