Find a formula for the method of moments estimate for the parameter $\theta$ in the Pareto pdf,
$$f_Y(y;\theta) = \theta k^\theta\bigg(\frac{1}{y}\bigg)^{\theta+1}$$
Assume that $k$ is known and that the data consist of a random sample of size $n$.
$$E[Y] = \int_{k}^{\infty}y\,\theta k^\theta\bigg(\frac{1}{y}\bigg)^{\theta+1}dy\\
= \theta k^\theta \int_{k}^{\infty}y\,\frac{1}{y}\bigg(\frac{1}{y}\bigg)^{\theta} dy \\
= \theta k^\theta \int_{k}^{\infty}y^{-\theta} dy \\
= \theta k^\theta \bigg[\frac{y^{-\theta + 1}}{-\theta+1}\bigg]\bigg\rvert_{k}^{\infty} \\
= \theta k^\theta \bigg[\frac{1}{y^{\theta-1}(1-\theta)}\bigg]\bigg\rvert_{k}^{\infty} \\
= \theta k^\theta\bigg[0 - \frac{1}{k^{\theta-1}(1-\theta)}\bigg] \qquad (\theta > 1,\ \text{so that } y^{-\theta+1}\to 0)\\
= \frac{\theta k^\theta}{k^{\theta-1}(\theta-1)} \\
E[Y] = \frac{\theta k}{\theta-1}$$
Set $E[Y]$ equal to the sample mean and solve for $\theta$:
$$\text{Let } \; E[Y] = \frac{1}{n} \sum\limits_{i=1}^{n}y_i = \bar{y} \\
\frac{\theta k}{\theta-1} = \bar{y} \\
\theta k = \bar{y}\theta - \bar{y} \\
\bar{y} = \bar{y}\theta - \theta k \\
\theta(\bar{y}-k) = \bar{y} \\
\theta = \frac{\bar{y}}{\bar{y}-k}$$
This implies that $\hat{\theta} = \frac{\bar{y}}{\bar{y}-k}$ (for $\bar{y} > k$).
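As a quick numerical check of the mean formula and its inversion, one can integrate the pdf in R; the values $\theta = 3$ and $k = 2$ below are arbitrary illustration choices.

```r
# Numerical check of E[Y] = theta*k/(theta - 1) for theta > 1, and of the
# inversion theta = mu/(mu - k); theta = 3, k = 2 are arbitrary choices.
theta <- 3; k <- 2
integrand <- function(y) y * theta * k^theta * (1/y)^(theta + 1)  # y * f_Y(y; theta)
mu.num   <- integrate(integrand, lower = k, upper = Inf)$value
mu.exact <- theta * k / (theta - 1)        # 3*2/2 = 3
theta.back <- mu.exact / (mu.exact - k)    # recovers theta = 3
c(mu.num, mu.exact, theta.back)
```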
This is an even-numbered question and the book has no answer. Any improvements on this, or is it wrong?
Also, there is a "maximum-likelihood" tag but no "method-of-moments" tag.
Can someone create one, or does making a new tag have to go through a process?
Best Answer
Here are comments on estimation of the parameter $\theta$ of a Pareto distribution (with links to some formal proofs), also simulations to see if the method-of-moments provides a serviceable estimator.
Suppose $X_1, X_2, \dots, X_n$ is a random sample from the Pareto distribution with density function $f_X(x) = \theta\kappa^\theta/x^{\theta + 1}$ for $x > \kappa$ ($0$ elsewhere), with $\kappa, \theta > 0.$ Then $E(X) = \theta\kappa/(\theta - 1),$ for $\theta > 1.$ This is an extremely right-skewed distribution with a sufficiently heavy tail that $E(X)$ does not exist for $\theta \le 1.$ [Below, we note that $X = e^Y,$ where $Y$ already has a right-skewed distribution with a heavy tail.]
We are interested in the case where $\kappa = 1$ is known. We wish to estimate $\theta.$ [If $\kappa$ were unknown, it could be estimated by $\hat \kappa = \min(X_i),$ but that is not relevant here.]
Method of moments estimator. Setting $E(X) = \theta/(\theta - 1) = \bar X,$ we find the method of moments estimator of $\theta > 1$ to be $\check \theta = \bar X/(\bar X - 1).$ [See Watkins Notes.]
Maximum likelihood estimator. The maximum likelihood estimator for $\theta$ is $\hat\theta = n/\sum_i \ln(X_i).$ [See Wikipedia.]
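Both estimators are one-liners in R. Here is a sketch for a single simulated sample, using the exponential representation $X = e^Y$ discussed below; the seed, $\theta = 3$, and $n = 20$ are arbitrary illustration choices.

```r
# Compute both estimators from one simulated Pareto(kappa = 1, theta = 3) sample.
set.seed(1)                          # arbitrary seed for reproducibility
theta <- 3; n <- 20
x <- exp(rexp(n, rate = theta))      # X = e^Y with Y ~ Exp(rate = theta)
mme <- mean(x) / (mean(x) - 1)       # method of moments: xbar/(xbar - 1)
mle <- n / sum(log(x))               # maximum likelihood: n/sum(ln x_i)
c(MME = mme, MLE = mle)
```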
Demonstration by simulation. To see how these estimators work in practice, we simulate $m = 10^6$ Pareto samples of size $n = 20$. Because $X = U^{-1/\theta} = e^Y,$ where $U \sim \mathsf{Unif}(0,1)$ and $Y \sim \mathsf{Exp}(\text{rate}=\theta),$ it is easy to simulate a Pareto sample in R. [See the Wikipedia page.] We use the exponential method because the R function `rexp` is already optimized for simulating the skewed exponential distribution. We see that both estimators are positively biased.
In that case it is best to assess the precision of an estimator using root mean squared error. The mean squared error of an estimator $\hat \theta$ of parameter $\theta$ is $$E[(\hat \theta - \theta)^2] = \operatorname{Var}(\hat \theta) + [b(\hat \theta)]^2,$$ where $b$ denotes the bias. We take its square root to get a quantity in the same units as the $X$'s. Here, as is often the case, the maximum likelihood estimator performs somewhat better than the method-of-moments estimator. [With a million iterations one can expect almost three-place accuracy.]
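A sketch of such a simulation follows, with $m$ reduced to $10^4$ so it runs quickly; with the $10^6$ iterations used in the text the estimates sharpen accordingly.

```r
# Estimate bias and RMSE of the MME and MLE by simulation (theta = 3, n = 20).
# m is reduced from the 10^6 used in the text so the sketch runs quickly.
set.seed(2020)                            # arbitrary seed
theta <- 3; n <- 20; m <- 10^4
x <- matrix(exp(rexp(m * n, rate = theta)), nrow = m)  # one sample per row
xbar <- rowMeans(x)
mme  <- xbar / (xbar - 1)                 # method-of-moments estimates
mle  <- n / rowSums(log(x))               # maximum-likelihood estimates
rmse <- function(est) sqrt(mean((est - theta)^2))
c(bias.MME = mean(mme) - theta, bias.MLE = mean(mle) - theta,
  rmse.MME = rmse(mme),         rmse.MLE = rmse(mle))
```

Both biases come out positive, and the MLE's root mean squared error is the smaller of the two, matching the discussion above.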
In the figure below, the panels at left show a histogram of the 20 million $X$-values (truncated to eliminate about 0.5% of observations above 6), along with the Pareto PDF; and a histogram of the one million $\bar X$-values (truncated to eliminate about 0.1% of means above 3). Means of samples of size $n=20$ are distinctly non-normal. Orange vertical lines are at $\mu = E(X) = \theta / (\theta - 1) = 1.5.$
The histograms at right show sampling distributions (for $n=20)$ of MMEs and MLEs, respectively. MMEs are more seriously biased and have slightly greater dispersion from the target value $\theta = 3.$
Note: In case it is of use, here is the R code used to make the figure.
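A sketch in that spirit (not the original code): base R graphics with $m$ reduced to $10^4$ for speed; the truncation thresholds follow the description above, while other layout details (bin counts, panel titles, the orange lines at $\theta = 3$ in the right-hand panels) are guesses.

```r
# Sketch of the four-panel figure (not the original code): left column shows
# the simulated X-values with the Pareto PDF and the sample means; right
# column shows the sampling distributions of the MMEs and MLEs.
set.seed(2020)
theta <- 3; n <- 20; m <- 10^4
x <- matrix(exp(rexp(m * n, rate = theta)), nrow = m)
xbar <- rowMeans(x); mme <- xbar / (xbar - 1); mle <- n / rowSums(log(x))

par(mfrow = c(2, 2))
hist(x[x < 6], breaks = 50, prob = TRUE, main = "X-values", xlab = "x")
curve(theta / x^(theta + 1), from = 1, to = 6, add = TRUE, lwd = 2)  # PDF, kappa = 1
hist(mme[mme < 10], breaks = 50, main = "MMEs", xlab = "MME")
abline(v = theta, col = "orange", lwd = 2)                # target theta = 3 (guess)
hist(xbar[xbar < 3], breaks = 50, main = "Sample means", xlab = "xbar")
abline(v = theta / (theta - 1), col = "orange", lwd = 2)  # mu = 1.5
hist(mle[mle < 10], breaks = 50, main = "MLEs", xlab = "MLE")
abline(v = theta, col = "orange", lwd = 2)                # target theta = 3 (guess)
par(mfrow = c(1, 1))
```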