[Math] Method of Moments Pareto Distribution

estimation, statistics

Find a formula for the method of moments estimate for the parameter $\theta$ in the Pareto pdf,

$$f_Y(y;\theta) = \theta k^\theta\bigg(\frac{1}{y}\bigg)^{\theta+1}, \qquad y \ge k$$

Assume that $k$ is known and that the data consist of a random sample of size $n$.

$$E[Y] = \int_{k}^{\infty}y\,\theta k^\theta\bigg(\frac{1}{y}\bigg)^{\theta+1}dy\\
= \theta k^\theta \int_{k}^{\infty}y\frac{1}{y}\bigg(\frac{1}{y}\bigg)^{\theta} dy \\
= \theta k^\theta \int_{k}^{\infty}y^{-\theta} dy \\
= \theta k^\theta \bigg[\frac{y^{-\theta + 1}}{-\theta+1}\bigg]\bigg\rvert_{k}^{\infty} \\
= \theta k^\theta \bigg[\frac{1}{y^{\theta-1}(1-\theta)}\bigg]\bigg\rvert_{k}^{\infty} \\
= \theta k^\theta\bigg[0 - \frac{1}{k^{\theta-1}(1-\theta)}\bigg] \quad (\text{the upper limit vanishes for } \theta > 1)\\
= \frac{\theta k^\theta}{k^{\theta-1}(\theta-1)} \\
E[Y] = \frac{\theta k}{\theta-1}$$
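As a quick numerical sanity check (my own addition, using the arbitrary values $\theta = 3,$ $k = 2$), one can integrate the pdf in R and compare with the formula:

th = 3;  k = 2                                  # illustrative values, not from the problem
f = function(y) y * th * k^th / y^(th + 1)      # integrand y * f_Y(y; theta)
integrate(f, lower = k, upper = Inf)$value      # numerical result: 3
th * k / (th - 1)                               # formula th*k/(th-1): 3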

Set $E[Y]$ equal to the sample mean $\bar{y}$ and solve for $\theta$:

$$\text{Set } \; E[Y] = \bar{y} = \frac{1}{n} \sum\limits_{i=1}^{n}y_i \\
\frac{\theta k}{\theta-1} = \bar{y} \\
\theta k = \bar{y}\theta - \bar{y} \\
\bar{y} = \bar{y}\theta - \theta k \\
\bar{y} = \theta (\bar{y}-k) \\
\theta = \frac{\bar{y}}{\bar{y}-k}$$

This implies that $\hat{\theta} = \frac{\bar{y}}{\bar{y}-k}$
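A minimal sketch of this estimator in R (my own illustration, assuming known $k = 2,$ true $\theta = 3,$ and a simulated sample of size 50):

k = 2;  th = 3
y = k * exp(rexp(50, th))   # Pareto(k, th) sample: Y = k*e^Z with Z ~ Exp(rate = th)
ybar = mean(y)
ybar / (ybar - k)           # method-of-moments estimate of theta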

This is an even-numbered exercise and the book has no answer. Are there any improvements on this, or is it wrong?
Also, there is a "maximum-likelihood" tag but no "method-of-moments" tag.
Can someone create one, or does a new tag have to go through some process?

Best Answer

Here are comments on estimation of the parameter $\theta$ of a Pareto distribution (with links to some formal proofs), along with simulations to see whether the method of moments provides a serviceable estimator.

Suppose $X_1, X_2, \dots, X_n$ is a random sample from the Pareto distribution with density function $f_X(x) = \theta\kappa^\theta/x^{\theta + 1}$ for $x > \kappa$ ($0$ elsewhere), with $\kappa, \theta > 0.$ Then $E(X) = \theta\kappa/(\theta - 1),$ for $\theta > 1.$ This is an extremely right-skewed distribution with a sufficiently heavy tail that $E(X)$ does not exist for $\theta \le 1$: the integral $\int_\kappa^\infty x f_X(x)\,dx$ diverges there. [Below, we note that $X = e^Y,$ where $Y$ itself has a right-skewed distribution with a heavy tail.]

We are interested in the case where $\kappa = 1$ is known. We wish to estimate $\theta.$ [If $\kappa$ were unknown, it could be estimated by $\hat \kappa = \min(X_i),$ but that is not relevant here.]

Method of moments estimator. Setting $E(X) = \theta/(\theta - 1) = \bar X,$ we find the method of moments estimator of $\theta > 1$ to be $\check \theta = \bar X/(\bar X - 1).$ [See Watkins Notes.]

Maximum likelihood estimator. The maximum likelihood estimator for $\theta$ is $\hat\theta = n/\sum_i \ln(X_i).$ [See Wikipedia.]
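[For completeness, a sketch of the standard derivation, not spelled out in the original answer: with $\kappa = 1$ the log-likelihood is $$\ell(\theta) = \ln\prod_{i=1}^n \frac{\theta}{X_i^{\theta+1}} = n\ln\theta - (\theta+1)\sum_{i=1}^n \ln X_i,$$ and setting $\ell'(\theta) = n/\theta - \sum_i \ln X_i = 0$ gives $\hat\theta = n/\sum_i \ln(X_i).$]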

Demonstration by simulation. To see how these estimators work in practice, we simulate $m = 10^6$ Pareto samples of size $n = 20$. Because $X = U^{-1/\theta} = e^Y,$ where $U \sim \mathsf{Unif}(0,1)$ and $Y \sim \mathsf{Exp}(\text{rate}=\theta),$ it is easy to simulate a Pareto sample in R. [See the Wikipedia page. We use the exponential method because the R function rexp is already optimized for simulating the skewed exponential distribution.]

set.seed(1007)
th = 3;  n = 20;  m = 10^6
x = exp(rexp(m*n, th))   # X = e^Y with Y ~ Exp(rate = th) is Pareto(1, th)
DTA = matrix(x, nrow=m)  # m x n matrix, each row a sample of size 20
a = rowMeans(DTA);  t = rowSums(log(DTA))  # sample means; sums of logs
mme = a/(a-1);  mle = n/t                  # MME and MLE for each sample
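[As an aside, a sketch not in the original answer: the inverse-CDF form $X = U^{-1/\theta}$ generates the same distribution, which a quick quantile comparison confirms.]

u = runif(m*n)                    # U ~ Unif(0,1)
x2 = u^(-1/th)                    # inverse-CDF method
quantile(x2, c(.5, .9, .99))      # should roughly match quantiles of x
quantile(x,  c(.5, .9, .99))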

We see that both estimators are positively biased.

mean(mme);  mean(mle)
[1] 3.245709    # somewhat above parameter value 3
[1] 3.158528    # slightly above

Because both estimators are biased, it is best to assess the precision of an estimator using root mean squared error. The mean squared error of an estimator $\hat \theta$ of parameter $\theta$ is $$E[(\hat \theta - \theta)^2] = \operatorname{Var}(\hat \theta) + [b(\hat \theta)]^2,$$ where $b$ is the bias. We take its square root to get a quantity in the same units as $\theta.$ Here, as is often the case, the maximum likelihood estimator performs somewhat better than the method-of-moments estimator. [With a million iterations one can expect almost three-place accuracy.]

sqrt(mean((mme-th)^2))
[1] 0.7950031
sqrt(mean((mle-th)^2))
[1] 0.7613678             ## MLE has smaller RMSE
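[A quick numerical check of the variance-plus-squared-bias decomposition above, not in the original answer; note that var uses divisor $m-1$ rather than $m,$ so the two quantities agree only up to a negligible factor.]

mean((mme - th)^2)               # MSE of the MME, computed directly
var(mme) + (mean(mme) - th)^2    # Var + bias^2; nearly identical for large m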

In the figure below, the panels at left show a histogram of the 20 million $X$-values (truncated to eliminate about 0.5% of observations above 6), along with the Pareto PDF; and a histogram of the one million $\bar X$-values (truncated to eliminate about 0.1% of means above 3). Means of samples of size $n=20$ are distinctly non-normal. Orange vertical lines are at $\mu = E(X) = \theta / (\theta - 1) = 1.5.$

The histograms at right show sampling distributions (for $n=20$) of MMEs and MLEs, respectively. MMEs are more seriously biased and have slightly greater dispersion from the target value $\theta = 3.$

[Figure: four panels as described above; at left, histograms of the $X$-values with the Pareto PDF and of the sample means; at right, histograms of the MMEs and MLEs.]

Note: In case it is of use, here is the R code used to make the figure.

par(mfcol = c(2,2))  # enables four panels per plot
  hist(x[x < 6], prob=T, ylab="x", col="skyblue2",     main="PARETO(1,3)")
    abline(v = th/(th-1), col="orange2", lwd=2)
    curve(th/x^(th+1), add=T, col="blue", lwd=2)  # Pareto PDF
  hist(a[a < 3], prob=T, ylab="Mean", col="skyblue2", main="Sample Means")
    abline(v = th/(th-1), col="orange2", lwd=2)
  hist(mme, prob=T, ylab="MME", col="skyblue2", main="Method-of-Moments Estimates")
    abline(v = th, col="maroon", lwd=2)
  hist(mle, prob=T, ylab="MLE", col="skyblue2", main="Maximum Likelihood Estimates")
    abline(v = th, col="maroon", lwd=2)
par(mfcol=c(1,1))  # return to default single-panel plots