Solved – Bias of method of moments estimator for Pareto distribution with known scale parameter

method-of-moments, pareto-distribution, unbiased-estimator

Let $X$ be a Pareto-distributed random variable with a known scale parameter $m>0$, i.e. $X\sim f(x\mid a)=\frac{am^a}{x^{a+1}}$ for $x>m$, with shape parameter $a>0$.

$\mathrm{E}\left[X\right]=\frac{am}{a-1}$ for $a>1$.

Using the method of moments to estimate the shape parameter, equate the population mean to the sample mean:
$\frac{\hat{a}m}{\hat{a}-1}=\sum_{i=1}^{n}{\frac{x_i}{n}}, \hat{a}=\frac{\sum{x_i}}{\sum{x_i}-mn}$
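
Spelling out the algebra (just rearranging the moment equation above):

$$\frac{\hat{a}m}{\hat{a}-1}=\bar{x} \;\Longrightarrow\; \hat{a}m=\hat{a}\bar{x}-\bar{x} \;\Longrightarrow\; \hat{a}(\bar{x}-m)=\bar{x} \;\Longrightarrow\; \hat{a}=\frac{\bar{x}}{\bar{x}-m}=\frac{\sum_i x_i}{\sum_i x_i-mn}.$$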

How does one calculate the bias of this estimator? What is $\mathrm{E}\left[\hat{a}\right]$? Is there a known distribution for the sum of Pareto variables?

If there's no closed-form expression for the expectation, does the bias at least always go in the same direction?

Best Answer

The distribution of a sum of Pareto variates is not especially simple, but it has been worked out; see [1], [2].

Without loss of generality, we can take $m=1$: simply divide through by $m$ to work with $X^*=X/m$, whose lower limit is then $1$. Since $m$ is just a scale factor applied to the data, any results translate back to the original data scale.
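
Concretely, the estimator is unchanged by this rescaling: with $x_i^*=x_i/m$,

$$\hat{a}=\frac{\sum_i x_i}{\sum_i x_i-mn}=\frac{\sum_i x_i^*}{\sum_i x_i^*-n},$$

so any conclusion about the bias for $m=1$ carries over to general $m$.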

In this answer I haven't attempted to compute the exact bias from those published results. Faced with this exercise, I would probably first use simulation to get a clear understanding of how the bias relates to the $a$ parameter and the sample size (though I think we can say something about how it should behave as a function of sample size).

However, we can do the last part easily enough.

$\bar{x}$ will be unbiased for $E(X) = \frac{a}{a-1}$ (finite for $a>1$), but we have

$$\hat{a}=\frac{\bar{x}}{\bar{x}-1}=\frac{1}{{1-\frac{1}{\bar{x}}}}$$

Taking $Y=\bar{X}$, we can show that $\varphi(Y)=\frac{1}{{1-\frac{1}{Y}}}$ is convex on the relevant domain $y>1$.
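
A quick check: rewriting $\varphi$ gives

$$\varphi(y)=\frac{y}{y-1}=1+\frac{1}{y-1},\qquad \varphi''(y)=\frac{2}{(y-1)^3}>0 \text{ for } y>1,$$

and $\bar{X}>1$ always holds here, since every observation exceeds the lower limit $1$.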

From Jensen's inequality, we can then show that the estimator is biased and in which direction. Jensen's inequality says that for $\varphi$ convex:

$$\varphi \left(\mathbb {E} [Y]\right)\leq \mathbb {E} \left[\varphi (Y)\right]$$

(The conditions under which equality will hold don't apply here; the inequality will be strict.)

Since $\varphi\left(\mathbb{E}[\bar{X}]\right)=\varphi\!\left(\frac{a}{a-1}\right)=a$, this gives $\mathbb{E}\left[\hat{a}\right]>a$: the estimator will be too high on average.

Here are some results of a simulation with $a=2$:

[Figure: histogram of the simulated $\hat{a}$ values; the blue line marks their mean.]

(10000 samples, each of size $n=100$, so here we have 10000 $\hat{a}$ values; the blue line is the mean of those 10000 estimates.)
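
For readers who want to reproduce something like this, here is a minimal sketch of such a simulation (Python with NumPy assumed; an illustration, not the code behind the figure above):

```python
import numpy as np

rng = np.random.default_rng(0)
a, m, n, reps = 2.0, 1.0, 100, 10_000

# Inverse-CDF sampling: if U ~ Uniform(0,1), then m * U**(-1/a) is Pareto(a, m).
x = m * rng.uniform(size=(reps, n)) ** (-1.0 / a)

# Method-of-moments estimate per replicate: a_hat = sum(x) / (sum(x) - m*n)
s = x.sum(axis=1)
a_hat = s / (s - m * n)

print(a_hat.mean())  # comes out noticeably above the true a = 2 (upward bias)
```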

This simulation doesn't prove anything, but it is consistent with the derived result: the bias does indeed appear to be upward.

Such simulations let us see how the bias changes with $a$ (by repeating the exercise across a range of $a$ values) and with sample size (again, by varying $n$).
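
A hypothetical sweep over a few $(a, n)$ combinations (same assumptions as the sketch above) might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
for a in (2.0, 3.0, 5.0):
    for n in (20, 100, 500):
        # m = 1 without loss of generality, so x = u**(-1/a) is Pareto(a, 1)
        s = (rng.uniform(size=(10_000, n)) ** (-1.0 / a)).sum(axis=1)
        bias = (s / (s - n)).mean() - a  # mean of a_hat minus the true a
        print(f"a={a}, n={n}: estimated bias {bias:+.3f}")
```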


If this is not for a class exercise, I'm very curious why you wouldn't use maximum likelihood in this case:

  • it's very simple: for $m=1$ it's the reciprocal of the mean of the logs; if $m$ is not $1$, subtract $\log(m)$ from the mean of the logs before taking the reciprocal.

  • For this parameterization it's not unbiased either, but it makes better use of the data: it will have lower variance (and, by the look of some simulations, lower bias).

  • The bias is also easy to compute, as sketched just below!
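
To sketch the bias computation (a standard result for this model): each $\log(X_i/m)$ is exponential with rate $a$, so $T=\sum_{i=1}^n \log(X_i/m)\sim\text{Gamma}(n,a)$ (rate parameterization), and

$$\hat{a}_{\text{ML}}=\frac{n}{T},\qquad \mathrm{E}\!\left[\hat{a}_{\text{ML}}\right]=n\,\mathrm{E}\!\left[T^{-1}\right]=\frac{n}{n-1}\,a,$$

so the ML estimator is also biased upward, by $a/(n-1)$, and $\frac{n-1}{n}\hat{a}_{\text{ML}}$ is exactly unbiased.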


[1] Blum, M. (1970),
"On the Sums of Independently Distributed Pareto Variates",
SIAM Journal on Applied Mathematics, 19:1 (Jul.), pp. 191-198.

[2] Ramsay, Colin M. (2008),
"The Distribution of Sums of I.I.D. Pareto Random Variables with Arbitrary Shape Parameter",
Communications in Statistics - Theory and Methods, 37:14, pp. 2177-2184.
