Knowing the sum, the n(), and the bound parameters of a truncated-Pareto distributed variable, how I identify the alpha (shape) parameter

pareto-distributiontruncated-distributions

I know that there would be a fancy command on R to do the estimation of $\alpha$ given the inputs, but I am also curious about the relationship between $\alpha$ to $\mathbb{E}(\sum (x))$, given $n(x)$, $min(x)$ and $max(x)$.

Also, given that observed $\hat{max(x)} \leq max(x)$, is there a way to estimate $\alpha$ without bias? I know that in applied science, Pareto estimation is actually very hard, but I don't know about truncated Pareto.

I accept suggestion on the best way to estimate it in R. I know truncated/bounded Paretos are in VGAM, but I don't know the commands to model Pareto distribution. Is it correct to use the package fitdistrplus?

Best Answer

First, it's well known that $\sum_{i=1}^n \log x_i$ is complete & minimal sufficient for the shape parameter $\alpha$ of an untruncated Pareto distribution—& truncation doesn't affect the completeness or sufficiency of a statistic (Smith, 1957). Consequently (1) you ought to be basing an estimator on the sum of logged observations (or the sample geometric mean, or an equivalent), & (2) you'll be lucky to find an estimator based on the sum of observations (or the sample arithmetic mean, or an equivalent), either derived in papers or textbooks, or implemented in software libraries.

If you do have the full sample, or $\sum_{i=1}^n \log x_i$, Aban et al. (2006) derive the score function for $\alpha$, which you can find the root of to give its maximum-likelihood estimate (for the cases of both known & unknown bounds), & show that its distribution is asymptotically Gaussian (you can get standard error estimates from the gradient at the root). (They also cite a paper deriving the uniformly minimum-variance unbiased estimator.)

If for some reason all you have to work with is $\sum_{i=1}^n x_i$, then you could try a method-of-moments approach to estimating $\alpha$. Note the arithmetic mean exists for all $\alpha>0$, unlike the case of an untruncated Pareto distribution—see Do all bounded probability distributions have a definite mean?. Given a Pareto distribution with density

$$ f(x)= \frac{\alpha \beta^\alpha}{x^{\alpha+1}} $$

its expectation when truncated above at $\tau$ is

$$ \begin{align} \operatorname{E} X | X < \tau &= \frac{\int^\tau x f(x)\ \mathrm{d} x}{\int^\tau f(x) \ \operatorname{d} x} \\ &= \frac{\int_\beta^\tau x^{-\alpha}\ \mathrm{d}x} {\int_\beta^\tau x^{-(\alpha+1)}\ \mathrm{d}x}\\ &= \frac{\alpha}{1-\alpha}\cdot\frac{[x^{1-\alpha}]^\tau_\beta}{[x^\alpha]^\tau_\beta}\\ &= \frac{\alpha}{1-\alpha}\cdot\frac{\tau^{1-\alpha} - \beta^{1-\alpha}}{\tau^{-\alpha}-\beta^{-\alpha}}\\ &= \frac{\alpha\beta}{\alpha-1}\cdot\frac{1 - \left(\frac{\beta}{\tau}\right)^{\alpha-1}}{1-\left(\frac{\beta}{\tau}\right)^{\alpha}}\\ \end{align} $$

So plug in the sample mean $\bar x$ for $\operatorname{E} X$, & $\hat\alpha$ for $\alpha$, & find the root for $\hat\alpha$ numerically. (You could bootstrap to get confidence interval/standard error/bias estimates.)


# function to generate Pareto variates
rpareto <- function(n, alpha, beta) beta/(1-runif(n))^(1/alpha)

# function to calculate expectation
EX <- Vectorize(function(alpha, beta, tau, epsilon = 1e-7){
  if (abs(alpha-1) > epsilon){
    (alpha*beta/(alpha-1)) * (1 - (beta/tau)^(alpha-1)) / (1 - (beta/tau)^alpha)
  }
  else {
    alpha*beta*log(tau/beta) / (1 - (beta/tau)^alpha)
  }
})

# function that you'll find the root of
f <- function(alpha, x.bar, beta, tau, epsilon = 1e-7) {
  x.bar - EX(alpha, beta, tau, epsilon = 1e-7)
}

alpha_0 <- 2 # true value of alpha
beta <- 3
tau <- 9

# make up some data
set.seed(1234)
x <- rpareto(200, alpha_0, beta)
x <- x[x < tau]
length(x)
plot(ecdf(x), verticals=TRUE, pch="")

# find sample mean
(x.bar <- mean(x))

# find roughly where alpha.hat lies
alpha <- seq(1, 3, by=0.1)
plot(alpha, EX(alpha, beta, tau), type="l", ylab="EX")
abline(h = x.bar, lty=2)

# calculate alpha.hat
(alpha.hat <- uniroot(f, x.bar=x.bar, beta = beta, tau = tau, lower = 2, upper = 3)$root)

† With regard to the question in your 2nd paragraph: as you know the bound parameters, the sample minimum & maximum don't enter into the estimation of $\alpha$. If you didn't know the bound parameters, the sample minimum & maximum are their M.L.E.s., & you can substitute them for the known bound parameters in the score equation for $\alpha$ (what changes is the standard errors you'd calculate for the M.L.E. of $\alpha$).


‡ Note $$\lim_{\alpha \rightarrow 1} \frac{1-\left(\frac{\beta}{\tau}\right)^{\alpha-1}}{\alpha-1} = \log \frac{\tau}{\beta}$$

for when $\alpha \approx 1$ in numerical calculations.


Aban et al. (2006), "Parameter Estimation for the Truncated Pareto Distribution", J. Am. Stat. Assoc., 101, 473

Smith (1957), "A Note on Truncation and Sufficient Statistics", Ann. Math. Statist., 28, 1

Related Question