Solved – MLE estimators for the Generalized Pareto Distribution: given a known value of $c$, how do I calculate $a$ and $b$ using the provided estimators?

distributions · extreme-value · finance · maximum-likelihood · pareto-distribution

I am doing research into the three-parameter Generalized Pareto Distribution
$$
f(x \mid a,b,c) = \frac{1}{b}\left(1+a\left(\frac{x-c}{b}\right)\right)^{-1-\frac{1}{a}}
$$
for finding VaR and CVaR. Here $x$ is a vector of returns greater than or equal to $c$. The paper "Parameter estimation for 3-parameter generalized Pareto distribution by the principle of maximum entropy" by Singh and Guo provides MLE estimators in equations (45) and (46). Given a known value of $c$, how do I calculate $a$ and $b$ using the provided estimators?

Best Answer

Later edit: I give what seems to be a better solution here.


Note that the paper uses a different parameterization from the form given in the question. As Yves noted in comments, it uses $-a$ in place of your $a$ (both are common parameterizations; the only difficulty may be when it is unclear which parameterization is being used). If you convert answers back to your parameterization you'll have to make the corresponding change.

The paper says:

The MLE estimators can be expressed as:

$\sum_{i=1}^n \frac{(x_i-c)/b}{1-a(x_i-c)/b}=\frac{n}{1-a}\qquad\qquad\qquad$ (45)

$\sum_{i=1}^n \ln[1-a(x_i-c)/b]=-na\qquad\,$ (46)

[...] Clearly the likelihood function is maximum with respect to $c$ when $c=x_1$.

In fact there are some minor issues with the exposition: at the point where they set the derivative of the log-likelihood to zero, they are no longer dealing with $a$, $b$ and $c$ but with the estimators $\hat{a}$, $\hat{b}$ and $\hat{c}$. So they should really have:

i. $\hat{c} = x_{(1)}$ (the sample did not appear to be ordered up to that point, yet they suddenly declare $x_1$ to be the smallest observation; better to be explicit)

ii. Given the ML estimate of $c$, the parameters $a$ and $b$ are then estimated by simultaneously solving these two equations:

$\sum_{i=1}^n \frac{(x_i-\hat{c})/\hat{b}}{1-\hat{a}(x_i-\hat{c})/\hat{b}}=\frac{n}{1-\hat{a}}\qquad\qquad\qquad$ (45)

$\sum_{i=1}^n \ln[1-\hat{a}(x_i-\hat{c})/\hat{b}]=-n\hat{a}\qquad$ (46)

for $\hat{a}$ and $\hat{b}$.

The idea is that given $\hat{c}$, you find $\hat{a}$ and $\hat{b}$ that make equations (45) and (46) true.

If you were to solve this pair of nonlinear equations simultaneously, you'd generally need to set up some iterative scheme* with which you update the estimates numerically until (45) and (46) hold to within some small tolerance.

*(starting from some reasonable guesses, such as method-of-moments or quantile-based estimates, or by assuming $a=0$ and using the resulting ML estimate from an exponential for $\hat{b}$)
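
For concreteness, here's a minimal sketch of that root-finding approach in Python, in the paper's parameterization of (45) and (46), with $\hat{c}$ fixed at the sample minimum. The helper name `score_eqs`, the input array `exceedances`, and the starting values are my own illustrative choices; `scipy.optimize.root` stands in for "some iterative scheme":

```python
import numpy as np
from scipy.optimize import root

def score_eqs(params, x, c_hat):
    """Residuals of equations (45) and (46) in the paper's parameterization."""
    a, b = params
    n = len(x)
    z = (x - c_hat) / b
    u = 1.0 - a * z                      # must stay positive on the support
    if b <= 0 or np.any(u <= 0):
        return [1e10, 1e10]              # infeasible point: push the solver back
    eq45 = np.sum(z / u) - n / (1.0 - a)
    eq46 = np.sum(np.log(u)) + n * a
    return [eq45, eq46]

x = np.sort(np.asarray(exceedances))     # hypothetical input: returns >= c
c_hat = x[0]                             # ML estimate of c is the sample minimum
b0 = np.mean(x - c_hat)                  # exponential (a -> 0) starting value for b
sol = root(score_eqs, x0=[-0.1, b0], args=(x, c_hat))
a_hat, b_hat = sol.x                     # paper's shape; the question's a is -a_hat
```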

It's certainly possible to do so. However, most people would back up a step: rather than taking the derivative, setting it equal to zero, and then looking for an iterative scheme to solve those equations for $\hat{a}$ and $\hat{b}$, we can simply apply numerical optimization to minimize the negative log-likelihood, and take as our parameter estimates the values of the parameters at that minimum. That's what's usually done for the generalized Pareto.

This would again start from some reasonable guesses for $\hat{a}$ and $\hat{b}$ and iterate to reduce $-\ell=-\log\mathcal{L}$ until a minimum is effectively reached. One benefit of this approach is that, once you've found the minimum, it's easier to obtain second-derivative estimates and hence asymptotic standard errors for the parameter estimates.
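
Here is a sketch of that route, assuming you're willing to use scipy: `scipy.stats.genpareto` uses the question's parameterization (shape $a$, location $c$, scale $b$), so the negative log-likelihood can be built from its `logpdf`. A derivative-free search is followed by a BFGS polish, whose inverse-Hessian approximation gives rough asymptotic standard errors (a dedicated numerical Hessian would be more reliable):

```python
import numpy as np
from scipy.stats import genpareto
from scipy.optimize import minimize

def negloglik(params, x, c_hat):
    """-log L in the question's parameterization (shape a, scale b), c held fixed."""
    a, b = params
    if b <= 0:
        return np.inf
    ll = genpareto.logpdf(x, a, loc=c_hat, scale=b)
    return -np.sum(ll) if np.all(np.isfinite(ll)) else np.inf

x = np.asarray(exceedances)              # hypothetical input: returns >= c
c_hat = x.min()
b0 = np.mean(x - c_hat)                  # a = 0 (exponential) starting value
res0 = minimize(negloglik, x0=[0.1, b0], args=(x, c_hat), method='Nelder-Mead')
res = minimize(negloglik, res0.x, args=(x, c_hat), method='BFGS')
a_hat, b_hat = res.x
se = np.sqrt(np.diag(res.hess_inv))      # rough asymptotic standard errors
```

In practice `genpareto.fit(x, floc=c_hat)` performs essentially the same minimization for you.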

[In practice with ML, since the likelihood function might not always be unimodal, it's often a good idea to evaluate it over a grid of plausible values to identify whether there are multiple local minima in $-\ell$ or other issues that might be relevant.]
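
As an illustration of that check, reusing `negloglik`, `x`, `c_hat` and `b0` from the previous sketch (the grid ranges here are arbitrary and would need adapting to your data):

```python
# Coarse scan of -loglik over an (a, b) grid to look for multiple local minima.
a_grid = np.linspace(-0.9, 1.5, 49)
b_grid = np.linspace(0.1 * b0, 5.0 * b0, 50)
nll = np.array([[negloglik((a, b), x, c_hat) for b in b_grid] for a in a_grid])
i, j = np.unravel_index(np.argmin(nll), nll.shape)
print("grid minimum near a =", a_grid[i], "b =", b_grid[j])
```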