Solved – How to implement MLE of Gumbel Distribution

distributions, gumbel distribution, maximum likelihood, r

I'm trying to implement Maximum Likelihood Estimation in R for the Gumbel distribution, but the algorithm doesn't converge.

I'm using this parametrization for the Gumbel:

$$\frac{1}{\sigma}\, e^{\frac{-(x-\mu)}{\sigma} - e^{\frac{-(x-\mu)}{\sigma}}}$$
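
For reference, this parametrization should match evd::dgumbel with loc = $\mu$ and scale = $\sigma$; a quick sanity check (assuming the evd package is installed):

library(evd)
## evaluate the hand-written density at a few points, with mu = 0.5 and sigma = 2
x0 <- seq(-3, 10, length.out = 5)
dens <- 1/2 * exp(-(x0 - 0.5)/2 - exp(-(x0 - 0.5)/2))
## compare with the evd implementation; should return TRUE
all.equal(dens, dgumbel(x0, loc = 0.5, scale = 2))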

Then the log-likelihood is:

$$-n\ln\left(\sigma\right) + \sum_{i=1}^{n}\ln\left(e^{Z_i - e^{Z_i}}\right)$$
where $Z_i = \dfrac{-(x_i - \mu)}{\sigma}$.
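
Note that the inner logarithm and exponential cancel, so the log-likelihood can equivalently (and more stably for numerical work) be written as

$$\ell(\mu, \sigma) = -n\ln\left(\sigma\right) + \sum_{i=1}^{n}\left(Z_i - e^{Z_i}\right).$$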

The Newton-Raphson update to find $\hat{\mu}$ and $\hat{\sigma}$ is:

$$\hat{\theta} = \theta_0 - H^{-1}(\theta_0)\,U(\theta_0)$$

where $\theta = (\mu, \sigma)^{T}$, $H$ is the Hessian matrix and $U$ is the score vector.

For the Gumbel distribution we have:

\begin{equation}
U(\mu,\sigma\mid x) = \begin{pmatrix}
\dfrac{n}{\sigma} - \displaystyle\sum_{i=1}^{n}\dfrac{e^{Z_i}}{\sigma} \\[1.2em]
\dfrac{\displaystyle\sum_{i=1}^{n}\left[\left(\mu - x_i\right)e^{Z_i} + x_i\right] - \left(\mu + \sigma\right)n}{\sigma^{2}}
\end{pmatrix}
\end{equation}

and
$$ H(\mu, \sigma \mid x) = \begin{bmatrix}
-\dfrac{\sum_{i=1}^{n} e^{Z_i}}{\sigma^{2}} & \dfrac{-n\sigma + \sum_{i=1}^{n} e^{Z_i}\left(\sigma - x_i + \mu\right)}{\sigma^{3}} \\[1.2em]
\text{Sym} & \dfrac{2n\mu\sigma + n\sigma^{2} - \sum_{i=1}^{n}\left[\left(\mu - x_i\right)\left(\mu + 2\sigma - x_i\right)e^{Z_i} + 2 x_i \sigma\right]}{\sigma^{4}}
\end{bmatrix} $$

To simulate random variates from the Gumbel distribution I use the inverse-transform method, that is:

$$x = \mu - \sigma\ln\left(-\ln(u)\right), \qquad u \sim \mathrm{Uniform}(0,1)$$

My R code is:

Log-Likelihood

logLH <- function(theta){
  mu <- theta[1]
  sigma <- theta[2]
  Z <- -( (x - mu)/sigma)
  t1 <- log(sigma)
  t3 <- sum(log(exp(Z - exp(Z))))
  l <- -t1 * n + t3
  return(l)
}

Score Vector

U <- function(n, x, theta){
  mu <- theta[1]
  sigma <- theta[2]
  Z <- -((x - mu)/sigma)
  Es <- matrix(nrow=2)  
  Es[1] <- n / sigma - sum(1 / sigma * exp(Z))
  Es[2] <- (sum((mu-x) * exp(Z) + x) - (mu + sigma) * n) / sigma ^ 2
  return(Es)
}

Hessian Matrix

H <- function(n, x, theta){
    mu <- theta[1]
    sigma <- theta[2]
    Z <- -((x - mu)/sigma)
    Hs <- matrix(ncol=2,nrow=2)
    Hs[1,1] <- -1 / sigma ^ 2 * sum(exp(Z))  
    Hs[1,2] <- (-n * sigma + sum(exp(Z) * (sigma - x + mu))) / sigma ^ 3
    Hs[2,1] <- Hs[1,2]
    Hs[2,2] <- (2 * n * mu * sigma + n * sigma ^ 2 - sum((-x + mu) * 
              (mu + 2 * sigma - x) * exp(Z) + 2 * x * sigma)) / sigma^4
    return(Hs)
}

Random Variable Generation

n <- 100
mu <- .5
sigma <- 2
x <- mu - sigma*log(-log(runif(n)))

Newton-Raphson

tol <- 1e-6
iter <- 1
t.hat <- matrix(c(.3, 4))

while(max(abs(U(n,x,t.hat))) > tol & iter <= 20){
  t.hat <- t.hat - solve(H(n,x,t.hat))%*%U(n,x,t.hat)
  cat(iter, t.hat, "\n")
  iter <- iter + 1
}

Using the library maxLik, the method converges:

library(maxLik)
maxLik(logLik = logLH, start = c(.3,4))

Best Answer

The MLE of the Gumbel or other Extreme Value distributions is available in several R packages on CRAN, such as evd. The moment estimates of $\mu$ and $\sigma$ are often used as initial values. A useful tip is to scale the data, e.g. by dividing it by the mean of the absolute values.
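
As a sketch of those two tips (the function name gumbelStart is mine), the moment estimates use the facts that a Gumbel($\mu$, $\sigma$) variable has mean $\mu + \gamma\sigma$, with $\gamma \approx 0.5772$ the Euler-Mascheroni constant, and standard deviation $\sigma\pi/\sqrt{6}$:

gumbelStart <- function(x, scale.data = TRUE) {
  ## optionally scale the data by the mean absolute value for numerical stability
  s <- if (scale.data) mean(abs(x)) else 1.0
  xs <- x / s
  ## moment estimates: sd(Gumbel) = sigma * pi / sqrt(6), mean(Gumbel) = mu + gamma * sigma
  sigma0 <- sd(xs) * sqrt(6) / pi
  mu0 <- mean(xs) - 0.5772156649 * sigma0
  ## return the estimates on the original scale
  c(mu = mu0 * s, sigma = sigma0 * s)
}

## e.g. as starting values for the Newton-Raphson iteration in the question:
## t.hat <- matrix(gumbelStart(x))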

Another possibility for the MLE of the Gumbel parameters is to use the relation with the Peaks Over Threshold (POT) model. Assume that events or arrivals $T_k$ occur according to a homogeneous Poisson process with rate $\lambda$, and that for event $k$ we can observe a mark r.v. $Y_k$ with density $f_Y(y)$. We further assume that the marks $Y_k$ are independent and are independent of the Poisson arrivals $T_k$.

Now assume that we have $n$ non-overlapping blocks of time with the same duration $w$ and consider the Block Maxima (BM) $$ M_i := \max_{T_k \in \text{block } i} Y_k $$ so each $M_i$ is the maximum of a random number $N_i$ of marks. Then the $M_i$ are independent and have a density $f_M(y)$ that can be written as an expression involving $f_Y(y)$. With some algebra, the unknown $N_i$ can indeed be marginalised out, and up to an unimportant constant the log-likelihood is $$ \ell = \sum_{i=1}^n \left\{ \log(\lambda w) - \lambda w\, S_Y(M_i) + \log f_Y(M_i) \right\} $$ where $S_Y(y)$ is the survival function $S_Y(y) := \text{Pr}\{Y > y\}$.
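
Written directly in R this log-likelihood is just a few lines; here fY and SY are assumed to be supplied as vectorised R functions, and the function name is mine:

potBMLogLik <- function(lambda, M, w, fY, SY) {
  ## block-maxima log-likelihood under the marked Poisson / POT model:
  ## sum_i { log(lambda * w) - lambda * w * S_Y(M_i) + log f_Y(M_i) }
  sum(log(lambda * w) - lambda * w * SY(M) + log(fY(M)))
}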

A special case is when the marks $Y_k$ come from the exponential distribution $$ f_Y(y) = \frac{1}{\sigma_Y} \exp\left\{ -\frac{y - \mu_Y}{\sigma_Y} \right\} 1_{\{y \geq \mu_Y\}}, \qquad S_Y(y) = \exp\left\{ -\left[\frac{y - \mu_Y}{\sigma_Y}\right]_+ \right\}. $$ It can be shown that the $M_i$ are independent and $M_i \sim \text{Gumbel}(\mu_M,\,\sigma_M)$ with $$ \mu_M = \mu_Y + \sigma_Y \log(\lambda w), \quad \sigma_M = \sigma_Y. $$ Observe that for given $\mu_M$ and $\sigma_M$ there exists an infinity of triples $[\lambda,\,\mu_Y,\,\sigma_Y]$ giving the same BM distribution.

The tip is to choose an arbitrary value for $\mu_Y$ compatible with the observations $M_i$ (which implies $\mu_Y \leq \min_i M_i$) and to fit the POT model by MLE using the available observations, namely the $n$ BM $M_i$. We now have to estimate the two unknown POT parameters $\lambda$ and $\sigma_Y$ rather than the two Gumbel parameters. So what is the point? The interesting feature is that $\lambda$ can be concentrated out of the log-likelihood $$ \widehat{\lambda}(\sigma_Y) = \frac{n}{\sum_i w\, S_Y(M_i;\,\mu_Y,\,\sigma_Y)}. $$ So rather than maximising $\ell(\lambda,\,\sigma_Y)$ we can maximise the profiled or concentrated log-likelihood $$ \ell_{\text{prof}}(\sigma_Y) := \ell(\widehat{\lambda}(\sigma_Y),\,\sigma_Y), $$ which depends on only one parameter, $\sigma_Y$, making the maximisation job much easier. Then we can recover the Gumbel parameters from the POT parameters using the relation above.
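
To make the link with the code below explicit, for exponential marks (and, as in the code, $w = 1$) the concentrated log-likelihood can be written out as

$$ \ell_{\text{prof}}(\sigma_Y) = n\log\widehat{\lambda}(\sigma_Y) - n - n\log(\sigma_Y) - \frac{1}{\sigma_Y}\sum_{i=1}^{n}\left(M_i - \mu_Y\right), \qquad \widehat{\lambda}(\sigma_Y) = \frac{n}{\sum_{i=1}^{n}\exp\left\{-\frac{M_i - \mu_Y}{\sigma_Y}\right\}}, $$

which, up to the constant $-n$, is minus the quantity returned by negLogLikc in the function below.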

This strategy extends as well to the GEV distribution, which depends on three parameters $\mu_M$, $\sigma_M$ and $\xi_M$; a Generalised Pareto (GP) distribution with three parameters $\mu_Y$, $\sigma_Y$ and $\xi_Y$ is then used in the POT model. The GP location $\mu_Y$ is fixed, and the rate $\lambda$ can be concentrated out of the likelihood, leading to a two-parameter optimisation. Moreover, it can be used in the so-called $r$ largest context, where the $r_k \geqslant 1$ largest observations are available in block $k$.
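
For reference, the usual relations between the POT and GEV parameters in the general case (they reduce to the Gumbel ones above as the shape tends to zero) are

$$ \xi_M = \xi_Y, \qquad \sigma_M = \sigma_Y\,(\lambda w)^{\xi_Y}, \qquad \mu_M = \mu_Y + \sigma_Y\,\frac{(\lambda w)^{\xi_Y} - 1}{\xi_Y}. $$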

fGumbel <- function(x) {

    n <- length(x)
    ## choose the location of the exponential distribution of the
    ## marks 'Y', i.e. the threshold for the POT model. The block
    ## duration 'w' is implicitly set to 1.
    muY <- min(x) 

    negLogLikc <- function(sigmaY) {
        lambdaHat <-  n / sum(exp(-(x - muY) / sigmaY))
        nll <- -n * log(lambdaHat) + n * log(sigmaY) + sum(x - muY) / sigmaY
        attr(nll, "lambda") <- lambdaHat
        nll
    }

    ## the search interval for the minimum could be improved...
    opt <- optimize(f = negLogLikc, interval = c(1e-6, 1e3))

    lambdaHat <- attr(opt$objective, "lambda")
    sigmaMHat <- sigmaYHat <- opt$minimum
    muMHat <- muY + log(lambdaHat) * sigmaYHat
    ## deviance computed on the data 'x' as a check against evd::fgev
    deviance.check <- -2 * sum(evd::dgumbel(x, loc = muMHat, scale = sigmaMHat,
                                            log = TRUE))

    list(estimate = c("loc" = muMHat, "scale" = sigmaMHat),
         lambda = lambdaHat,
         deviance = deviance.check)
}

require(evd)
set.seed(123)
n <- 30
y <- rgumbel(n)

fit.evd <- evd::fgev(y, shape = 0.0)
fit.new <- fGumbel(y)

cbind("evd::fgev" = c(fit.evd$estimate, "deviance" = fit.evd$deviance),
      "new" = c(fit.new$estimate, fit.new$deviance))