Solved – Usable estimators for parameters in Gumbel distribution

distributionsestimationgumbel distributioniidrandom variable

The Gumbel distribution has the general form:
$$F(y)=\exp\left({-\exp{\left(-\frac{y-\mu}{\sigma}\right)}}\right), \quad y\in \mathbb{R}$$
where $\mu \in \mathbb{R}$ and $\sigma >0$. Let $W_1,…,W_n$ be random variables with that distribution and iid. I want to find good estimators for $\sigma$ and $\mu$. What would be the best way to do that?

Best Answer

The Gumbel distribution

The Gumbel distribution is often used to model the distribution of extreme values. It is one of the three particular cases of the more generalized extreme value distribution (GEV), namely when the parameter of the GEV $\xi$ equals $0$ (that's why the Gumbel distribution is sometimes called "type I extreme value distribution"). The case of parameter fitting using maximum likelihood estimation (MLE) of a Gumbel distribution is discussed in Stuart Coles' book An Introduction to Statistical Modeling of Extreme Values (pages 55 ff.).

The Gumbel distribution's CDF is $$ F_{X}(x)=\exp\left({-\exp{\left(-\frac{x-\mu}{\sigma}\right)}}\right), \quad x\in \mathbb{R} $$ with $\mu \in \mathbb{R}$ and $\sigma >0$. The PDF is given by the expression

$$ f_{X}(x)=\frac{1}{\sigma}\exp{\left(-\frac{x-\mu}{\sigma} \right)}\cdot \exp{\left\{ -\exp{\left(-\frac{x-\mu}{\sigma}\right)}\right\}} $$

Finally, the quantile function (i.e. inverse CDF) for probability $p$ is given by $$ Q(p)=\mu-\sigma\cdot\log{\left(-\log{\left(p\right)}\right)}. $$

We will need those definitions later in R.

Parameter estimation

Given that $X_1,...,X_n$ are iid variables following a Gumbel distribution, the log-likelihood is $$ \ell(\mu,\sigma)=-n\log{(\sigma)}-\sum_{i=1}^{n}\left(\frac{x_{i}-\mu}{\sigma}\right)-\sum_{i=1}^{n}\exp{\left\{-\left(\frac{x_{i}-\mu}{\sigma}\right) \right\}}. $$ The log-likelihood can be maximized using standard numerical optimization algorithms.

In their book Statistical Distributions, Forbes et al. (2010) provide the MLE estimates for $\mu$ and $\sigma$. Namely, the estimators $\hat{\mu},\hat{\sigma}$ are the solutions of the simultaneous equations $$ \begin{align} \hat{\mu} &=-\hat{\sigma}\log{\left[\frac{1}{n}\sum_{i=1}^{n}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)} \right]}\\ \hat{\sigma} &= \bar{x}-\frac{\sum_{i=1}^{n} x_{i}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}{ \sum_{i=1}^{n} \exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}\\ \end{align} $$ where $\bar{x}$ denotes the sample mean.

The maximum likelihood estimation can be done in R (other statistical packages such as Stata and SAS provide similar capacities). Because the Gumbel distribution is not available by default in R, we have to define the CDF, PDF and quantile function, which is straightforward. Here is an example R-script that does the trick:

#=================================================================================
# Load package
#=================================================================================

library(fitdistrplus)

#=================================================================================
# Define the PDF, CDF and quantile function for the Gumbel distribution
#=================================================================================

dgumbel <- function(x,mu,s){ # PDF
  exp((mu - x)/s - exp((mu - x)/s))/s
}

pgumbel <- function(q,mu,s){ # CDF
  exp(-exp(-((q - mu)/s)))
}

qgumbel <- function(p, mu, s){ # quantile function
  mu-s*log(-log(p))
}

#=================================================================================
# Some data (annual maximum mean daily flows ("annual floods"))
#=================================================================================

flood.data <- c(312,590,248,670,365,770,465,545,315,115,232,260,655,675,
              455,1020,700,570,853,395,926,99,680,121,976,916,921,191,
              187,377,128,582,744,710,520,672,645,655,918,512,255,1126,
              1386,1394,600,950,731,700,1407,1284,165,1496,809)

#=================================================================================
# Fit the Gumbel distribution using maximum likelihood estimation (MLE)
# Make some diagnostic plots
#=================================================================================

gumbel.fit <- fitdist(flood.data, "gumbel", start=list(mu=5, s=5), method="mle")

summary(gumbel.fit)

Fitting of the distribution ' gumbel ' by maximum likelihood 
Parameters : 
   estimate Std. Error
mu 471.6864   43.33664
s  298.8155   32.11813
Loglikelihood:  -385.1877   AIC:  774.3754   BIC:  778.316 
Correlation matrix:
          mu         s
mu 1.0000000 0.3208292
s  0.3208292 1.0000000

gofstat(gumbel.fit, discrete=FALSE) # goodness-of-fit statistics


Goodness-of-fit statistics
                             1-mle-gumbel
Kolmogorov-Smirnov statistic   0.09956968
Cramer-von Mises statistic     0.08826106
Anderson-Darling statistic     0.53360850

Goodness-of-fit criteria
                               1-mle-gumbel
Aikake's Information Criterion     774.3754
Bayesian Information Criterion     778.3160

# Plot the fit

par(cex=1.2, bg="white")
plot(gumbel.fit, lwd=2, col="steelblue")

Gumbel MLE fit

The maximum likelihood estimates are $\hat{\mu} = 471.69, \hat{\sigma}=298.82$ with respective standard errors $\widehat{\mathrm{SE}}_{\hat{\mu}}=43.34, \widehat{\mathrm{SE}}_{\hat{\sigma}}=32.12$. The fit looks reasonable (there are some hints of systematic deviations) and the package fitdistrplus provides the estimated standard errors of the parameters and goodness-of-fit statistics.

Related Solutions

Expected Value – Expectation of the Maximum of iid Gumbel Variables

I appreciate the work exhibited in your answer: thank you for that contribution. The purpose of this post is to provide a simpler demonstration. The value of simplicity is revelation: we can easily obtain the entire distribution of the maximum, not just its expectation.

Ignore $\mu$ by absorbing it into the $\delta_i$ and assuming the $\epsilon_i$ all have a Gumbel$(0,1)$ distribution. (That is, replace each $\epsilon_i$ by $\epsilon_i-\mu$ and change $\delta_i$ to $\delta_i+\mu$.) This does not change the random variable

$$X = \max_{i}(\delta_i + \epsilon_i) = \max_i((\delta_i+\mu) + (\epsilon_i-\mu)).$$

The independence of the $\epsilon_i$ implies for all real $x$ that $\Pr(X\le x)$ is the product of the individual chances $\Pr(\delta_i+\epsilon_i\le x)$. Taking logs and applying basic properties of exponentials yields

$$\eqalign{ \log \Pr(X\le x) &= \log\prod_{i}\Pr(\delta_i + \epsilon_i \le x) = \sum_i \log\Pr(\epsilon_i \le x - \delta_i)\\ &= -\sum_ie^{\delta_i}\, e^{-x} = -\exp\left(-x + \log\sum_i e^{\delta_i}\right). }$$

This is the logarithm of the CDF of a Gumbel distribution with location parameter $\lambda=\log\sum_i e^{\delta_i}.$ That is,

$X$ has a Gumbel$\left(\log\sum_i e^{\delta_i}, 1\right)$ distribution.

This is much more information than requested. The mean of such a distribution is $\gamma+\lambda,$ entailing

$$\mathbb{E}[X] = \gamma + \log\sum_i e^{\delta_i},$$

QED.

Solved – How to implement MLE of Gumbel Distribution

The MLE of Gumbel or other Extreme Value distributions is available in several R packages on the CRAN such as evd. The moments estimates of $\mu$ and $\sigma$ are often used as initial values. An useful tip is to scale the data e.g. by dividing them by the mean of the absolute values.

Another possibility for the MLE of Gumbel parameters is to use the relation with the Peak Over Threshold (POT) model. Assume that events or arrivals $T_k$ come according to an Homogeneous Poisson Process with rate $\lambda$, and that for an event $k$ we can observe a mark r.v; $Y_k$ with density $f_Y(y)$, see figure. We further assume that the marks $Y_k$ are independent and are independent of the Poisson arrivals $T_k$.

Now assume that we have $n$ non-overlapping blocks of time with the same duration $w$ and consider the Block Maxima (BM) $$ M_i := \max_{T_k \in \text{block } i} Y_k $$ so each $M_i$ is the maximum of a random number $N_i$ of marks. Then the $M_i$ are independent and have a density $f_M(y)$ that can be written as an expression involving $f_Y(y)$. With some algebra, the unknown $N_i$ can indeed be ``marginalised out'' and up to an unimportant constant the log-likelihood is $$ \ell = \sum_{i=1}^n \left\{ \log(\lambda w) - \lambda w S_Y(M_i) + \log f_Y(M_i) \right \} $$ where $S_Y(y)$ is the survival $S_Y(y):= \text{Pr}\{Y > y\}$.

A special case is when the marks $Y_k$ come from the exponential distribution $$ f_Y(y) = \frac{1}{\sigma_Y} \exp\left\{ -\frac{y - \mu_Y}{\sigma_Y} \right\} 1_{\{y \geq \mu_Y\}}, \qquad S_Y(y) = \exp\left\{ -\left[\frac{y - \mu_Y}{\sigma_Y}\right]_+ \right\}. $$ It can be shown that the $M_i$ are independent and $M_i \sim \text{Gumbel}(\mu_M,\,\sigma_M)$ with $$ \mu_M = \mu_Y + \log(\lambda w) \sigma_Y, \quad \sigma_M = \sigma_Y. $$ Observe that for given $\mu_M$ and $\sigma_M$ there exists an infinity of triples $[\lambda,\,\mu_Y,\,\sigma_Y]$ giving the same BM distribution. The tip is to choose an arbitrary value for $\mu_Y$ compatible with the observations $M_i$ - which imply $\mu_Y < \min_i M_i$ - and fit the POT model by MLE using the available observations, namely the $n$ BM $M_i$. We now have to estimate the two unknown POT parameters $\lambda$ and $\sigma_Y$ rather than the two Gumbel parameters. So what the point? The interesting feature is that $\lambda$ can be concentrated out of the log-likelihood $$ \widehat{\lambda}(\sigma_Y) = \frac{n}{\sum_i w S_Y(M_i;\,\mu_Y,\,\sigma_Y )}. $$ So rather than maximising $\ell(\lambda,\,\sigma_Y)$ we can maximise the profiled or concentrated log-likelihood $$ \ell_{\text{prof}}(\sigma_Y) := \ell(\widehat{\lambda}(\sigma_Y),\,\sigma_Y), $$ which depends of only one parameter $\sigma_Y$, making the maximisation job much easier. Then we can find the Gumbel parameters from the POT parameters by using the relation above.

This strategy extends as well to the GEV distribution depending on three parameters $\mu_M$, $\sigma_M$ and $\xi_M$. A Generalised Pareto (GP) with three parameters $\mu_Y$, $\sigma_Y$ and $\xi_Y$ used in POT. The GP location $\mu_Y$ is fixed, and the rate $\lambda$ can be concentrated out of the likelihood, leading to a two-parameter optimisation. Moreover it can be used in the so-called $r$ largest context, where the $r_k \geqslant 1$ largest observations are available in block $k$.

fGumbel <- function(x) {

    n <- length(x)
    ## choose the location of the exponential distribution of the
    ## marks 'Y', i.e. the threshold for the POT model. The block
    ## duration 'w' is implicitly set to 1.
    muY <- min(x) 

    negLogLikc <- function(sigmaY) {
        lambdaHat <-  n / sum(exp(-(x - muY) / sigmaY))
        nll <- -n * log(lambdaHat) + n * log(sigmaY) + sum(x - muY) / sigmaY
        attr(nll, "lambda") <- lambdaHat
        nll
    }

    ## the search interval for the minimum could be improved...
    opt <- optimize(f = negLogLikc, interval = c(1e-6, 1e3))

    lambdaHat <- attr(opt$objective, "lambda")
    sigmaMHat <- sigmaYHat <- opt$minimum
    muMHat <- muY+ log(lambdaHat) * sigmaYHat
    deviance.check <- -2 * sum(dgumbel(y, loc = muMHat, scale = sigmaMHat,
                              log = TRUE))

    list(estimate = c("loc" = muMHat, "scale" = sigmaMHat),
         lambda = lambdaHat,
         deviance = deviance.check)
}

require(evd)
set.seed(123)
n <- 30
y <- rgumbel(n)

fit.evd <- evd::fgev(y, shape = 0.0)
fit.new <- fGumbel(y)

cbind("evd::fgev" = c(fit.evd$estimate, "deviance" = fit.evd$deviance),
      "new" = c(fit.new$estimate, fit.new$deviance))