Solved – Usable estimators for parameters in Gumbel distribution

distributionsestimationgumbel distributioniidrandom variable

The Gumbel distribution has the general form:
$$F(y)=\exp\left({-\exp{\left(-\frac{y-\mu}{\sigma}\right)}}\right), \quad y\in \mathbb{R}$$
where $\mu \in \mathbb{R}$ and $\sigma >0$. Let $W_1,…,W_n$ be random variables with that distribution and iid. I want to find good estimators for $\sigma$ and $\mu$. What would be the best way to do that?

Best Answer

The Gumbel distribution

The Gumbel distribution is often used to model the distribution of extreme values. It is one of the three particular cases of the more generalized extreme value distribution (GEV), namely when the parameter of the GEV $\xi$ equals $0$ (that's why the Gumbel distribution is sometimes called "type I extreme value distribution"). The case of parameter fitting using maximum likelihood estimation (MLE) of a Gumbel distribution is discussed in Stuart Coles' book An Introduction to Statistical Modeling of Extreme Values (pages 55 ff.).

The Gumbel distribution's CDF is $$ F_{X}(x)=\exp\left({-\exp{\left(-\frac{x-\mu}{\sigma}\right)}}\right), \quad x\in \mathbb{R} $$ with $\mu \in \mathbb{R}$ and $\sigma >0$. The PDF is given by the expression

$$ f_{X}(x)=\frac{1}{\sigma}\exp{\left(-\frac{x-\mu}{\sigma} \right)}\cdot \exp{\left\{ -\exp{\left(-\frac{x-\mu}{\sigma}\right)}\right\}} $$

Finally, the quantile function (i.e. inverse CDF) for probability $p$ is given by $$ Q(p)=\mu-\sigma\cdot\log{\left(-\log{\left(p\right)}\right)}. $$

We will need those definitions later in R.


Parameter estimation

Given that $X_1,...,X_n$ are iid variables following a Gumbel distribution, the log-likelihood is $$ \ell(\mu,\sigma)=-n\log{(\sigma)}-\sum_{i=1}^{n}\left(\frac{x_{i}-\mu}{\sigma}\right)-\sum_{i=1}^{n}\exp{\left\{-\left(\frac{x_{i}-\mu}{\sigma}\right) \right\}}. $$ The log-likelihood can be maximized using standard numerical optimization algorithms.

In their book Statistical Distributions, Forbes et al. (2010) provide the MLE estimates for $\mu$ and $\sigma$. Namely, the estimators $\hat{\mu},\hat{\sigma}$ are the solutions of the simultaneous equations $$ \begin{align} \hat{\mu} &=-\hat{\sigma}\log{\left[\frac{1}{n}\sum_{i=1}^{n}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)} \right]}\\ \hat{\sigma} &= \bar{x}-\frac{\sum_{i=1}^{n} x_{i}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}{ \sum_{i=1}^{n} \exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}\\ \end{align} $$ where $\bar{x}$ denotes the sample mean.

The maximum likelihood estimation can be done in R (other statistical packages such as Stata and SAS provide similar capacities). Because the Gumbel distribution is not available by default in R, we have to define the CDF, PDF and quantile function, which is straightforward. Here is an example R-script that does the trick:

#=================================================================================
# Load package
#=================================================================================

library(fitdistrplus)

#=================================================================================
# Define the PDF, CDF and quantile function for the Gumbel distribution
#=================================================================================

dgumbel <- function(x,mu,s){ # PDF
  exp((mu - x)/s - exp((mu - x)/s))/s
}

pgumbel <- function(q,mu,s){ # CDF
  exp(-exp(-((q - mu)/s)))
}

qgumbel <- function(p, mu, s){ # quantile function
  mu-s*log(-log(p))
}

#=================================================================================
# Some data (annual maximum mean daily flows ("annual floods"))
#=================================================================================

flood.data <- c(312,590,248,670,365,770,465,545,315,115,232,260,655,675,
              455,1020,700,570,853,395,926,99,680,121,976,916,921,191,
              187,377,128,582,744,710,520,672,645,655,918,512,255,1126,
              1386,1394,600,950,731,700,1407,1284,165,1496,809)

#=================================================================================
# Fit the Gumbel distribution using maximum likelihood estimation (MLE)
# Make some diagnostic plots
#=================================================================================

gumbel.fit <- fitdist(flood.data, "gumbel", start=list(mu=5, s=5), method="mle")

summary(gumbel.fit)

Fitting of the distribution ' gumbel ' by maximum likelihood 
Parameters : 
   estimate Std. Error
mu 471.6864   43.33664
s  298.8155   32.11813
Loglikelihood:  -385.1877   AIC:  774.3754   BIC:  778.316 
Correlation matrix:
          mu         s
mu 1.0000000 0.3208292
s  0.3208292 1.0000000

gofstat(gumbel.fit, discrete=FALSE) # goodness-of-fit statistics


Goodness-of-fit statistics
                             1-mle-gumbel
Kolmogorov-Smirnov statistic   0.09956968
Cramer-von Mises statistic     0.08826106
Anderson-Darling statistic     0.53360850

Goodness-of-fit criteria
                               1-mle-gumbel
Aikake's Information Criterion     774.3754
Bayesian Information Criterion     778.3160

# Plot the fit

par(cex=1.2, bg="white")
plot(gumbel.fit, lwd=2, col="steelblue")

Gumbel MLE fit

The maximum likelihood estimates are $\hat{\mu} = 471.69, \hat{\sigma}=298.82$ with respective standard errors $\widehat{\mathrm{SE}}_{\hat{\mu}}=43.34, \widehat{\mathrm{SE}}_{\hat{\sigma}}=32.12$. The fit looks reasonable (there are some hints of systematic deviations) and the package fitdistrplus provides the estimated standard errors of the parameters and goodness-of-fit statistics.