The Gumbel distribution
The Gumbel distribution is often used to model the distribution of extreme values. It is one of the three particular cases of the more generalized extreme value distribution (GEV), namely when the parameter of the GEV $\xi$ equals $0$ (that's why the Gumbel distribution is sometimes called "type I extreme value distribution"). The case of parameter fitting using maximum likelihood estimation (MLE) of a Gumbel distribution is discussed in Stuart Coles' book An Introduction to Statistical Modeling of Extreme Values (pages 55 ff.).
The Gumbel distribution's CDF is
$$
F_{X}(x)=\exp\left({-\exp{\left(-\frac{x-\mu}{\sigma}\right)}}\right), \quad x\in \mathbb{R}
$$
with $\mu \in \mathbb{R}$ and $\sigma >0$. The PDF is given by the expression
$$
f_{X}(x)=\frac{1}{\sigma}\exp{\left(-\frac{x-\mu}{\sigma} \right)}\cdot \exp{\left\{ -\exp{\left(-\frac{x-\mu}{\sigma}\right)}\right\}}
$$
Finally, the quantile function (i.e. inverse CDF) for probability $p$ is given by
$$
Q(p)=\mu-\sigma\cdot\log{\left(-\log{\left(p\right)}\right)}.
$$
We will need those definitions later in R
.
Parameter estimation
Given that $X_1,...,X_n$ are iid variables following a Gumbel distribution, the log-likelihood is
$$
\ell(\mu,\sigma)=-n\log{(\sigma)}-\sum_{i=1}^{n}\left(\frac{x_{i}-\mu}{\sigma}\right)-\sum_{i=1}^{n}\exp{\left\{-\left(\frac{x_{i}-\mu}{\sigma}\right) \right\}}.
$$
The log-likelihood can be maximized using standard numerical optimization algorithms.
In their book Statistical Distributions, Forbes et al. (2010) provide the MLE estimates for $\mu$ and $\sigma$. Namely, the estimators $\hat{\mu},\hat{\sigma}$ are the solutions of the simultaneous equations
$$
\begin{align}
\hat{\mu} &=-\hat{\sigma}\log{\left[\frac{1}{n}\sum_{i=1}^{n}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)} \right]}\\
\hat{\sigma} &= \bar{x}-\frac{\sum_{i=1}^{n} x_{i}\exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}{ \sum_{i=1}^{n} \exp{\left(-\frac{x_{i}}{\hat{\sigma}}\right)}}\\
\end{align}
$$
where $\bar{x}$ denotes the sample mean.
The maximum likelihood estimation can be done in R
(other statistical packages such as Stata and SAS provide similar capacities). Because the Gumbel distribution is not available by default in R
, we have to define the CDF, PDF and quantile function, which is straightforward. Here is an example R
-script that does the trick:
#=================================================================================
# Load package
#=================================================================================
library(fitdistrplus)
#=================================================================================
# Define the PDF, CDF and quantile function for the Gumbel distribution
#=================================================================================
dgumbel <- function(x,mu,s){ # PDF
exp((mu - x)/s - exp((mu - x)/s))/s
}
pgumbel <- function(q,mu,s){ # CDF
exp(-exp(-((q - mu)/s)))
}
qgumbel <- function(p, mu, s){ # quantile function
mu-s*log(-log(p))
}
#=================================================================================
# Some data (annual maximum mean daily flows ("annual floods"))
#=================================================================================
flood.data <- c(312,590,248,670,365,770,465,545,315,115,232,260,655,675,
455,1020,700,570,853,395,926,99,680,121,976,916,921,191,
187,377,128,582,744,710,520,672,645,655,918,512,255,1126,
1386,1394,600,950,731,700,1407,1284,165,1496,809)
#=================================================================================
# Fit the Gumbel distribution using maximum likelihood estimation (MLE)
# Make some diagnostic plots
#=================================================================================
gumbel.fit <- fitdist(flood.data, "gumbel", start=list(mu=5, s=5), method="mle")
summary(gumbel.fit)
Fitting of the distribution ' gumbel ' by maximum likelihood
Parameters :
estimate Std. Error
mu 471.6864 43.33664
s 298.8155 32.11813
Loglikelihood: -385.1877 AIC: 774.3754 BIC: 778.316
Correlation matrix:
mu s
mu 1.0000000 0.3208292
s 0.3208292 1.0000000
gofstat(gumbel.fit, discrete=FALSE) # goodness-of-fit statistics
Goodness-of-fit statistics
1-mle-gumbel
Kolmogorov-Smirnov statistic 0.09956968
Cramer-von Mises statistic 0.08826106
Anderson-Darling statistic 0.53360850
Goodness-of-fit criteria
1-mle-gumbel
Aikake's Information Criterion 774.3754
Bayesian Information Criterion 778.3160
# Plot the fit
par(cex=1.2, bg="white")
plot(gumbel.fit, lwd=2, col="steelblue")
The maximum likelihood estimates are $\hat{\mu} = 471.69, \hat{\sigma}=298.82$ with respective standard errors $\widehat{\mathrm{SE}}_{\hat{\mu}}=43.34, \widehat{\mathrm{SE}}_{\hat{\sigma}}=32.12$. The fit looks reasonable (there are some hints of systematic deviations) and the package fitdistrplus
provides the estimated standard errors of the parameters and goodness-of-fit statistics.
Best Answer
You need to change the initial value of sigma. Default is zero, so you're dividing by zero and get missings at your initial values, as your error message indicates.