Solved – Different quantiles of a fitted GPD in different R packages

extreme valuemaximum likelihoodquantilesr

I am performing an extreme value analysis for meteorological data, to be precise for precipitation data available in mm/d. I am using a threshold excess approach for estimating the parameters of a generalized Pareto distribution with a maximum likelihood method.

The aim is to calculate several return levels (i.e. the 2, 5, 10, 20, 50, 100 year event) for daily precipitation.

While the R code works fine, I am wondering why I get clearly different results when calculating the quantiles (and thus the return levels) of the fitted GPD with different packages. Even though the estimated parameters of the GPD are almost identical in each package, the quantiles differ a lot.

The packages I used are:
ismev, extRemes, evir and POT.

I guess that the different estimates for the parameters of the GPD are due to different calculation routines, but I do not understand why the calculation of the quantiles differs that much depending on the different packages.

# packages
library(ismev)
library(extRemes)
library(evir)
library(POT)
library(lmom)

th <- 50

# sample data:
potvalues <- c(
  58.5,44.2,49.6,59.3,48.3,60.9,94.5,47.1,45.3,57.6,48.2,46.2,44.2,50.6,42.1,52.7,80.9,
  58.5,51.3,48.4,51.7,71.9,60.1,64.4,43.5,55.5,49.3,58.2,47.5,43.7,45.2,52.8,42.2,46.4,
  96.1,47.5,50.1,42.4,60.9,72.6,51.6,59.4,80.5,63.7,59.9,45.0,66.7,47.6,53.3,43.1,51.0,
  46.2,53.6,59.8,51.7,46.7,42.6,44.5,45.0,50.0,44.0,89.9,44.2,47.8,53.3,43.0,55.7,44.6,
  44.6,54.9,45.1,43.9,78.7,45.5,64.0,42.7,47.4,57.0,105.4,64.3,43.2,50.4,80.2,49.9,71.6,
  47.4,44.1,47.6,55.2,44.4,78.6,50.8,42.4,47.1,43.5,51.4)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package extRemes

# fit gpd
pot.ext <- fevd(potvalues, method = "MLE", type="GP", threshold=th)

# return levels:
rl.extremes <-  return.level(pot.ext, conf = 0.05,
                             return.period= c(2,5,10,20,50,100))
rl.extremes <- as.numeric(rl.extremes)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package ismev

pot.gpd <- gpd.fit(potvalues, threshold=th)

s1 <- quagpa(f=.99, para=c(pot.gpd$threshold, pot.gpd$mle)) # 100
s2 <- quagpa(f=.98, para=c(pot.gpd$threshold, pot.gpd$mle)) #  50
s3 <- quagpa(f=.95, para=c(pot.gpd$threshold, pot.gpd$mle)) #  20
s4 <- quagpa(f=.90, para=c(pot.gpd$threshold, pot.gpd$mle)) #  10
s5 <- quagpa(f=.80, para=c(pot.gpd$threshold, pot.gpd$mle)) #   5
s6 <- quagpa(f=.50, para=c(pot.gpd$threshold, pot.gpd$mle)) #   2

rl.ismev <- c(s6, s5, s4, s3, s2, s1)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package evir

# fit gpd
gpd.evir <- gpd(potvalues, threshold=th)

# plot
evirplot <- plot(gpd.evir)
1 # Excess Distribution
0 # exit

x100 <- gpd.q(pp=.99, x=evirplot) # 100
x050 <- gpd.q(pp=.98, x=evirplot) #  50
x020 <- gpd.q(pp=.95, x=evirplot) #  20
x010 <- gpd.q(pp=.90, x=evirplot) #  10
x005 <- gpd.q(pp=.80, x=evirplot) #   5
x002 <- gpd.q(pp=.50, x=evirplot) #   2

rl.evir <- t(round(rbind(x002,x005,x010,x020,x050,x100),1))
rl.evir <- as.numeric(rl.evir[2,])

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package POT

gpd.pot <- fitgpd(potvalues, threshold=th)
quant = c(0.50, 0.80, 0.90, 0.95, 0.98, 0.99)
rtp <- c(2,5,10,20,50,100)

retvec <- vector()
for (i in quant){
  x <- POT::qgpd(i, loc = th, scale = as.numeric(gpd.pot$param[1]),
                shape = as.numeric(gpd.pot$param[2]))
  retvec <- c(retvec,x)
}

rl.pot <- retvec

#------------------------------------------------------------------------------------------#
# comparison of results - return periods
(result <- cbind(rl.extremes,rl.ismev, rl.evir, rl.pot))

#------------------------------------------------------------------------------------------#
# comparison of estimated parameters
param.extremes <- pot.ext$results$par # extremes
param.ismev <- pot.gpd$mle # ismev
    param.evir <- c(gpd.evir$par.ests[2],gpd.evir$par.ests[1])  # evir
    param.pot <- gpd.pot$param # POT

parameters <- cbind(param.extremes, param.ismev , param.evir, param.pot)
round(parameters, 4)

#------------------------------------------------------------------------------------------#

Best Answer

You could try also evd or Renext packages.

As you noted, all packages functions returned quite similar parameters, so all fitted models embed the same quantile function and the problems come from differing time scales.

For POT models, I believe useful to think of an event rate, say $\lambda$, which is estimated as well as the model for the levels $X_i$ or for excesses $Y_i = X_i-u$, where $u$ is the threshold. The estimated rate $\lambda$ is the number of levels $X_i$ with $X_i>u$ per unit of duration. If you want to use years to compute return levels (RL), you should consider that $\lambda$ expresses in event by year (or inverse year). The return level for period $T$ is then given by $$ x(T) = q_X(1 - 1/\lambda/T) = u + q_Y(1 - 1/\lambda/T) $$ where $q_X$, $q_Y$ are the quantile functions. This formula can be used for $T > 1/\lambda$.

The rate $\lambda$ can be estimated by $\widehat{\lambda} = n_Y /w$ where $w$ is the duration of the fit period and $n_Y$ is the number of exceedances, here $48$. Curiously enough, most R packages do not require $w$ or even allow the user to give it. They rather use a convention which must be well understood by carefully reading the documentation and the examples. For extremes::fevd, the default values of the formals will correspond to $w = 96/365$ years, so $\lambda =182.5$ evt/year in your example. With ismev::gpd.fit you can specify $w$ using a value $n_X / w$ for the npy formal. Similarly evd::fpot has a npp argument for that.

Note that the parameters for the GP excesses do not depend on (``are orthogonal to'') $\lambda$, and their estimation is not impacted by the assumption on $\lambda$ which will however be crucial for RLs. Yet fitted model have an estimation of $\lambda$ used to compute the RLs.

Except for extremes, you computed the return levels yourself, not relying on a prediction function and assuming that $\lambda =1$. This may not be appropriate, because it does not seem that the duration of your fitting period is $48$ years.

## none of this fits uses the good duration?
library(Renext)
## matches (nearly) the computed quantiles labelled 'ismev'
fRen1 <- Renouv(x = potvalues, threshold = 50, effDuration = 96, 
                distname.y = "gpd")
predict(fRen1, newdata = c(2, 5, 10, 20, 50, 100))

##  matches (nearly) the computed quantiles labelled 'evir' and 'pot'
fRen2 <- Renouv(x = potvalues, threshold = 50, effDuration = 48, 
                distname.y = "gpd")
predict(fRen2, newdata = c(2, 5, 10, 20, 50, 100))

## matches (nearly) the computed quantiles labelled 'extremes'
fRen3 <- Renouv(x = potvalues, threshold = 50, effDuration = 96/365,
                distname.y = "gpd")
predict(fRen3, newdata = c(2, 5, 10, 20, 50, 100))
Related Question