The Weibull distribution has two parameters, the scale $\lambda$ and shape $k$ (I'm following Wikipedia's notation). Both parameters are positive real numbers.
The fitdist function from the fitdistrplus package uses the optim function to find the maximum likelihood estimates of the parameters. By default, optim imposes no constraints on the parameters and tries out negative values as well. But negative values for the scale or shape produce NaNs for the Weibull distribution. By using the options lower and upper, you can impose limits on the parameter search space for optim.
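To see why the bounds matter, you can check directly that the Weibull density returns NaN for negative parameter values (a small illustrative check, not part of the fitting code):

```r
## dweibull returns NaN (with a warning) when shape or scale is negative,
## so an unconstrained optimizer that wanders into negative territory
## only gets NaN log-likelihoods back
dweibull(1, shape = -1, scale = 1)   # NaN
dweibull(1, shape = 2,  scale = -1)  # NaN
dweibull(1, shape = 2,  scale = 1)   # a proper density value
```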
The gamma distribution also has two parameters and, as with the Weibull distribution, both are positive. So the same limits, lower = c(0, 0), can be used for the gamma distribution.
Edit
Here is a small comparison of the Weibull and gamma fit for the posted data. The errors for the gamma distribution arise because of bad starting values. I provide them manually and then it works fine without errors.
library(fitdistrplus)
temp <- c(477.25, 2615.56, 1279.98, 581.57, 13.55, 80.4, 6640.22, 759.46,
1142.33, 134, 1232.23, 389.81, 7811.65, 992.11, 1152.4, 3139.01,
2636.78, 3294.75, 2266.95, 32.12, 7356.84, 1448.54, 3606.82,
465.39, 950.5, 3721.49, 522.01, 1548.62, 2196.3, 256.8, 2959.72,
214.4, 134, 2307.79, 2112.74)
fit.weibull <- fitdist(temp, distr = "weibull", method = "mle", lower = c(0, 0))
fit.gamma <- fitdist(temp, distr = "gamma", method = "mle", lower = c(0, 0), start = list(scale = 1, shape = 1))
Plot the fit for the Weibull:
plot(fit.weibull)
And for the gamma distribution:
plot(fit.gamma)
They are practically indistinguishable. The AICs are virtually the same for both fits:
gofstat(list(fit.weibull, fit.gamma))
Goodness-of-fit statistics
1-mle-weibull 2-mle-gamma
Kolmogorov-Smirnov statistic 0.07288424 0.07970184
Cramer-von Mises statistic 0.02532353 0.02361358
Anderson-Darling statistic 0.20489012 0.17609146
Goodness-of-fit criteria
1-mle-weibull 2-mle-gamma
Aikake's Information Criterion 601.7909 601.5659
Bayesian Information Criterion 604.9016 604.6766
I would definitely recommend scaling your data, probably even by 1000 and not just by 10. I can't think of any reason why it would be necessary (let alone vital to the future of mankind) to compute the log-likelihood on the original scale rather than the transformed scale. Simply do all calculations on the transformed scale and you are fine.
Having said that, it is easy to call sum(dgamma(..., log = TRUE)) with the transformed parameters to obtain the log-likelihood on the desired scale. For illustration, let's look at the fit on the scale divided by 10, as you did, and divided by 1000, as I would recommend:
## scaled data (data is the original data vector from the question)
d10 <- data/10
d1000 <- data/1000
## corresponding fitdistr (from the MASS package)
library(MASS)
f10 <- fitdistr(d10, "gamma")
f1000 <- fitdistr(d1000, "gamma")
Then we can look at the shape parameters, which essentially agree:
s10 <- coef(f10)["shape"]
s1000 <- coef(f1000)["shape"]
c(s10, s1000)
## shape shape
## 0.8971837 0.8973881
And then we compute the rate parameters on the scale divided by 10 and they also essentially agree:
r10_10 <- coef(f10)["rate"]
r10_1000 <- coef(f1000)["rate"]/100
c(r10_10, r10_1000)
## rate rate
## 0.00648812 0.00649091
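The agreement is no accident: if X follows a gamma distribution with a given shape and rate, then X/c follows a gamma distribution with the same shape and the rate multiplied by c. A quick numerical check of this identity (the shape and rate values below are only rough approximations of the fitted ones, chosen for illustration):

```r
x <- c(477.25, 2615.56, 1279.98)   # a few values from the posted data
s <- 0.897                          # roughly the fitted shape
r <- 0.000649                       # roughly the rate on the original scale

## density of X at x equals the density of X/10 at x/10, divided by 10
d_orig   <- dgamma(x, shape = s, rate = r)
d_scaled <- dgamma(x / 10, shape = s, rate = 10 * r) / 10
all.equal(d_orig, d_scaled)         # TRUE
```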
And then we can compute the log-likelihoods by hand or extract them from f10:
l10_10 <- sum(dgamma(d10, shape = s10, rate = r10_10, log = TRUE))
l10_1000 <- sum(dgamma(d10, shape = s1000, rate = r10_1000, log = TRUE))
c(l10_10, l10_1000, logLik(f10))
## [1] -59.25158 -59.25158 -59.25158
They essentially agree as well, but the one from the fit on the scale divided by 1000 leads to a very slightly higher (= better) log-likelihood. This reflects that the numerical properties are better on this scale, as both parameters (and their standard errors) are of the same order of magnitude.
c(l10_10, logLik(f10)) < l10_1000
## [1] TRUE TRUE
To conclude we can also compute the log-likelihood on the original scale:
r1_10 <- coef(f10)["rate"]/10
r1_1000 <- coef(f1000)["rate"]/1000
l1_10 <- sum(dgamma(data, shape = s10, rate = r1_10, log = TRUE))
l1_1000 <- sum(dgamma(data, shape = s1000, rate = r1_1000, log = TRUE))
c(l1_10, l1_1000)
## [1] -82.27743 -82.27743
Of course, this differs from logLik(f10) and logLik(f1000) just by constants, and all inference (e.g., likelihood ratio tests) and information criteria would lead to identical conclusions on any of the scales.
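The constant is n * log(c) for a scaling factor c, since each term of the log-likelihood picks up a -log(c) from the change of variables. A small sketch with illustrative values (the shape and rate below only approximate the fitted ones):

```r
x   <- c(13.55, 80.4, 134, 465.39, 522.01)  # a few of the posted values
s   <- 0.897     # approximate shape
r10 <- 0.00649   # approximate rate on the scale divided by 10

l_scaled <- sum(dgamma(x / 10, shape = s, rate = r10, log = TRUE))
l_orig   <- sum(dgamma(x, shape = s, rate = r10 / 10, log = TRUE))

## the two log-likelihoods differ exactly by n * log(10)
all.equal(l_orig, l_scaled - length(x) * log(10))  # TRUE
```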
Best Answer
Per the comments: it seems to work when you take out the 0's (see the red curve below). You may also consider adjusting the lower parameter to a small positive decimal, or to an interval, instead of exactly 0.
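A hedged sketch of what that could look like: drop the zeros (where the Weibull log-likelihood degenerates for shape < 1) and use a small positive lower bound rather than exactly 0. The data vector here is made up purely for illustration:

```r
library(fitdistrplus)

## hypothetical data containing zeros (illustration only, not the asker's data)
x <- c(0, 12.3, 45.6, 7.8, 150.2, 0, 33.1, 9.9, 78.4, 0.5)

## remove the zeros before fitting
xpos <- x[x > 0]

## use a small positive lower bound instead of 0
fit <- fitdist(xpos, distr = "weibull", method = "mle",
               lower = c(.Machine$double.eps, .Machine$double.eps))
fit$estimate  # both estimates come out strictly positive
```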