Solved – Alternatives to fitdistr for gamma in R

Tags: fitting, gamma distribution, r

Consider the following code in R:

library(MASS)  ## fitdistr() comes from the MASS package

data = c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)
f = fitdistr(data, densfun="gamma")
rate = f$estimate["rate"]
loglik = f$loglik

This generates the following error:

Error in stats::optim(x = c(603, 103, 225, 201, 1445, 1077, 1309, 6085,  : 
  non-finite finite-difference value [1]
In addition: Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced

Other posts here indicate that this is due to the scale of the data, which produces unwanted zeros and infinities in the optimization routine:

https://stackoverflow.com/questions/15974773/errors-while-trying-to-fit-gamma-distribution-with-r-fitdistrmass
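For what it's worth, the warnings look consistent with the optimiser stepping to an invalid (e.g. negative) rate while taking finite differences; here is a minimal sketch of what dgamma() does in that case (the shape and rate values are purely illustrative, not the ones fitdistr actually tries):

## hypothetical step to a negative rate: dgamma() returns NaN for every
## observation (with a "NaNs produced" warning), so the objective is non-finite
dgamma(data, shape = 0.9, rate = -1e-4, log = TRUE)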

I note from these posts that the following workaround (scaling the data, then scaling back the rate) is possible:

f = fitdistr(data / 10, densfun="gamma")  ## scaled the data by 0.1
rate = f$estimate["rate"] / 10  ## scaled the rate by 0.1 - all good!

However, it's vital to the future of mankind that I accurately calculate the log-likelihood of the fit too. Unfortunately, scaling the data changes the log-likelihood, and I cannot find a way to compute it for the unscaled dataset:

loglik = f$loglik  ## THIS IS WRONG NOW!! :(

Can you suggest:

1) Parameters to fitdistr which might work around the problem?

2) An alternative distribution fitting library for R which might not suffer from the original problem?

3) A quick and easy alternative approach in a non-R environment to do the same job? (Python, Matlab etc)

Best Answer

I would definitely recommend scaling your data, probably even by a factor of 1000 rather than just 10. I can't think of any reason why it would be necessary (let alone vital to the future of mankind) to compute the log-likelihood on the original scale rather than the transformed scale. Simply do all calculations on the transformed scale and you are fine.

Having said that, it is easy to call sum(dgamma(..., log = TRUE)) with suitably transformed parameters to obtain the log-likelihood on whatever scale you want. For illustration, let's look at the fit with the data divided by 10 (as you did) and divided by 1000 (as I would recommend):

## scaled data
d10 <- data/10
d1000 <- data/1000
## corresponding fitdistr
f10 <- fitdistr(d10, "gamma")
f1000 <- fitdistr(d1000, "gamma")

Then we can look at the shape parameters, which essentially agree:

s10 <- coef(f10)["shape"]
s1000 <- coef(f1000)["shape"]
c(s10, s1000)
##     shape     shape 
## 0.8971837 0.8973881 

Next we express both rate estimates on the divided-by-10 scale, where they also essentially agree:

r10_10 <- coef(f10)["rate"]
r10_1000 <- coef(f1000)["rate"]/100
c(r10_10, r10_1000)
##       rate       rate 
## 0.00648812 0.00649091 

Then we can compute the corresponding log-likelihoods by hand or extract them directly from f10:

l10_10 <- sum(dgamma(d10, shape = s10, rate = r10_10, log = TRUE))
l10_1000 <- sum(dgamma(d10, shape = s1000, rate = r10_1000, log = TRUE))
c(l10_10, l10_1000, logLik(f10))
## [1] -59.25158 -59.25158 -59.25158

These also essentially agree, but the fit on the scale divided by 1000 leads to a very slightly higher (= better) log-likelihood. This reflects the better numerical properties on that scale, where both parameters (and their standard errors) are of the same order of magnitude.

c(l10_10, logLik(f10)) < l10_1000
## [1] TRUE TRUE
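To see what "same order of magnitude" means here, compare the coefficient vectors on the two scales; the approximate values in the comments follow from the estimates shown above:

coef(f10)    ## shape ~ 0.897 but rate ~ 0.0065: three orders of magnitude apart
coef(f1000)  ## shape ~ 0.897 and rate ~ 0.649: both of order 1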

To conclude, we can also compute the log-likelihood on the original scale:

r1_10 <- coef(f10)["rate"]/10
r1_1000 <- coef(f1000)["rate"]/1000
l1_10 <- sum(dgamma(data, shape = s10, rate = r1_10, log = TRUE))
l1_1000 <- sum(dgamma(data, shape = s1000, rate = r1_1000, log = TRUE))
c(l1_10, l1_1000)
## [1] -82.27743 -82.27743

Of course, this differs from logLik(f10) and logLik(f1000) only by additive constants, and all inference (e.g., likelihood ratio tests) or information criteria would lead to identical conclusions on any of the scales.
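The constant is easy to pin down: if the data are divided by c, the change-of-variables (Jacobian) term makes the log-likelihood on the scaled data exceed the one on the original data by exactly n * log(c), so with n = 10 observations and c = 10 the two fits above should differ by 10 * log(10) ≈ 23.026. A quick sanity check:

l1_10 - l10_10           ## -82.27743 - (-59.25158) = -23.02585
-length(data) * log(10)  ## = -23.02585, i.e. -n * log(c) with n = 10, c = 10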
