Consider the following code in R:
library(MASS) ## fitdistr() lives in the MASS package
data = c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)
f = fitdistr(data, densfun="gamma")
rate = f$estimate["rate"]
loglik = f$loglik
This generates the following error:
Error in stats::optim(x = c(603, 103, 225, 201, 1445, 1077, 1309, 6085, :
non-finite finite-difference value [1]
In addition: Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
Other posts here indicate that this is due to the extreme scale of the data producing unwanted 0's and infinities in the optimization routine.
I note from these posts that the following workaround (scaling the data, then scaling back the rate) is possible:
f = fitdistr(data / 10, densfun="gamma") ## fit on the data divided by 10
rate = f$estimate["rate"] / 10 ## divide the fitted rate by 10 to get back to the original scale - all good!
However, it's vital to the future of mankind that I accurately calculate the log-likelihood of the fit too. Unfortunately, the scaling of the data affects the log-likelihood calculation, and I cannot find a way to calculate it for the unscaled dataset:
loglik = f$loglik ## THIS IS WRONG NOW!! :(
Can you suggest:
1) Parameters to fitdistr which might work around the problem?
2) An alternative distribution fitting library for R which might not suffer from the original problem?
3) A quick and easy alternative approach in a non-R environment to do the same job? (Python, Matlab etc)
Best Answer
I would definitely recommend scaling your data, probably even by 1000 and not just by 10. I can't think of any reason why it would be necessary (let alone vital to the future of mankind) to compute the log-likelihood on the original scale rather than the transformed scale. Simply do all calculations on the transformed scale and you are fine.
Having said that, it is easy to call sum(dgamma(..., log = TRUE)) with the transformed parameters to obtain the log-likelihood on the desired scale. For illustration, let's look at the fit on the scale divided by 10, as you did, and divided by 1000, as I would recommend. Then we can look at the shape parameters, which essentially agree.
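A sketch of the two fits (the names f10 and f1000 match those referenced below; the data vector is the one from the question):

```r
library(MASS)  # for fitdistr()

data <- c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)

# Fit on the data divided by 10 and divided by 1000
f10   <- fitdistr(data / 10,   densfun = "gamma")
f1000 <- fitdistr(data / 1000, densfun = "gamma")

# The shape parameter is invariant to rescaling the data, so the two
# estimates should essentially agree:
f10$estimate["shape"]
f1000$estimate["shape"]
```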
And then we compute the rate parameters on the scale divided by 10, and they also essentially agree.
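Along these lines (a sketch, refitting f10 and f1000 as above): dividing the data by c multiplies the fitted rate by c, so the rate from f1000 is divided by 100 to bring it onto the "divided by 10" scale.

```r
library(MASS)  # for fitdistr()

data <- c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)
f10   <- fitdistr(data / 10,   densfun = "gamma")
f1000 <- fitdistr(data / 1000, densfun = "gamma")

# Compare both rate estimates on the "divided by 10" scale
# (1000 / 10 = 100):
f10$estimate["rate"]
f1000$estimate["rate"] / 100
```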
And then we can compute the log-likelihoods by hand or extract them from f10. They also essentially agree, but the one from the fit on the scale divided by 1000 leads to a very slightly higher (= better) log-likelihood. This reflects that the numerical properties are better on this scale, as both parameters (and their standard errors) are of the same order of magnitude.
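That comparison might look like this (a sketch, refitting f10 and f1000 as above; the by-hand value uses the f1000 parameters transformed to the "divided by 10" scale):

```r
library(MASS)  # for fitdistr()

data <- c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)
f10   <- fitdistr(data / 10,   densfun = "gamma")
f1000 <- fitdistr(data / 1000, densfun = "gamma")

# Log-likelihood on the "divided by 10" scale, computed by hand:
# shape carries over unchanged, the rate is divided by 1000/10 = 100.
ll10_by_hand <- sum(dgamma(data / 10,
                           shape = f1000$estimate[["shape"]],
                           rate  = f1000$estimate[["rate"]] / 100,
                           log   = TRUE))
ll10_by_hand

# ...and extracted directly from the fit on that scale:
logLik(f10)
```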
To conclude, we can also compute the log-likelihood on the original scale.
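For instance (a sketch, refitting f1000 as above): evaluate the gamma density at the unscaled data with the back-transformed rate, or equivalently shift the scaled log-likelihood by the constant n * log(c) for scaling factor c.

```r
library(MASS)  # for fitdistr()

data <- c(603, 103, 225, 201, 1445, 1077, 1309, 6085, 469, 2309)
f1000 <- fitdistr(data / 1000, densfun = "gamma")

# Back-transform the rate to the original scale and evaluate the
# density at the unscaled data:
shape <- f1000$estimate[["shape"]]
rate  <- f1000$estimate[["rate"]] / 1000
ll_orig <- sum(dgamma(data, shape = shape, rate = rate, log = TRUE))
ll_orig

# Equivalently, dividing the data by c shifts the log-likelihood by
# n * log(c), so the original-scale value is also:
logLik(f1000) - length(data) * log(1000)
```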
Of course, this differs from logLik(f10) and logLik(f1000) just by constants, and all inference (e.g., likelihood ratio tests) or information criteria would lead to identical conclusions on any of the scales.