The Weibull distribution has two parameters, the scale $\lambda$ and shape $k$ (I'm following Wikipedia's notation). Both parameters are positive real numbers.
The fitdist function from the fitdistrplus package uses the optim function to find the maximum likelihood estimates of the parameters. By default, optim imposes no constraints on the parameters and tries out negative values as well. But negative values for the scale or shape produce NaNs for the Weibull distribution. By using the options lower and upper, you can impose limits on the parameter search space for optim.
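To see why the bounds matter, you can check directly that the Weibull density returns NaN for negative parameter values (a small illustrative check, not part of the fitting code):

```r
## dweibull returns NaN (with a warning) when shape or scale is negative,
## so an unconstrained optimizer that wanders into negative territory
## only gets NaN log-likelihoods back
dweibull(1, shape = -1, scale = 1)   # NaN
dweibull(1, shape = 2,  scale = -1)  # NaN
dweibull(1, shape = 2,  scale = 1)   # a proper density value
```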
The gamma distribution also has two parameters and, as with the Weibull distribution, both are positive. So the same limits, lower = c(0, 0), can be used for the gamma distribution.
Edit
Here is a small comparison of the Weibull and gamma fit for the posted data. The errors for the gamma distribution arise because of bad starting values. I provide them manually and then it works fine without errors.
library(fitdistrplus)
temp <- c(477.25, 2615.56, 1279.98, 581.57, 13.55, 80.4, 6640.22, 759.46,
1142.33, 134, 1232.23, 389.81, 7811.65, 992.11, 1152.4, 3139.01,
2636.78, 3294.75, 2266.95, 32.12, 7356.84, 1448.54, 3606.82,
465.39, 950.5, 3721.49, 522.01, 1548.62, 2196.3, 256.8, 2959.72,
214.4, 134, 2307.79, 2112.74)
fit.weibull <- fitdist(temp, distr = "weibull", method = "mle", lower = c(0, 0))
fit.gamma <- fitdist(temp, distr = "gamma", method = "mle", lower = c(0, 0), start = list(scale = 1, shape = 1))
Plot the fit for the Weibull:
plot(fit.weibull)
And for the gamma distribution:
plot(fit.gamma)
They are practically indistinguishable. The AICs are virtually the same for both fits:
gofstat(list(fit.weibull, fit.gamma))
Goodness-of-fit statistics
1-mle-weibull 2-mle-gamma
Kolmogorov-Smirnov statistic 0.07288424 0.07970184
Cramer-von Mises statistic 0.02532353 0.02361358
Anderson-Darling statistic 0.20489012 0.17609146
Goodness-of-fit criteria
1-mle-weibull 2-mle-gamma
Aikake's Information Criterion 601.7909 601.5659
Bayesian Information Criterion 604.9016 604.6766
I would definitely recommend scaling your data, probably even by 1000 and not just by 10. I can't think of any reason why it would be necessary (let alone vital to the future of mankind) to compute the log-likelihood on the original scale rather than the transformed scale. Simply do all calculations on the transformed scale and you are fine.
Having said that, it is easy to call sum(dgamma(..., log = TRUE)) with the transformed parameters to obtain the log-likelihood on the desired scale. For illustration, let's look at the fit on the scale divided by 10, as you did, and divided by 1000, as I would recommend:
## scaled data (data is the original data vector from the question)
d10 <- data/10
d1000 <- data/1000
## corresponding fitdistr (from the MASS package)
library(MASS)
f10 <- fitdistr(d10, "gamma")
f1000 <- fitdistr(d1000, "gamma")
Then we can look at the shape parameters, which essentially agree:
s10 <- coef(f10)["shape"]
s1000 <- coef(f1000)["shape"]
c(s10, s1000)
## shape shape
## 0.8971837 0.8973881
And then we compute the rate parameters on the scale divided by 10 and they also essentially agree:
r10_10 <- coef(f10)["rate"]
r10_1000 <- coef(f1000)["rate"]/100
c(r10_10, r10_1000)
## rate rate
## 0.00648812 0.00649091
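The agreement is no accident: if X follows a gamma distribution with a given shape and rate, then X/c follows a gamma distribution with the same shape and the rate multiplied by c. A quick numerical check of this identity (the shape and rate values below are only rough approximations of the fitted ones, chosen for illustration):

```r
x <- c(477.25, 2615.56, 1279.98)   # a few values from the posted data
s <- 0.897                          # roughly the fitted shape
r <- 0.000649                       # roughly the rate on the original scale

## density of X at x equals the density of X/10 at x/10, divided by 10
d_orig   <- dgamma(x, shape = s, rate = r)
d_scaled <- dgamma(x / 10, shape = s, rate = 10 * r) / 10
all.equal(d_orig, d_scaled)         # TRUE
```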
And then we can compute the log-likelihoods by hand or extract them from f10:
l10_10 <- sum(dgamma(d10, shape = s10, rate = r10_10, log = TRUE))
l10_1000 <- sum(dgamma(d10, shape = s1000, rate = r10_1000, log = TRUE))
c(l10_10, l10_1000, logLik(f10))
## [1] -59.25158 -59.25158 -59.25158
They essentially agree as well, but the one from the fit on the scale divided by 1000 leads to a very slightly higher (= better) log-likelihood. This reflects that the numerical properties are better on this scale, as both parameters (and their standard errors) are of the same order of magnitude.
c(l10_10, logLik(f10)) < l10_1000
## [1] TRUE TRUE
To conclude we can also compute the log-likelihood on the original scale:
r1_10 <- coef(f10)["rate"]/10
r1_1000 <- coef(f1000)["rate"]/1000
l1_10 <- sum(dgamma(data, shape = s10, rate = r1_10, log = TRUE))
l1_1000 <- sum(dgamma(data, shape = s1000, rate = r1_1000, log = TRUE))
c(l1_10, l1_1000)
## [1] -82.27743 -82.27743
Of course, this differs from logLik(f10) and logLik(f1000) just by constants, and all inference (e.g., likelihood ratio tests) and information criteria would lead to identical conclusions on any of the scales.
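The constant is n * log(c) for a scaling factor c, since each term of the log-likelihood picks up a -log(c) from the change of variables. A small sketch with illustrative values (the shape and rate below only approximate the fitted ones):

```r
x   <- c(13.55, 80.4, 134, 465.39, 522.01)  # a few of the posted values
s   <- 0.897     # approximate shape
r10 <- 0.00649   # approximate rate on the scale divided by 10

l_scaled <- sum(dgamma(x / 10, shape = s, rate = r10, log = TRUE))
l_orig   <- sum(dgamma(x, shape = s, rate = r10 / 10, log = TRUE))

## the two log-likelihoods differ exactly by n * log(10)
all.equal(l_orig, l_scaled - length(x) * log(10))  # TRUE
```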
Best Answer
Per the comments: it seems to work when you take out the 0's (see the red curve below). You may also consider adjusting the lower parameter to a small positive decimal, or to an interval, instead of exactly 0.
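A hedged sketch of what that could look like: drop the zeros (where the Weibull log-likelihood degenerates for shape < 1) and use a small positive lower bound rather than exactly 0. The data vector here is made up purely for illustration:

```r
library(fitdistrplus)

## hypothetical data containing zeros (illustration only, not the asker's data)
x <- c(0, 12.3, 45.6, 7.8, 150.2, 0, 33.1, 9.9, 78.4, 0.5)

## remove the zeros before fitting
xpos <- x[x > 0]

## use a small positive lower bound instead of 0
fit <- fitdist(xpos, distr = "weibull", method = "mle",
               lower = c(.Machine$double.eps, .Machine$double.eps))
fit$estimate  # both estimates come out strictly positive
```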