Issues with fitting a non-standard distribution to data and initial values

Tags: curve-fitting, distributions, fitting, r

I'm trying to find a distribution that fits my data so that I can predict the 5th percentile, but none of the standard distributions seem to fit.

I'll explain my approach so far with examples below. I have been trying to fit a Burr distribution with fitdistrplus but cannot find suitable initial values. I am unsure whether this is because Burr is the wrong distribution for the data or because I am doing something wrong. I've also experimented with the prefit function but hit a similar problem: I cannot choose feasible starting values. Maybe there is a more appropriate distribution I haven't tried?

My data are:

0.0001900 0.0002100 0.0002200 0.0003000 0.0007800 0.0008400 0.0011000 0.0011300 0.0012000 0.0016000 0.0016000 0.0020000 0.0020000 0.0031000 0.0056500 0.0059000 0.0082449 0.0130000 0.0180000 0.0191000 0.0510000

The Cullen and Frey graph, produced with the following code, is shown below:

descdist(Data, boot = 500)

[Figure: Cullen and Frey graph]

From this I thought the Beta distribution might work best, but it isn't quite right. I can only post 2 images, so I have included only the QQ plot here:

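library(fitdistrplus)  # fitdist, cdfcomp, denscomp, qqcomp, ppcomp, gofstat
library(actuar)        # provides the Pareto and Burr distributions

fitln <- fitdist(SSD2$NOEC,"lnorm")
fitW <- fitdist(SSD2$NOEC, "weibull")
fitg <- fitdist(SSD2$NOEC, "gamma")
fitn <- fitdist(SSD2$NOEC, "norm")
fitexp <- fitdist(SSD2$NOEC,"exp")
fitB <- fitdist(SSD2$NOEC,"beta")
fitP <- fitdist(SSD2$NOEC,"pareto")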

cdfcomp(list(fitW, fitg, fitln, fitn, fitexp, fitB, fitP), 
    legendtext=c("Weibull", "gamma", "lognormal", "norm", "exp", "Beta", "Pareto"))
denscomp(list(fitW, fitg, fitln, fitn, fitexp, fitB, fitP), 
     legendtext=c("Weibull", "gamma", "lognormal", "norm", "exp", "Beta", "Pareto"))
qqcomp(list(fitW, fitg, fitln, fitn, fitexp, fitB, fitP), 
   legendtext=c("Weibull", "gamma", "lognormal", "norm", "exp", "Beta", "Pareto"))

[Figure: QQ plot of the fitted distributions]

ppcomp(list(fitW, fitg, fitln, fitn, fitexp, fitB, fitP), 
   legendtext=c("Weibull", "gamma", "lognormal", "norm", "exp", "Beta", "Pareto"))
gofstat(list(fitW, fitg, fitln, fitn, fitexp, fitB, fitP))

Goodness-of-fit statistics
                 Kolmogorov-Smirnov  Cramer-von Mises  Anderson-Darling
1-mle-weibull    0.17899327          0.08976409        0.53379880
2-mle-gamma      0.2201135           0.1567659         0.8341498
3-mle-lnorm      0.13002018          0.04560304        0.30557002
4-mle-norm       0.2888904           0.5880214         3.1553792
5-mle-exp        0.3552864           0.6158410         3.5184364
6-mle-beta       0.2208923           0.1589649         0.8457680
7-mle-pareto     0.12410532          0.04296412        0.29429372

Goodness-of-fit criteria
                 Akaike's Information Criterion  Bayesian Information Criterion
1-mle-weibull    -173.8742                       -171.7851
2-mle-gamma      -171.8001                       -169.7111
3-mle-lnorm      -177.3758                       -175.2868
4-mle-norm       -124.3444                       -122.2554
5-mle-exp        -167.3059                       -166.2614
6-mle-beta       -171.7056                       -169.6166
7-mle-pareto     -176.0548                       -173.9658
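
Since the goal is a 5th percentile and the lognormal fit has the lowest AIC in the output above, that calculation can be sketched directly. The following is a minimal Python sketch using only the standard library, not the fitdistrplus workflow used in this question. It relies on the closed-form maximum-likelihood estimates for a lognormal (mean and population standard deviation of the logged data). The function names `lognormal_mle` and `lognormal_quantile` are made up for illustration.

```python
import math
from statistics import NormalDist, fmean

# NOEC values copied from the question.
data = [0.00019, 0.00021, 0.00022, 0.0003, 0.00078, 0.00084, 0.0011,
        0.00113, 0.0012, 0.0016, 0.0016, 0.002, 0.002, 0.0031, 0.00565,
        0.0059, 0.0082449, 0.013, 0.018, 0.0191, 0.051]


def lognormal_mle(values):
    """Closed-form lognormal MLE: mean and population sd of the logs."""
    logs = [math.log(v) for v in values]
    meanlog = fmean(logs)
    sdlog = math.sqrt(fmean([(x - meanlog) ** 2 for x in logs]))
    return meanlog, sdlog


def lognormal_quantile(p, meanlog, sdlog):
    """Lognormal quantile: exponentiate the matching normal quantile."""
    return math.exp(NormalDist(meanlog, sdlog).inv_cdf(p))


meanlog, sdlog = lognormal_mle(data)
p5 = lognormal_quantile(0.05, meanlog, sdlog)
print(f"meanlog={meanlog:.4f} sdlog={sdlog:.4f} 5th percentile={p5:.3g}")
```

With only 21 observations the estimated percentile is uncertain, so a bootstrap interval around it would probably be worth computing as well.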

The code I have used so far to try the Burr distribution is below, but I am struggling with the initial values; I have tried a few variations of shape1 and shape2:

fitBurr <- fitdist(SSD2$NOEC,"burr", start = list(shape1 = 1, shape2 = 3, rate = 1))
prefit(SSD2$NOEC,"burr", method = "mle", start = list(shape1 = 1, shape2 = 3))

Any help in getting a suitable distribution for this data would be greatly appreciated.

Best Answer

I tried beta and got what I think was a good result:

Parameters: a = 1.6589757232166452E-01 b = 1.0014191317056027E+00 location = 1.8999999999999998E-04 scale = 5.2536884151069704E-02

The SciPy documentation for the beta distribution, with the distribution details, is here: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html
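
To turn the reported parameters into the 5th percentile the question asks for, something like the following should work. It is only a sketch: it assumes SciPy is installed and that `location` and `scale` in the parameter line correspond to SciPy's `loc` and `scale` arguments. The variable names are illustrative.

```python
from scipy import stats

# Parameters as reported in this answer.
a = 1.6589757232166452e-01
b = 1.0014191317056027e+00
loc = 1.8999999999999998e-04
scale = 5.2536884151069704e-02

# Freeze the four-parameter beta distribution with those values.
fitted = stats.beta(a, b, loc=loc, scale=scale)

# ppf is the inverse CDF, so ppf(0.05) is the 5th percentile.
p5 = fitted.ppf(0.05)
print(f"5th percentile: {p5:.6g}")
```

Note that the fitted support starts at `loc`, which equals the smallest observation, so no percentile from this fit can fall below the sample minimum.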

I fit your data to the 80+ continuous statistical distributions in scipy using an open source statistical distribution fitter I had written years ago. The beta distribution was indeed at the top of the results list.
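
The fitter itself is not shown here, so the following is not that tool. It is a rough sketch of the general idea of fitting several SciPy distributions and ranking them by AIC. The three candidates are a hand-picked subset, and fixing the location at zero with `floc=0` is an assumption made to keep the fits comparable to two-parameter fits.

```python
import math
from scipy import stats

# NOEC values copied from the question.
data = [0.00019, 0.00021, 0.00022, 0.0003, 0.00078, 0.00084, 0.0011,
        0.00113, 0.0012, 0.0016, 0.0016, 0.002, 0.002, 0.0031, 0.00565,
        0.0059, 0.0082449, 0.013, 0.018, 0.0191, 0.051]

# A small illustrative subset of SciPy's continuous distributions.
candidates = {
    "lognorm": stats.lognorm,
    "weibull_min": stats.weibull_min,
    "gamma": stats.gamma,
}

results = []
for name, dist in candidates.items():
    # floc=0 pins the location at zero, so only shape and scale are estimated.
    params = dist.fit(data, floc=0)
    loglik = float(sum(dist.logpdf(data, *params)))
    k = len(params) - 1  # the fixed location is not a free parameter
    results.append((2 * k - 2 * loglik, name, params))

# Lower AIC is better, so the best candidate comes first.
results.sort()
for aic, name, params in results:
    print(f"{name:12s} AIC={aic:.2f}")
```

Extending `candidates` to more distributions is straightforward, though fits with a free location or many shape parameters may fail to converge on a sample this small.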

[Figure: fitted beta distribution]
