Solved – Goodness of fit (cdf: empirical vs theoretical)

cumulative distribution function, empirical-cumulative-distr-fn, gamma distribution, hypothesis testing, MATLAB

I have a data set with n = 90 that probably follows a gamma distribution (other distributions are candidates too). I used maximum-likelihood estimation (MLE) in MATLAB to estimate the alpha and beta parameters of the gamma distribution.

[Image: histogram of the data set]

What is the best way to test the goodness of fit of the gamma distribution with the estimated parameters against the original data set?

Can I compare the empirical and theoretical cumulative distribution functions (cdf)?

empirical_cdf = ecdf ( data set )

theoretical_cdf = cdf ( gammafit )

And then run some test on them, for example the two-sample KS test:

kstest2 ( empirical_cdf, theoretical_cdf )

Is this the correct way?

Many thanks


The histogram in the last question is only an example of one data set (1 of 10000).
I'll rephrase my question: I have a total of 10000 data sets, and I wonder whether the gamma distribution fits better (in terms of goodness of fit) than, for example, the Weibull distribution.

or

Of the 10000 data sets, what percentage are fit better by the gamma distribution, and what percentage by the Weibull distribution?

As you can see, my collection of data sets is large, and it is impossible to check them one by one.

What is the best way to assess goodness of fit in order to find these percentages?

Many thanks

Best Answer

I don't use MATLAB, but how about we check the documentation of the function? It says:

kstest2

Kolmogorov-Smirnov test to compare the distribution of two samples

So no, that's not used for comparing a fitted distribution to a sample.
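For illustration, here is a minimal sketch of the kind of input kstest2 is meant for: two vectors of raw observations, not cdf values. The two samples, their parameters, and the variable names are made up for the example.

```matlab
% Two hypothetical raw samples (stand-ins, not the asker's data).
rng(2);                               % fixed seed so the sketch is reproducible
sampleA = gamrnd(2, 3, 90, 1);        % 90 draws from a gamma (shape 2, scale 3)
sampleB = wblrnd(6, 1.5, 90, 1);      % 90 draws from a Weibull (scale 6, shape 1.5)

% kstest2 compares the empirical cdfs of the two samples internally.
% h = 1 means "reject equality of distributions" at the default 5% level.
[h, p, ks2stat] = kstest2(sampleA, sampleB);
fprintf('h = %d, p = %.4f, KS statistic = %.4f\n', h, p, ks2stat);
```

Passing ecdf output or fitted cdf values as the "samples" would test something quite different from what was intended.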

What about kstest? Well, if we check the documentation there, the answer is still no:

The Kolmogorov-Smirnov test requires that cdf be predetermined. It is not accurate if cdf is estimated from the data.

That pretty much covers it. There's a Lilliefors test (MATLAB has a function for the normal case, mentioned in the documentation for kstest). You could do something similar for the gamma by simulating the distribution of the test statistic, re-estimating the parameters on each simulated sample.
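A rough sketch of what such a simulation could look like for the gamma case is given below (a parametric bootstrap of the KS statistic). It assumes the Statistics and Machine Learning Toolbox; the stand-in data, the number of replicates B, and all variable names are hypothetical and only there to make the example run.

```matlab
% Stand-in data: replace x with the real sample of n = 90 values.
rng(1);                               % fixed seed so the sketch is reproducible
x = gamrnd(2, 3, 90, 1);

% Fit the gamma by maximum likelihood: phat(1) = shape, phat(2) = scale.
phat = gamfit(x);

% Observed KS statistic against the fitted cdf. The p-value kstest returns
% is ignored, because it assumes the cdf was not estimated from the data.
[~, ~, ksObs] = kstest(x, 'CDF', [x, gamcdf(x, phat(1), phat(2))]);

% Simulate the null distribution of the statistic, refitting every time
% so that the estimation step is accounted for.
B = 1000;                             % number of simulated samples (assumed)
ksSim = zeros(B, 1);
for b = 1:B
    xs = gamrnd(phat(1), phat(2), numel(x), 1);   % draw from the fitted model
    ps = gamfit(xs);                               % re-estimate the parameters
    [~, ~, ksSim(b)] = kstest(xs, 'CDF', [xs, gamcdf(xs, ps(1), ps(2))]);
end

% Simulated p-value: share of simulated statistics at least as large as observed.
pValue = (1 + sum(ksSim >= ksObs)) / (B + 1);
fprintf('KS = %.4f, simulated p = %.4f\n', ksObs, pValue);
```

The refit inside the loop is the important part: it is what separates this from simply calling kstest with the estimated parameters.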

But often people test goodness of fit in situations in which it's not really useful to do so. (This may be the case here as well - why are you testing goodness of fit?)