Solved – Why are most standard goodness of fit tests based only on continuous distributions?

chi-squared-test, continuous data, distributions, validation

I searched for information on this but still don't understand why most of the standard goodness-of-fit tests (e.g. Kolmogorov-Smirnov, Anderson-Darling and, perhaps, part of the chi-square test) work only with continuous distributions. Can someone help me?
Thank You.

Best Answer

For the KS test, the reason is that its generality (i.e. its usefulness for non-parametric models) comes from how the test statistic behaves under the assumption that the CDF is continuous.

Define the KS statistic as

$$D_n(F) = \max\left(D_n^+(F), D_n^-(F)\right)$$

$$D_n^+(F) = \sup_{x \in \mathbb{R}} [F_n(x) - F(x)]$$

(and the reverse for $D_n^-(F)$).

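Then under the null, with $F$ continuous, $D_n^+(F) = \max_{0 \le i \le n} \left( F_n(X_{(i)}) - F(X_{(i)}) \right) = \max_{0 \le i \le n} \left( \frac{i}{n} - F(X_{(i)}) \right)$, where $X_{(1)} \le \dots \le X_{(n)}$ are the order statistics of the sample and $X_{(0)} = -\infty$.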

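Recall that under the null, with $F$ continuous, each $F(X_i)$ is uniform on $(0,1)$, so $F(X_{(1)}), \dots, F(X_{(n)})$ are distributed as the order statistics of a uniform sample and the distribution of $D_n(F)$ does not depend on $F$.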
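As a rough numerical check of the two statements above, the following R sketch computes the statistic from the order statistics, taking $F_n(X_{(i)}) = i/n$ (valid when there are no ties), and compares it with `ks.test`. The exponential null with rate 2, the sample size and the seed are arbitrary choices for illustration and are not part of the original answer.

```r
set.seed(1)
n <- 50
x <- rexp(n, rate = 2)            # sample from a continuous null F = Exp(2) (illustrative choice)

# F(X_(i)): F is monotone, so sorting the transformed values gives the same result
u <- sort(pexp(x, rate = 2))
i <- seq_len(n)

d_plus  <- max(i / n - u)         # sup of F_n - F, attained at an order statistic
d_minus <- max(u - (i - 1) / n)   # sup of F - F_n, attained just before an order statistic
d_n     <- max(d_plus, d_minus)

# The same statistic from ks.test, first on the original scale ...
d_ks <- unname(ks.test(x, "pexp", rate = 2)$statistic)
# ... then after mapping the data through F and testing against Uniform(0, 1)
d_unif <- unname(ks.test(pexp(x, rate = 2), "punif")$statistic)

print(c(formula = d_n, ks_test = d_ks, uniform_scale = d_unif))
```

All three numbers should agree, which is the sense in which the statistic only "sees" the data through $F(X_{(i)})$.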

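So you can create your own K-S-like test for any discrete distribution, but it won't be a general (distribution-free) test: the null distribution of the statistic then depends on $F$ and has to be worked out for each case.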
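A small simulation can make this concrete. The R sketch below is only an illustration under assumed settings (Poisson(0.5) and Binomial(10, 0.5) nulls, $n = 30$, 2000 replicates, and the helper names `ks_discrete` and `sim_null` are all hypothetical choices). It simulates a K-S-like statistic under two discrete nulls and under a continuous one, then compares upper quantiles.

```r
set.seed(42)

# K-S-like statistic for a distribution on 0, 1, 2, ...:
# both F_n and F are step functions, so the supremum is attained on the support
ks_discrete <- function(x, cdf) {
  k <- 0:max(x)
  max(abs(ecdf(x)(k) - cdf(k)))
}

# Simulate the statistic under the null for a given sampler and CDF
sim_null <- function(rgen, cdf, n = 30, reps = 2000) {
  replicate(reps, ks_discrete(rgen(n), cdf))
}

d_pois  <- sim_null(function(n) rpois(n, 0.5),
                    function(k) ppois(k, 0.5))
d_binom <- sim_null(function(n) rbinom(n, 10, 0.5),
                    function(k) pbinom(k, 10, 0.5))

# Continuous reference: here the null distribution is the same for every F
d_cont <- replicate(2000, unname(ks.test(runif(30), "punif")$statistic))

q95 <- c(poisson    = unname(quantile(d_pois, 0.95)),
         binomial   = unname(quantile(d_binom, 0.95)),
         continuous = unname(quantile(d_cont, 0.95)))
print(q95)
```

The simulated 95% quantiles differ between the discrete nulls and fall below the continuous one, so critical values taken from the continuous theory would make the test conservative in the discrete case.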

Reference/Citation, Mathematical Statistics (Shao 2010)

Related Solutions

Solved – How to test whether a sample of data fits the family of Gamma distribution

I think the question asks for a precise statistical test, not for a histogram comparison. When using the Kolmogorov-Smirnov test with estimated parameters, the distribution of the test statistic under the null depends on the tested distribution, as opposed to the case with no estimated parameter. For instance, using (in R)

x <- rnorm(100)
ks.test(x, "pnorm", mean=mean(x), sd=sd(x))

leads to

        One-sample Kolmogorov-Smirnov test

data:  x 
D = 0.0701, p-value = 0.7096
alternative hypothesis: two-sided

while we get

> ks.test(x, "pnorm")

        One-sample Kolmogorov-Smirnov test

data:  x 
D = 0.1294, p-value = 0.07022
alternative hypothesis: two-sided

for the same sample x. The significance level or the p-value thus has to be determined by Monte Carlo simulation under the null, producing the distribution of the Kolmogorov-Smirnov statistic from samples simulated under the estimated distribution (with a slight approximation in the result, given that the observed sample comes from a distribution other than the estimated one, even under the null).
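The Monte Carlo procedure described above could be sketched as follows for the normal example. This is a minimal illustration, not a definitive implementation: the helper name `ks_fitted`, the number of replicates `B` and the seed are assumptions, and the key point is that the parameters are re-estimated on every simulated sample, exactly as they were on the observed one.

```r
set.seed(123)

# K-S statistic with the normal parameters estimated from the sample itself
ks_fitted <- function(x) {
  unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)
}

x <- rnorm(100)
d_obs <- ks_fitted(x)

# Simulate from the fitted distribution and refit on each simulated sample
B <- 2000
d_sim <- replicate(B, ks_fitted(rnorm(length(x), mean = mean(x), sd = sd(x))))

# Monte Carlo p-value: how often is the simulated statistic at least as large?
p_mc <- mean(d_sim >= d_obs)

# p-value reported by ks.test, which ignores that the parameters were estimated
p_naive <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value

print(c(D = d_obs, p_monte_carlo = p_mc, p_naive = p_naive))
```

For a Gamma fit, the same loop would apply with the normal fit and `rnorm` replaced by a Gamma estimation step and `rgamma`; which estimator to use is left open here.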

Solved – “Better” goodness-of-fit tests than chi squared for histogram modeling

I'm going to venture an answer to my own question after some googling. One simple approach is to use binned Poisson maximum likelihood ratios. See p. 94-96 of this page:

http://www.hep.phy.cam.ac.uk/~thomson/lectures/statistics/FittingHandout.pdf

The likelihood ratio statistic converges to a $\chi^2$ distribution in the large count limit. If you're dealing with very few counts, you should do MC simulations to determine the empirical distribution of the likelihood ratio under the hypothesis that your histogram does indeed represent a collection of samples from the model distribution. You can determine a $p$-value from this simulated likelihood ratio distribution; the $\chi^2$ test is just a fast analytical approximation to this, applicable in the large count limit.
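A minimal R sketch of this procedure follows. It assumes one common form of the binned Poisson likelihood ratio, $-2\ln\lambda = 2\sum_i \left[\mu_i - n_i + n_i \ln(n_i/\mu_i)\right]$ with the logarithmic term taken as zero when $n_i = 0$; this form is an assumption here and is not quoted from the linked handout. The expected and observed counts are made-up numbers, and the model is treated as fully specified (no fitted parameters).

```r
set.seed(7)

# Binned Poisson likelihood ratio statistic (assumed form, see text)
pois_lr <- function(n_obs, mu) {
  log_term <- ifelse(n_obs > 0, n_obs * log(n_obs / mu), 0)  # 0 * log(0) taken as 0
  2 * sum(mu - n_obs + log_term)
}

mu    <- c(0.8, 2.5, 4.0, 2.5, 0.8)  # hypothetical expected counts per bin
n_obs <- c(1, 4, 2, 3, 0)            # hypothetical observed counts per bin

lr_obs <- pois_lr(n_obs, mu)

# Empirical null distribution: draw histograms from the model itself
lr_sim <- replicate(5000, pois_lr(rpois(length(mu), mu), mu))
p_mc <- mean(lr_sim >= lr_obs)

# Large-count approximation, with one degree of freedom per bin
p_chi2 <- pchisq(lr_obs, df = length(mu), lower.tail = FALSE)

print(c(statistic = lr_obs, p_monte_carlo = p_mc, p_chi2 = p_chi2))
```

With counts this small the two $p$-values need not agree closely, which is the situation where the simulated distribution is the one to trust.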

All of this is nothing more than following the "perennial philosophy" of frequentist statistics:

If you think something interesting happened, you should find out how often you'd expect that thing to happen by chance before you go proclaiming to the world that it's interesting.
If the $p$-value shows that random effects are almost surely not responsible for the difference between your model and your observation, then your model is probably wrong.
