Solved – Interpreting p-values of goodness-of-fit tests using resampling

Tags: distributions, goodness of fit, p-value, r

I would like to find a suitable distribution to fit to a dataset. Beyond visual analysis of histograms with overlaid density curves and Q-Q plots, I would like to perform statistical tests, namely Kolmogorov-Smirnov and Anderson-Darling. As I don't know the fully specified distribution a priori, I estimate the distribution parameters from the data. However, this invalidates the standard tests, so I instead simulate the null distribution of each test statistic a large number of times.

My issue is in interpreting the output. Here is an example, using R code from the answer to "How to determine which distribution fits my data best?", testing the suitability of a Weibull distribution for my data:

library(logspline)
library(FAdist)
library(fitdistrplus)
library(ADGofTest)

n.sims <- 5e4 #number of simulations for KS and AD tests
x <- as.numeric(zooList$flow12001) #data vector length 973

fit.wei <- fitdist(x, "weibull")

# simulate the null distribution of the KS statistic: draw samples from the
# fitted Weibull and record each sample's KS statistic against that fit
ksstats <- replicate(n.sims, {
  r <- rweibull(n = length(x), shape = fit.wei$estimate["shape"],
                scale = fit.wei$estimate["scale"])
  as.numeric(ks.test(r, "pweibull", shape = fit.wei$estimate["shape"],
                     scale = fit.wei$estimate["scale"])$statistic)
})

# smooth the simulated statistics and read off the p-value of the observed data
ksfit <- logspline(ksstats)
kspval <- 1 - plogspline(ks.test(x, "pweibull", shape = fit.wei$estimate["shape"],
                                 scale = fit.wei$estimate["scale"])$statistic, ksfit)
> kspval
[1] 0.2647569

# same procedure for the Anderson-Darling statistic
adstats <- replicate(n.sims, {
  r <- rweibull(n = length(x), shape = fit.wei$estimate["shape"],
                scale = fit.wei$estimate["scale"])
  as.numeric(ad.test(r, pweibull, shape = fit.wei$estimate["shape"],
                     scale = fit.wei$estimate["scale"])$statistic)
})

adfit <- logspline(adstats)
adpval <- 1 - plogspline(ad.test(x, pweibull, shape = fit.wei$estimate["shape"],
                                 scale = fit.wei$estimate["scale"])$statistic, adfit)
> adpval
[1] 0.1292376

My interpretation of these results is that, for repeated tests, the KS test results in a rejection of H0 ~26% of the time whilst the AD test results in a rejection of H0 ~13% of the time. Hypothesis tests are often compared against tables of critical values for the test statistic, or against the associated p-values (generally p = 0.05), but in my example can I choose the p-value threshold arbitrarily because of the resampling procedure?
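For reference, here is a minimal sketch of how I could read the same p-value straight off the simulated null distribution as an empirical proportion, without the logspline smooth (it reuses x, fit.wei and ksstats from the code above; the names obs.ks and kspval.emp are just illustrative):

# observed KS statistic of the data against the fitted Weibull
obs.ks <- as.numeric(ks.test(x, "pweibull",
                             shape = fit.wei$estimate["shape"],
                             scale = fit.wei$estimate["scale"])$statistic)

# proportion of simulated KS statistics at least as extreme as the observed one
kspval.emp <- mean(ksstats >= obs.ks)
kspval.emp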

I am aware that there exist tables for the critical values of the KS test (e.g. http://www.cas.usf.edu/~cconnor/colima/Kolmogorov_Smirnov.htm) and asymptotic AD test (http://www.cithep.caltech.edu/~fcp/statistics/hypothesisTest/PoissonConsistency/AndersonDarling1954.pdf) but to be honest I am not sure how to use these in a resampling-based methodology.
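My tentative understanding is that the simulated statistics themselves can play the role of such tables. A sketch of a resampling-based critical value, reusing ksstats and obs.ks from above (names illustrative):

# resampling-based 5% critical value: the 95th percentile of the simulated
# null distribution of the KS statistic
ks.crit <- quantile(ksstats, probs = 0.95)

# reject H0 at the 5% level if the observed statistic exceeds it
obs.ks > ks.crit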

Best Answer

You are essentially looking at a distribution of p-values. As discussed in [1] and [2], the p-value is a uniformly distributed random variable when the null hypothesis is true. Note that in this case your null hypothesis is that your sampled data originate from the reference distribution.

A good way to validate this fact is to perform a large number of K-S tests in which you compare resampled subsets of your data with one another instead of with a reference distribution. You will see that, when the sets of data being compared indeed come from the same model, the p-value distribution is uniform.
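As a rough sketch of what I mean (assuming x is the data vector from the question; the number of splits and variable names are arbitrary, and ties in the data may trigger warnings), something like the following should produce an approximately flat histogram of p-values:

# split the data at random into two disjoint halves many times and run a
# two-sample KS test on each split; since both halves come from the same
# source, the resulting p-values should be roughly uniform on [0, 1]
set.seed(1)
pvals <- replicate(2000, {
  idx <- sample(length(x), size = floor(length(x) / 2))
  ks.test(x[idx], x[-idx])$p.value
})
hist(pvals, breaks = 20)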

Thus, you could perhaps use a uniformity test on the p-value distribution to assess whether it is indeed uniform. If the uniformity test gives a p-value below 0.05, or some other chosen critical value, then you can reject your null hypothesis.
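For instance, a one-sample KS test against the uniform distribution could serve as that uniformity test (reusing pvals from the sketch above):

# test the collected p-values for uniformity on [0, 1]
ks.test(pvals, "punif")  # a small p-value here would argue against uniformity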

[1] Murdoch, D., Tsai, Y., and Adcock, J. (2008). P-Values are Random Variables. The American Statistician, 62, 242–245.

[2] Why are p-values uniformly distributed under the null hypothesis?
