The definition of the one-sample Kolmogorov–Smirnov test

Tags: kolmogorov-smirnov test, python, statistical significance, time series

I have a time series of photon arrival times from a detector and need to know whether the arrival times are uniformly distributed. That is a continuous distribution, right?

I have calculated the maximum distance $D$ between the normalized CDF of the photon arrival times and the uniform CDF. What should I do next? The Wikipedia article contains "$\Pr(K\le x)$". What does it correspond to in my problem? Could somebody explain the basic definition of the KS test?

In fact, I can calculate the KS-test probability with `scipy.stats.kstest`. Does anybody know what `args` means for a uniform distribution? And what do the two output values mean?

```python
>>> stats.kstest(sample, 'uniform', args=(1,2,3,4,5,6))
(1.0, 0.0)
>>> stats.kstest(sample, 'uniform', args=(0.1,0.2,0.3,0.4,0.5,0.6))
(1.0, 0.0)
>>> stats.kstest(sample, 'uniform', args=(0.1,0.2))
(1.0, 0.0)
>>> stats.kstest(sample, 'uniform', args=(1,2))
(0.98999999999999999, 0.0)
>>> stats.kstest(sample, 'uniform', args=(1.1,2.1))
(0.98499999999999999, 0.0)
```

About the uniform distribution
I think I should compare two CDFs: the normalized CDF of the real data and the CDF of a uniform distribution, which describes photons arriving uniformly in time.

Please take a look at my plot. The green points are the real data; there are hundreds of them, and each green point marks the arrival of a new photon. The x-axis is time and the y-axis is normalized: it is the fraction (cumulative number of arrived photons / total number of photons). The red straight line is the CDF of the uniform distribution.
`stats.uniform` picks random values every time. Is `stats.uniform` appropriate?
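Here is a minimal sketch of the comparison described above. It assumes the observation window [t_start, t_end] is known from the experiment. The helper name and the simulated data are hypothetical. The uniform CDF is a fixed straight line, so no random draws from `stats.uniform` are involved. $D$ is the largest vertical gap between the step-shaped empirical CDF and that line, checked just before and just after each step.

```python
import numpy as np
from scipy import stats

def ks_distance_uniform(arrival_times, t_start, t_end):
    """Largest gap D between the empirical CDF of the arrival times and the
    uniform CDF on [t_start, t_end] (hypothetical helper for illustration)."""
    # Evaluate the uniform CDF (a straight line) at each sorted arrival time
    u = np.sort((np.asarray(arrival_times, dtype=float) - t_start) / (t_end - t_start))
    n = u.size
    i = np.arange(1, n + 1)
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th arrival,
    # so the gap is checked on both sides of every jump
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    return max(d_plus, d_minus)

# Simulated stand-in for detector data: 300 arrivals in a 100 s window
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 100.0, size=300))

d = ks_distance_uniform(times, t_start=0.0, t_end=100.0)
# SciPy computes the same statistic when the window is passed as (loc, scale)
res = stats.kstest(times, "uniform", args=(0.0, 100.0))
print(d, res.statistic)
```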

About the KS test
I took a picture of the relevant page of Numerical Recipes in C, 2nd edition.
What is the relation between equation 14.3.9 and Wikipedia's $\Pr(K\le x)$? Is it just an approximation?
The significance should depend on only two values, the maximum distance $D$ and the sample size, right?
Is $\Pr(K\le x)$ the CDF of $D$? How is $x$ defined?
I cannot get results consistent with `stats.kstest()`.
Do you mean that if I use the raw arrival times, which are not normalized, I should use a two-sample KS test?
[Image: page from Numerical Recipes in C, 2nd edition, Chapter 14]

Best Answer

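First, on the programming side, passing `'uniform'` is essentially passing `scipy.stats.uniform.cdf()` to `kstest`. Whatever you put in `args=` is passed to `scipy.stats.uniform.cdf()` as parameters. That function takes only two parameters, location and scale (see the documentation for details). The distribution is uniform on [loc, loc + scale], so `args=(1, 2)` means the interval [1, 3], not [1, 2].

If you put more than two values in `args=`, they are not used as location and scale at all. In the SciPy version used here, both calls below fall back to the standard uniform on [0, 1]. With loc=0.5 and scale=1, data in [0, 1) would give $D\ge 0.5$, not 0.304. Recent SciPy versions raise an error instead.

The two returned values are the KS statistic $D$ and its $p$-value.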

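```python
>>> import numpy as np
>>> from scipy import stats
>>> a = np.random.random(10)
>>> stats.kstest(a, 'uniform', args=(0.5, 1, 3, 4))
(0.303993262358352, 0.25725219759419549)
>>> # same result: with more than two values, args are not used (older SciPy; newer versions raise an error)
>>> stats.kstest(a, 'uniform', args=(0.5, 1, 300, 400))
(0.303993262358352, 0.25725219759419549)
```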
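This short sketch shows what `args` should contain for `'uniform'`: exactly `(loc, scale)`. The window values and simulated data are made up for illustration. Testing raw times against `uniform(loc=t_start, scale=t_end - t_start)` is equivalent to rescaling the times to [0, 1] and testing against the standard uniform. Both calls return the statistic $D$ and its $p$-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t_start, t_end = 2.0, 50.0                      # assumed known observation window
times = rng.uniform(t_start, t_end, size=200)   # simulated arrival times

# 'uniform' takes exactly two parameters: loc (left end) and scale (width),
# so the support is [loc, loc + scale]
raw = stats.kstest(times, "uniform", args=(t_start, t_end - t_start))

# Equivalent test on the times rescaled to [0, 1], against the standard uniform
scaled = stats.kstest((times - t_start) / (t_end - t_start), "uniform")

print(raw.statistic, raw.pvalue)        # D and its p-value
print(scaled.statistic, scaled.pvalue)  # the same values up to rounding
```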

Second, since you have already normalized the CDF of the photon arrival times, it makes sense to do a one-sample KS test against the standard uniform distribution. Be careful about how that normalization is done, though: http://journals.ametsoc.org/doi/abs/10.1175/1520-0450%281975%29014%3C1600%3AANOTPM%3E2.0.CO%3B2 Basically, that paper says that if one or more parameters must be estimated from the sample, $D$ no longer follows the Kolmogorov–Smirnov distribution. If you still use the KS CDF to get a $P$ value from $D$, you will get the wrong $P$. One example is taking the start and end of the time window from the first and last arrival instead of fixing them by the experiment. Also, I don't think it is a correct approach to generate uniformly distributed random numbers and apply a two-sample KS test.
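If the window ends do have to be taken from the data (here, the first and last arrival), one generic remedy is to calibrate $D$ by simulation. The answer does not prescribe this; it is offered here as an assumption. The idea is to apply the same estimate-then-test procedure to many uniform samples of the same size and count how often their $D$ is at least as large as the observed one. This sketch uses hypothetical helper names and simulated data, and it does not involve a two-sample test.

```python
import numpy as np

def ks_stat_standard_uniform(u):
    """Two-sided KS distance between the empirical CDF of u and U(0, 1)."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

def ks_stat_estimated_window(x):
    """D after rescaling with the sample's own min and max (estimated window)."""
    lo, hi = np.min(x), np.max(x)
    return ks_stat_standard_uniform((x - lo) / (hi - lo))

def monte_carlo_pvalue(x, n_sim=2000, seed=None):
    """p-value for D when the window is estimated, calibrated by simulation."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    d_obs = ks_stat_estimated_window(x)
    # This D does not change under shifting/scaling the data, so U(0, 1) samples suffice
    d_null = np.array([ks_stat_estimated_window(rng.random(x.size))
                       for _ in range(n_sim)])
    # The +1 terms count the observed sample itself and keep p above zero
    p = (1 + np.count_nonzero(d_null >= d_obs)) / (n_sim + 1)
    return d_obs, p

rng = np.random.default_rng(2)
arrivals = np.sort(rng.uniform(10.0, 70.0, size=150))  # simulated arrival times
d_obs, p = monte_carlo_pvalue(arrivals, n_sim=2000, seed=3)
print(d_obs, p)
```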

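Third, the CDF of the Kolmogorov distribution is given by

$$\Pr(K\leq x)=1-2\sum_{k=1}^\infty (-1)^{k-1} e^{-2k^2 x^2}=\frac{\sqrt{2\pi}}{x}\sum_{k=1}^\infty e^{-(2k-1)^2\pi^2/(8x^2)},$$

and this is how you can calculate $P$ from $D$. Under the null hypothesis, $\sqrt{n}\,D$ converges in distribution to $K$ as the sample size $n$ grows. So $x=\sqrt{n}\,D$, and the asymptotic $p$-value is $P\approx 1-\Pr(K\le\sqrt{n}\,D)$, which depends only on $D$ and $n$. In SciPy this function is not implemented in pure Python but in a C extension. It is exposed as `scipy.special.kolmogorov`, which returns $1-\Pr(K\le x)$.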
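The series above can be checked numerically. This sketch uses made-up helper names. It evaluates both series, compares them with SciPy's `stats.kstwobign` (the limiting distribution of $\sqrt{n}\,D$) and `special.kolmogorov` (its survival function), and turns $D$ and $n$ into an asymptotic $p$-value.

This is only a large-$n$ approximation. Recent versions of `stats.kstest` may use the exact finite-$n$ distribution by default, so their $p$-values need not match this one exactly. As far as I recall, Numerical Recipes also applies a small-sample correction, replacing $\sqrt{n}$ by $\sqrt{n}+0.12+0.11/\sqrt{n}$; check the book before relying on that.

```python
import numpy as np
from scipy import special, stats

def kolmogorov_cdf_alternating(x, terms=100):
    """Pr(K <= x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2); converges fast for larger x."""
    k = np.arange(1, terms + 1)
    return 1.0 - 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * x**2))

def kolmogorov_cdf_theta(x, terms=100):
    """Pr(K <= x) = sqrt(2 pi)/x * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2)); converges fast for small x."""
    k = np.arange(1, terms + 1)
    return np.sqrt(2.0 * np.pi) / x * np.sum(np.exp(-((2 * k - 1) ** 2) * np.pi**2 / (8.0 * x**2)))

def ks_asymptotic_pvalue(d, n):
    """Large-n p-value: sqrt(n) * D -> K under H0, so P ~ 1 - Pr(K <= sqrt(n) * D)."""
    lam = np.sqrt(n) * d
    if lam <= 0.0:
        return 1.0
    if lam < 1.0:
        return 1.0 - kolmogorov_cdf_theta(lam)
    # For larger lam, sum the survival series directly to avoid cancellation in 1 - cdf
    k = np.arange(1, 101)
    return 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k**2 * lam**2))

for x in (0.5, 1.0, 1.36, 2.0):
    print(x, kolmogorov_cdf_alternating(x), kolmogorov_cdf_theta(x), stats.kstwobign.cdf(x))
# Pr(K <= 1.36) is roughly 0.95, the familiar 5% critical value for sqrt(n) * D
print(ks_asymptotic_pvalue(0.1, 100))
```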