MATLAB: How does KSTEST2 calculate the p-value?

kstest2, MATLAB, statistics

Hi all. I am currently using the KSTEST2 function in MATLAB R2014b to compare two datasets. I understand how to find the D statistic (the largest vertical difference between the two empirical CDFs), and also how to reject using the D_Alpha method (D_Alpha = 1.36*sqrt((n1+n2)/(n1*n2)) at the 5% significance level, rejecting when D is greater than D_Alpha). But I do not understand how the p-value is calculated for either the 1-sided or the 2-sided test. From the MATLAB function I have found the following:
% Compute the asymptotic P-value approximation and accept or
% reject the null hypothesis on the basis of the P-value.
%
n1 = length(x1);
n2 = length(x2);
n = n1 * n2 / (n1 + n2);
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic, 0);
if tail ~= 0 % 1-sided test.
    pValue = exp(-2 * lambda * lambda);
else % 2-sided test (default).
    %
    % Use the asymptotic Q-function to approximate the 2-sided P-value.
    %
    j = (1:101)';
    pValue = 2 * sum((-1).^(j-1).*exp(-2*lambda*lambda*j.^2));
    pValue = min(max(pValue, 0), 1);
end
H = (alpha >= pValue);
From this I gather that MATLAB first takes the sizes of the two datasets, n1 and n2, and then calculates n (where does this equation come from?). Then lambda is calculated, and it has to be >= 0. The 1-tailed case is a straightforward equation, but where does it come from? The 2-tailed case is more complex: why is there j = (1:101)' when, at least in the examples I tried, only the j = 1 term contributes noticeably to the p-value? And where does this equation come from?
Basically, which paper did MATLAB base its script on? Where did these equations come from? I have looked at a lot of literature on this and have not found a similar equation.

Best Answer

After a lot of searching I found some good resources on the subject: 'Numerical Recipes - The Art of Scientific Computing' by Press et al., pages 736-740, 2007, which references 'Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without Extensive Tables' by Stephens, pages 115-122 of the Journal of the Royal Statistical Society, Series B (Methodological), 1970. The Stephens paper is the harder of the two to understand.
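For reference, here is how the quoted kstest2 snippet reads when written out as formulas. This is my reading of the code against the approximation described in those references, not a statement of MATLAB's official derivation. The symbols follow the snippet (n, lambda, D for KSstatistic), and Q_KS is just a label for the series.

```latex
% Effective sample size and corrected statistic, as in the quoted snippet
n = \frac{n_1 n_2}{n_1 + n_2}, \qquad
\lambda = \left( \sqrt{n} + 0.12 + \frac{0.11}{\sqrt{n}} \right) D

% 2-sided p-value: asymptotic Kolmogorov series (the code truncates at j = 101)
p_{\text{2-sided}} \approx Q_{KS}(\lambda)
  = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \lambda^2}

% 1-sided p-value: the code applies the same \lambda to the one-sided limit
p_{\text{1-sided}} \approx e^{-2 \lambda^2}
```

The 0.12 + 0.11/sqrt(n) term is the small-sample correction to sqrt(n)*D. Without it, lambda reduces to the plain large-sample form sqrt(n)*D.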
I thought I would put this here in case anyone else wants to understand the KS test better. Also check out 'Comparing Distributions: The Two-Sample Anderson-Darling Test as an Alternative to the Kolmogorov-Smirnoff Test' for more info on the limitations of the test.
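On the question of why j = (1:101)' is summed when only j = 1 seemed to matter: a quick way to see it is to compare the first term of the series with the full 101-term sum at a few values of lambda. The sketch below is only an illustration and not MATLAB's source. The names seriesTerms, qks and lambdaValues are made up for this example, and the lambda values are arbitrary test points.

```matlab
% Illustrative sketch (not MATLAB's source): compare the j = 1 term of the
% alternating series with the 101-term sum used in the quoted snippet.
j = (1:101)';

% Individual series terms 2*(-1)^(j-1)*exp(-2*lambda^2*j^2) for a scalar lambda.
seriesTerms = @(lambda) 2 * (-1).^(j-1) .* exp(-2 * lambda^2 * j.^2);

% Truncated sum, clamped to [0, 1] as in the quoted snippet.
qks = @(lambda) min(max(sum(seriesTerms(lambda)), 0), 1);

lambdaValues = [0.3 0.5 1.0 1.5];   % hypothetical test points
for lambda = lambdaValues
    t = seriesTerms(lambda);
    fprintf('lambda = %.1f: first term = %.6f, 101-term sum = %.6f\n', ...
            lambda, t(1), qks(lambda));
end
```

For lambda around 1 or larger (small p-values), the terms beyond j = 1 are negligible, which would explain why only the first term seemed to matter in the examples tried. For small lambda (large p-values) the first term alone is not a valid probability: at lambda = 0.3 it is 2*exp(-0.18), roughly 1.67, so the later alternating terms are needed to bring the sum back into [0, 1].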