MATLAB: KSTEST2 p-value calculation: how does MATLAB do it?

Hi all. I am currently using the KSTEST2 function in MATLAB R2014b to compare two datasets. I understand how to find the D statistic (the largest vertical difference between the two empirical CDFs), and also how to reject using the critical-value method (D_alpha = 1.36*sqrt((n1+n2)/(n1*n2)) at the 5% significance level, rejecting when D is greater than D_alpha). But I do not understand how the p-value is calculated for either the 1-sided or the 2-sided test. In the MATLAB function I found the following:
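For reference, here is a minimal Python sketch of the critical-value rule, assuming the usual large-sample form D_alpha = c(alpha)*sqrt((n1+n2)/(n1*n2)), with the null hypothesis rejected when D exceeds D_alpha. The function name ks2_critical_value is made up for illustration and is not part of MATLAB or any library. Python is used only so the sketch runs without the Statistics Toolbox.

```python
import math

def ks2_critical_value(n1, n2, alpha=0.05):
    """Large-sample critical value for the two-sided two-sample KS test."""
    # c(alpha) solves 2*exp(-2*c^2) = alpha, i.e. it inverts the leading
    # term of the asymptotic two-sided series. For alpha = 0.05, c is ~1.36.
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    # Note the parentheses: the ratio is (n1 + n2) / (n1 * n2).
    return c * math.sqrt((n1 + n2) / (n1 * n2))

def ks2_reject(d, n1, n2, alpha=0.05):
    """Reject the null hypothesis when D is larger than the critical value."""
    return d > ks2_critical_value(n1, n2, alpha)
```

With n1 = n2 = 100 the critical value at the 5% level comes out near 0.19, so an observed D of 0.25 would lead to rejection and a D of 0.10 would not.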

% Compute the asymptotic P-value approximation and accept or
% reject the null hypothesis on the basis of the P-value.
%

n1     =  length(x1);
n2     =  length(x2);
n      =  n1 * n2 /(n1 + n2);
lambda =  max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic , 0);
if tail ~= 0        % 1-sided test.
    pValue  =  exp(-2 * lambda * lambda);
else                % 2-sided test (default).
    %
    %  Use the asymptotic Q-function to approximate the 2-sided P-value.
    %
    j       =  (1:101)';
    pValue  =  2 * sum((-1).^(j-1).*exp(-2*lambda*lambda*j.^2));
    pValue  =  min(max(pValue, 0), 1);
end
H  =  (alpha >= pValue);
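For experimenting outside MATLAB, the excerpt above can be transcribed roughly as follows. This is only a sketch of the same arithmetic in Python, not the toolbox code itself. The function name ks2_asymptotic_pvalue and the two_sided and n_terms parameters are names chosen here for illustration.

```python
import math

def ks2_asymptotic_pvalue(d, n1, n2, two_sided=True, n_terms=101):
    """Asymptotic p-value for a two-sample KS statistic d (sketch)."""
    # Same quantities as in the MATLAB excerpt.
    n = n1 * n2 / (n1 + n2)
    lam = max((math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d, 0.0)

    if not two_sided:
        # 1-sided branch of the excerpt.
        return math.exp(-2.0 * lam * lam)

    # 2-sided branch: alternating series truncated at n_terms terms,
    # then clamped to [0, 1] exactly as the excerpt does.
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * lam * lam * j * j)
                for j in range(1, n_terms + 1))
    return min(max(2.0 * total, 0.0), 1.0)

p_two = ks2_asymptotic_pvalue(0.2, 100, 100)
p_one = ks2_asymptotic_pvalue(0.2, 100, 100, two_sided=False)
```

For this input the 2-sided value is almost exactly twice the 1-sided one, because every term after the first is negligible at that lambda.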

From this I gather that MATLAB first takes the sizes of the two datasets, n1 and n2, and then calculates n (where does this equation come from?). Then lambda is calculated and forced to be >= 0. The 1-sided p-value is a straightforward equation, but where does it come from? The 2-sided p-value is more complex: why does j run over (1:101) when, at least in the examples I tried, only the j = 1 term seems to affect the result? And where does this equation come from?

Basically, which paper is the MATLAB implementation based on? Where did these equations come from? I have looked at a lot of literature on this and have not found a similar equation.

Best Answer

After a lot of searching I found some good resources on the subject: "Numerical Recipes: The Art of Scientific Computing" by Press et al., pages 736-740, 2007. It references a paper that is harder to understand: "Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without Extensive Tables" by Stephens, Journal of the Royal Statistical Society, Series B (Methodological), pages 115-122, 1970.
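As far as I can tell from those references, n = n1*n2/(n1+n2) is the effective sample size for two samples, the factor sqrt(n) + 0.12 + 0.11/sqrt(n) is a small-sample correction, and the 2-sided sum is the truncated asymptotic series Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2*j^2*lambda^2). The 1-sided formula exp(-2*lambda^2) has the same form as half of the leading term of that series. The short Python sketch below (the helper name q_ks_terms is made up for illustration) prints the individual terms to show why j = 1 usually dominates. Stopping at 101 terms looks like a truncation choice, since the extra terms only matter when lambda is small and the p-value is close to 1.

```python
import math

def q_ks_terms(lam, n_terms=5):
    """Signed terms 2*(-1)^(j-1)*exp(-2*j^2*lam^2) for j = 1..n_terms."""
    return [2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, n_terms + 1)]

# Small lambda: several terms are comparable in size.
# Larger lambda: the terms after the first shrink extremely fast.
for lam in (0.3, 1.0, 1.5):
    terms = q_ks_terms(lam)
    print("lambda =", lam, ["%.3e" % t for t in terms])
```

The ratio of the second term to the first is exp(-6*lambda^2) in magnitude, so at lambda = 1 the second term is already under 1% of the first, while at lambda = 0.3 it is more than half of it.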

Just thought I would put this here in case anyone else wants to understand the KS test better. Also check out "Comparing Distributions: The Two-Sample Anderson-Darling Test as an Alternative to the Kolmogorov-Smirnoff Test" for more info on the limitations of the test.
