Hi All. I am currently using the KSTEST2 function in MATLAB R2014b to compare two datasets. I understand how to find the D statistic (the largest vertical difference between the two empirical CDFs), and also how to reject using the D_Alpha method (D_Alpha = 1.36*sqrt((n1+n2)/(n1*n2)) at the 5% significance level, rejecting when D is greater than D_Alpha). But I do not understand how the p-value is calculated for either the 1-sided or the 2-sided test. From the MATLAB function I have found that:
% Compute the asymptotic P-value approximation and accept or
% reject the null hypothesis on the basis of the P-value.
%
n1     = length(x1);
n2     = length(x2);
n      = n1 * n2 /(n1 + n2);
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic , 0);

if tail ~= 0        % 1-sided test.
   pValue = exp(-2 * lambda * lambda);
else                % 2-sided test (default).
%
%  Use the asymptotic Q-function to approximate the 2-sided P-value.
%
   j      = (1:101)';
   pValue = 2 * sum((-1).^(j-1).*exp(-2*lambda*lambda*j.^2));
   pValue = min(max(pValue, 0), 1);
end

H = (alpha >= pValue);
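To experiment with this step outside of KSTEST2, I wrapped it in a small standalone script. The values of n1, n2, KSstatistic and alpha below are made-up inputs of mine (they are not from KSTEST2 or from my data), and the names pOneSided, pTwoSided and D_Alpha are my own, so this is only a sketch of the calculation:

```matlab
% Standalone sketch of the p-value step; all input values are assumed.
n1          = 50;     % assumed size of the first sample
n2          = 60;     % assumed size of the second sample
KSstatistic = 0.25;   % assumed D statistic
alpha       = 0.05;   % significance level

% Effective sample size and scaled statistic, as in the excerpt
n      = n1 * n2 / (n1 + n2);
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic, 0);

% 1-sided p-value
pOneSided = exp(-2 * lambda^2);

% 2-sided p-value (alternating series truncated at 101 terms)
j         = (1:101)';
pTwoSided = 2 * sum((-1).^(j-1) .* exp(-2 * lambda^2 * j.^2));
pTwoSided = min(max(pTwoSided, 0), 1);

% Critical-value rule at the 5% level, for comparison
D_Alpha = 1.36 * sqrt((n1 + n2) / (n1 * n2));

fprintf('n = %.4f, lambda = %.4f\n', n, lambda);
fprintf('p (1-sided) = %.4f, p (2-sided) = %.4f\n', pOneSided, pTwoSided);
fprintf('D = %.4f, D_Alpha = %.4f, reject by D_Alpha rule: %d\n', ...
        KSstatistic, D_Alpha, KSstatistic > D_Alpha);
fprintf('reject by 2-sided p-value: %d\n', alpha >= pTwoSided);
```

With these assumed inputs the D_Alpha rule and the 2-sided p-value agree (neither rejects), so the two approaches look related, but I still cannot see how.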
From this I figured that MATLAB first takes the sizes of the two datasets, n1 and n2, and then calculates n (where is this equation from?). Then lambda is calculated, and it has to be >= 0. The 1-tailed case is a straightforward equation, but where does it come from? The 2-tailed case is more complex: why is there j = (1:101) when the equation for the p-value seems to get its answer from the j = 1 term alone (at least in the examples I tried)? Also, where does this equation come from?
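To show what I mean about j, here is a small sketch that prints the first few terms of the 2-sided sum separately. The value lambda = 1.36 is just an example I picked (it is the constant from the D_Alpha formula), not something taken from KSTEST2:

```matlab
% Print individual terms of the 2-sided series for one example lambda.
lambda = 1.36;                      % example value, assumed
j      = (1:5)';                    % first five terms only
terms  = 2 * (-1).^(j-1) .* exp(-2 * lambda^2 * j.^2);

for k = 1:numel(j)
    fprintf('j = %d: term = %+.3e\n', j(k), terms(k));
end
fprintf('first term only        : %.6f\n', terms(1));
fprintf('sum of first five terms: %.6f\n', sum(terms));
```

For this lambda the j = 1 term is about 0.049 and the j = 2 term is below 1e-6 in magnitude, so the later terms make no visible difference to the result. That is why I am unsure what the other 100 terms are for.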
Basically, which paper did MATLAB base this script on? Where did these equations come from? I have looked at a lot of literature on this and have not found a similar equation.
Best Answer