Solved – Distribution fitted well but Kolmogorov-Smirnov test not showing the right results? Am I doing it right?

distributions, kolmogorov-smirnov test, MATLAB

I want to use the Kolmogorov-Smirnov test to see whether my data follows a particular distribution. When I fitted a lognormal distribution to my data, it fitted well, but the KS test rejects the null hypothesis. My sample size is also very large: 1047304 samples. So I took 1000 samples from where I thought the empirical and fitted distributions mostly agree. The thing is:

1) I haven't studied any statistics. Wikipedia says that the Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. As far as I understand, for each x value we compute the corresponding F(x), the cumulative probability, and these values depend on the distribution we choose. We then measure the distance between the F(x) values of the empirical distribution and those of the reference distribution. Am I right?

2) I used the following MATLAB code for the KS test:

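cdf2 = logncdf(area1out, -3.60186, 0.347719);  % lognormal CDF evaluated at each data point
cdfplot(area1out)                               % empirical CDF of the data
hold on;
cdfplot(cdf2)                                   % empirical CDF of the values stored in cdf2
[h, p, ksstat2] = kstest2(area1out, cdf2);      % two-sample KS test: area1out vs. cdf2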

'area1out' is my data. The values -3.60186 and 0.347719 are the parameters estimated by fitting a lognormal distribution to the data. The result is h=1, p=0, and ksstat2=0.9297.

3) I got the following output.

This figure shows the good fit of the lognormal distribution

[Figure: lognormal fit to the data]

The second picture shows the difference between the two CDFs, I guess.
[Figure: the two CDFs plotted together]

When I took 1000 samples the output is like this.

I feel like I am thinking about this wrongly and making a big blunder. Please help me understand the KS test practically. I will read the whole theory behind it, but please help me understand this in a simple way, as I am very new to this and have to work on this problem.

EDIT: No matter how many different combinations of samples I take, even with a small sample size, the distance is very large. When I zoom into my data it looks like this, and the corresponding test results are in the second figure.
[Figure: zoomed-in view of the data]
[Figure: corresponding test results]

Is there a problem with the parameter values I used to compute the CDF? Or does my data really not follow a lognormal distribution?
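For reference, the distance described in point 1 can be computed by hand. The following is only a sketch: it uses synthetic lognormal data as a hypothetical stand-in for area1out, takes the two parameter values quoted above, and assumes the Statistics and Machine Learning Toolbox is available. It compares the hand-computed distance with the statistic returned by the one-sample kstest.

```matlab
% Sketch only: synthetic data stands in for area1out.
rng(1);                                  % reproducible example
mu = -3.60186; sigma = 0.347719;         % parameters quoted in the question
x = sort(lognrnd(mu, sigma, 1000, 1));   % hypothetical sample, sorted ascending
n = numel(x);

F = logncdf(x, mu, sigma);               % reference CDF at each data point

% The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
% so the gap to the reference CDF is checked on both sides of each jump.
i = (1:n)';
Dplus  = max(i/n - F);
Dminus = max(F - (i-1)/n);
D = max(Dplus, Dminus);                  % KS distance computed by hand

% Cross-check against the one-sample test from the toolbox.
pd = makedist('Lognormal', 'mu', mu, 'sigma', sigma);
[h, p, ksstat] = kstest(x, 'CDF', pd);
fprintf('by hand D = %.4f, kstest D = %.4f, p = %.3g\n', D, ksstat, p);
```

The point of the sketch is that the comparison is between the step function built from the data and the smooth reference CDF evaluated at the same x values.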

Best Answer

I would always advise against using any statistical test that tries to quantify whether two distributions are similar, such as the Kolmogorov-Smirnov test. Because it evaluates whether the difference between the sample (empirical) distribution and the reference distribution is significantly different from 0, it does not have enough power at small sample sizes; even if the sample does not come from the reference distribution, the test may well not be significant.

If the sample size is very large, as in your case, the KS test will almost always be significant, even for a slight deviation from the reference distribution.
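A small simulation can illustrate the sample-size effect. This is a sketch under stated assumptions, not a reproduction of your data: the null parameters (-3.6, 0.35), the slightly different true sigma (0.36), and the sample sizes are all made up for illustration, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
rng(2);                                                    % reproducible example
pd0 = makedist('Lognormal', 'mu', -3.6, 'sigma', 0.35);    % null distribution

% Hypothetical data: same family, sigma off by about 3 percent.
sigmaTrue = 0.36;
for n = [100 1000 1000000]
    x = lognrnd(-3.6, sigmaTrue, n, 1);
    [h, p, D] = kstest(x, 'CDF', pd0);    % one-sample KS test against pd0
    fprintf('n = %7d   D = %.4f   p = %.3g   reject = %d\n', n, D, p, h);
end
```

With a deviation this small, the two smaller samples will typically not lead to rejection, while the million-point sample does, even though the mismatch is the same in all three cases. Note that the sketch uses the one-sample kstest with a distribution object. kstest2 is the two-sample variant and expects two data samples, so CDF values passed as its second argument are treated as if they were data.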

Here is a more elaborate description: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4342197/

Instead, I would just visually inspect the distribution through a histogram or a normal Q-Q plot (for a lognormal fit, a normal Q-Q plot of the log-transformed data).
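A minimal sketch of such a visual check, assuming the Statistics and Machine Learning Toolbox and using synthetic data as a hypothetical stand-in for area1out (replace x with your own vector):

```matlab
rng(3);                                     % reproducible example
x = lognrnd(-3.60186, 0.347719, 5000, 1);   % hypothetical stand-in for area1out

parmhat = lognfit(x);                       % [mu sigma] estimated from the data
mu = parmhat(1); sigma = parmhat(2);

figure;
subplot(1, 2, 1);
histogram(x, 'Normalization', 'pdf');       % histogram scaled to a density
hold on;
xs = linspace(min(x), max(x), 200);
plot(xs, lognpdf(xs, mu, sigma), 'LineWidth', 1.5);   % fitted lognormal pdf
hold off;
title('Histogram with fitted lognormal pdf');

subplot(1, 2, 2);
qqplot(log(x));                             % if x is lognormal, log(x) is normal
title('Normal Q-Q plot of log(x)');
```

If the lognormal model is reasonable, the fitted curve should follow the histogram and the Q-Q points should lie close to the reference line. Systematic curvature, especially in the tails, shows where the fit breaks down.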

TL;DR

Don't use the KS test; visually inspect the data instead.