1) The null hypothesis is that the data is distributed according to the theoretical distribution.
2) Let $N$ be your sample size, $D$ be the observed value of the Kolmogorov-Smirnov test statistic, and define $\lambda = D(0.12 + \sqrt{N} + 0.11 / \sqrt{N})$. Then the p-value for the test statistic is approximately:
$Q = 2 \sum_{j=1}^{\infty}(-1)^{j-1}\exp\{-2j^2\lambda^2\}$
Obviously you can't compute the infinite sum, but summing the first 100 terms or so will get you very, very close (a short sketch of the calculation follows below). This approximation is quite good even for small values of $N$, as low as 5 if I recall correctly, and it gets better as $N$ increases. Note, however, that @whuber proposes a better approach in the comments.
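Here is a minimal R sketch of that calculation; the function name `ks_pvalue_approx` and the variable names are mine, not from the original post:

```r
# Approximate K-S p-value from the series above.
# 'd' is the observed K-S statistic, 'n' the sample size (illustrative names).
ks_pvalue_approx <- function(d, n, terms = 100) {
  lambda <- (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
  j <- seq_len(terms)
  2 * sum((-1)^(j - 1) * exp(-2 * j^2 * lambda^2))
}

# Quick check against R's built-in one-sample test (fully specified null, no estimated parameters):
set.seed(1)
x <- rnorm(50)
ks_pvalue_approx(unname(ks.test(x, "pnorm")$statistic), length(x))
ks.test(x, "pnorm")$p.value
```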
This is a perfectly reasonable alternative to the Shapiro-Wilk test I suggested in answer to your other question, by the way. Shapiro-Wilk is more powerful, but if your sample size is in the high hundreds, the Kolmogorov-Smirnov test will have quite a bit of power too.
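For reference, a small example running both tests on the same simulated data (the variable `x` is illustrative, not from the original question):

```r
set.seed(2)
x <- rnorm(800)        # simulated sample; the null (standard normal) is fully specified
shapiro.test(x)        # Shapiro-Wilk (R restricts it to 3..5000 observations)
ks.test(x, "pnorm")    # one-sample K-S test against the standard normal
```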
There are two points being confused here. The first concerns the words "exact" and "approximate" in a statistical context. "Exact" means that no simplifications are used while the calculations are carried out. An "approximate" p-value does not mean the value is rounded to some precision; it means that some simplifications were used in computing it. Both "exact" and "approximate" calculations give precise numerical values - only our confidence in them differs. The second point: it is just the way the output is formatted that gives you non-precise values. You are actually invoking the same output in different ways.
ks.test (black, red, alternative="l")$p.value
ks.test (black, red, alternative="g")$p.value
ks.test (black, red)$p.value
all give you precise (not rounded) values because you are retrieving the value of a variable. In the last case the p-value is so small that it falls below what a double can represent, and it is therefore listed as 0.
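A tiny illustration of that underflow (not from the original data): any probability smaller than the smallest representable double is stored as exactly 0.

```r
.Machine$double.xmin   # smallest positive normalized double, about 2.2e-308
exp(-1000)             # underflows and prints 0, just like an extremely small p-value
```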
But when you just call the function, it gives you human-readable output. While preparing this output, the p-values pass through the format.pval() function.
First of all, check the consistency of ks.test(black, red) and ks.test(black, red, alternative="g") - the p-values are the same in the non-precise format. Now compare
ks.test(black, red, alternative="g")$p.value
and
format.pval(ks.test(black, red, alternative="g")$p.value)
Is it now clear how that p-value < 2.2e-16 is produced?
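As a self-contained illustration (with simulated `x` and `y` standing in for `black` and `red`):

```r
set.seed(3)
x <- rnorm(1000)                # plays the role of 'black'
y <- rnorm(1000, mean = 1)      # plays the role of 'red'; very different, so p is tiny

ks.test(x, y)$p.value                 # the precise stored value (possibly exactly 0)
format.pval(ks.test(x, y)$p.value)    # the human-readable form, something like "< 2.2e-16"
```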
And finally, about ks.boot(). It uses bootstrapping. While ks.test() obtains the probability of the test statistic from the Kolmogorov distribution (which describes how the test statistic is distributed when the two samples really are drawn from the same distribution), ks.boot() obtains it from an empirical distribution derived under the null hypothesis. That is, the two samples under study are combined, and from this pooled set two new samples are drawn at random with replacement. These new samples are certainly drawn from the same distribution, and their test statistic is recorded. Repeating this procedure many times yields the empirical distribution of the test statistic under the null hypothesis. The number of repeats is the nboots argument of ks.boot(). You used the default value of 1000, so you simulated 1000 test-statistic values under the null hypothesis. Your actual test statistic is greater than all 1000 of them, which means the p-value is at most 0.001 - and that is the ks.boot.p.value you see. Call ks.boot(red, black, nboots=10000) and you will obtain ks.boot.p.value = 0.0001.
To obtain a meaningful p-value with ks.boot(), nboots would have to exceed the reciprocal of the expected p-value in order of magnitude (i.e., more than $10^{23}$). I recommend you not do this, since it will hang your computer or throw a memory exception. In practice, precise p-values of such small order have no practical use: they are very sensitive to small changes in the data, so repeated experiments would yield wildly different p-values. It can be said that the smaller the p-value, the less confidence should be placed in its precise value.
Best Answer
Based on Hair et al. (1998), when there are more than 1000 observations the K-S test becomes highly sensitive, meaning that small deviations from normality will produce p-values below .05 and thus lead to rejecting normality. For more than 1000 observations it is therefore suggested to use graphical checks as well. Try qqPlot and hist to see graphically whether the data is normal.
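For example (assuming qqPlot() comes from the car package; hist() is base R):

```r
library(car)       # provides qqPlot()
x <- rnorm(2000)   # illustrative large sample
hist(x, breaks = 50)   # histogram: look for roughly bell-shaped, symmetric data
qqPlot(x)              # normal Q-Q plot: points should follow the reference line
```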