Hypothesis Testing – Addressing Kolmogorov-Smirnov Test Failures for Normal Distribution Samples

Tags: hypothesis-testing, kolmogorov-smirnov-test, normality-assumption, r

I created a sample with 10000 normally distributed numbers. Subsequently, I used the Kolmogorov-Smirnov test to check if they are indeed normally distributed, and it turned out that they are not. How is this possible?

Below is my code.

data <- rnorm(n=10000, 5, 2)
ks.test(data, "pnorm")

And this is the output:

Exact one-sample Kolmogorov-Smirnov test

data: data
D = 1, p-value < 2.2e-16
alternative hypothesis: two-sided

Best Answer

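As highlighted in the comments (Alex J and COOLSerdash), there are two issues here. First, the null distribution specified in the KS test differs from the distribution that generated the data. ks.test(data, "pnorm") compares the sample with the standard normal N(0, 1), because pnorm uses mean = 0 and sd = 1 unless told otherwise, whereas the data were drawn from N(5, 2²) (mean 5, standard deviation 2). The correct approach is either to pass the true parameters, which ks.test forwards to pnorm: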

> set.seed(30823)
> data <- rnorm(n=10000, 5, 2)
> ks.test(data, "pnorm", mean=5, sd=2)  # extra arguments are passed on to pnorm()

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  data
D = 0.0044899, p-value = 0.9877
alternative hypothesis: two-sided

or to draw the sample from the standard normal that pnorm assumes by default:

> data1 <- rnorm(n=10000)
> ks.test(data1, "pnorm")

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  data1
D = 0.01079, p-value = 0.1947
alternative hypothesis: two-sided
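
To see why the original call rejects so decisively, the sketch below recomputes the two-sided KS statistic by hand, D = max over i of max(i/n - F(x_(i)), F(x_(i)) - (i-1)/n), once against pnorm's default N(0, 1) and once against N(5, 2²). The helper name ks_stat is hypothetical (it is not part of base R), and it assumes a continuous null CDF with no ties in the sample, which holds for rnorm draws. The last check shows that standardizing with the known mean and sd gives the same statistic as passing mean=5, sd=2.

```r
# Hypothetical helper (not part of base R): two-sided one-sample KS statistic,
# assuming a continuous null CDF and no ties in x.
# 'cdf' is a CDF function such as pnorm; '...' is passed on to it (e.g. mean, sd).
ks_stat <- function(x, cdf, ...) {
  x  <- sort(x)
  n  <- length(x)
  i  <- seq_len(n)
  Fx <- cdf(x, ...)            # null CDF evaluated at the order statistics
  max(i / n - Fx,              # ECDF at each point minus null CDF
      Fx - (i - 1) / n)        # null CDF minus ECDF just below each point
}

set.seed(30823)
data <- rnorm(n = 10000, mean = 5, sd = 2)

# pnorm defaults to mean = 0, sd = 1, so this measures the distance to N(0, 1)
d_default <- ks_stat(data, pnorm)
# Passing the generating parameters measures the distance to N(5, 2^2)
d_true <- ks_stat(data, pnorm, mean = 5, sd = 2)

# The hand-computed statistics agree with ks.test()
stopifnot(isTRUE(all.equal(d_default, unname(ks.test(data, "pnorm")$statistic))))
stopifnot(isTRUE(all.equal(d_true,
                           unname(ks.test(data, "pnorm", mean = 5, sd = 2)$statistic))))

# Standardizing with the KNOWN parameters is equivalent to passing them to pnorm
stopifnot(isTRUE(all.equal(d_true, ks_stat((data - 5) / 2, pnorm))))

print(c(default_N01 = d_default, true_N52 = d_true))
```

The equivalence in the last check relies on the parameters being known in advance. Plugging in estimates such as mean(data) and sd(data) changes the null distribution of D, so the standard KS p-value no longer applies in that case.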

Second (a minor issue), a test run at level 0.05 still has (approximately) a 5% chance of rejecting the null hypothesis even when the null is true, so an occasional rejection for correctly simulated normal data is expected.
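
A quick simulation illustrates this second point. The sketch below repeatedly draws fresh N(5, 2²) samples of size 10000, tests each one against the correct null, and records how often p < 0.05. The seed and the number of replications (n_sims) are arbitrary choices made here for illustration, not part of the original answer. The observed rejection proportion should land near 5%, so a single rejection of truly normal data is not by itself evidence that something is wrong.

```r
# Monte Carlo sketch: how often does ks.test reject a TRUE null at level 0.05?
set.seed(1)        # arbitrary seed, chosen only for reproducibility
n_sims <- 2000     # number of simulated data sets (an arbitrary choice)
alpha  <- 0.05

p_values <- replicate(n_sims, {
  x <- rnorm(n = 10000, mean = 5, sd = 2)          # data truly from N(5, 2^2)
  ks.test(x, "pnorm", mean = 5, sd = 2)$p.value    # the null hypothesis is correct
})

# Proportion of false rejections; expected to be roughly alpha
rejection_rate <- mean(p_values < alpha)
print(rejection_rate)
```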