Kolmogorov-Smirnov Test – How to Use Kolmogorov-Smirnov Test for Assessing Normality of a Random Variable

Tags: goodness-of-fit, kolmogorov-smirnov-test, normal-distribution, normality-assumption, r

Context

I am confused by the following post, where the accepted answer states that:

You can't really even compare the two since the Kolmogorov-Smirnov is
for a completely specified distribution (so if you're testing
normality, you must specify the mean and variance; they can't be
estimated from the data*), while the Shapiro-Wilk is for normality,
with unspecified mean and variance.

  • you also can't standardize by using estimated parameters and test for standard normal; that's actually the same thing.

Question

Imagine that I have a random sample of measurements X that I standardize using its sample mean and variance. May I use the Kolmogorov-Smirnov test as a goodness-of-fit (GOF) test to assess the normality of this random sample?

$$ H_0 : X_\text{scaled} \sim N(0,1) $$

Illustration

To illustrate my question, here is a code snippet in R:

# We wish to do a Goodness of Fit test that X is a random sample from a Normal Distribution N(mu,sigma^2)
X <- c(10.212, 10.103, 10.242, 10.106, 10.102, 10.095, 10.042, 10.093, 10.302, 10.111)
sample.mean <- mean(X)
sample.variance <- var(X)
# Or that standardized X (scaled.X) is a random sample from a standard normal distribution N(0,1)
scaled.X <- (X-sample.mean)/(sqrt(sample.variance))


# Kolmogorov-Smirnov test, H0 : scaled.X ~ N(0,1)
ks.test(scaled.X, "pnorm", alternative = "two.sided")
# Does not reject the null.

# Shapiro-Wilk test
shapiro.test(scaled.X)
# Rejects the null.

Note that the KS test and the Shapiro-Wilk test give contradictory results, hinting that the Shapiro-Wilk test is more powerful in this specific case. That is, however, not my main question, although any comments on it are welcome.

The specific question of interest is whether using the KS test on a random sample standardized with its own sample statistics is a sound way to evaluate the normality assumption.

Best Answer

Your approach is Procrustean: when you standardized the data, you forced them to look a little more like standard Normal values than they had. After all, part of detecting a difference in distribution involves comparing their means and variances, which you have forced to be the same.
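To see concretely why the comparison is forced, note that standardizing with the sample mean and standard deviation pins the first two sample moments exactly, no matter what the data look like (a minimal sketch; the particular sample drawn here is arbitrary):

```r
# Standardizing with the sample mean and SD fixes the first two sample
# moments by construction: they can no longer disagree with N(0,1).
x <- rnorm(10, mean = 5, sd = 2)    # any sample at all
z <- (x - mean(x)) / sd(x)
c(mean(z), sd(z))                   # always (0, 1), up to floating-point error
```

Two of the ways in which a sample can betray non-normality have thus been erased before the KS test ever sees the data.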

As a result, you are fooling the KS test. It turns out the p-values it returns are dramatically too large, as these results of 10,000 simulated datasets (of size $50$) attest. They summarize two p-values: one obtained by applying the KS test to an iid standard Normal sample and another obtained in exactly the same way, after standardizing that sample.

[Figure 1: side-by-side histograms of the simulated p-values, non-standardized vs. standardized]

The red lines plot the ideal null (uniform) distribution for reference.

One thought would be to correct the standardized p-value somehow. But sometimes the p-values are nearly the same because the original sample happened to be nearly standardized, anyway. On rare occasions the standardization makes the data look less like they were drawn from a standard Normal distribution: the KS test evaluates many other aspects of the distribution than its first two moments. But most often, standardization pulls the p-value up (making it harder to detect a departure from being standard Normal). Consequently, we cannot even predict the correct p-value from the incorrect one with acceptable accuracy. Here is the scatterplot of the pairs of p-values in the simulation.

[Figure 2: scatterplot of standardized p-values against non-standardized p-values]

These considerations are quite general: they appeal to no particular property of the KS test apart from its purpose, and they suggest that similar problems would attend the use of standardization with almost any distributional test.


Such simulations take little time (this requires less than a second to complete) and can be coded in minutes, so they often are worth doing when subtle questions of this kind arise. As an example of how little effort might be needed, here's R code to reproduce this simulation.

n.sim <- 1e4   # number of simulated datasets
n <- 50        # size of each dataset
set.seed(17)
X <- matrix(rnorm(n * n.sim), n)   # one dataset per column

f <- function(x) ks.test(x, "pnorm")$p.value
ks.1 <- apply(X, 2, f)          # KS p-values for the raw samples
ks.2 <- apply(scale(X), 2, f)   # KS p-values after standardizing each column

The rest of it is a matter of post-processing the arrays of p-values in ks.1 and ks.2. For the record, here's how I did that to make the figures.

# Figure 1: Histograms
par(mfrow=c(1,2))
b <- seq(0, 1, by=0.05)
hist(ks.1, breaks=b, freq=FALSE, col=gray(.9), main="Non-standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
hist(ks.2, breaks=b, freq=FALSE, col=gray(.9), main="Standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
par(mfrow=c(1,1))

# Figure 2: Scatterplot
plot(ks.1, ks.2, pch=21, bg=gray(0, alpha=.05), col=gray(0, alpha=.2), cex=.5,
     xlab="Non-standardized p-value", ylab="Standardized p-value", asp=1)
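A complementary numeric summary is the rejection rate at the nominal 5% level; under a correct null procedure it should be close to 0.05. Here is a self-contained version (a smaller rerun of the simulation above; the 2,000-dataset count and the seed are my choices for speed):

```r
# Rejection rates at the nominal 5% level, from a fresh simulation.
set.seed(42)
n.sim <- 2000; n <- 50
X <- matrix(rnorm(n * n.sim), n)
f <- function(x) ks.test(x, "pnorm")$p.value
p.raw    <- apply(X, 2, f)         # correct null: iid standard Normal
p.scaled <- apply(scale(X), 2, f)  # after standardizing each column
mean(p.raw < 0.05)     # close to the nominal 0.05
mean(p.scaled < 0.05)  # essentially 0: the test almost never rejects
```

The standardized version almost never rejects a true null at the nominal level, which is the flip side of its p-values being dramatically too large: the same miscalibration destroys its power against false nulls.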