Kolmogorov-Smirnov Test – How to Use Kolmogorov-Smirnov Test for Assessing Normality of a Random Variable

Tags: goodness-of-fit, kolmogorov-smirnov-test, normal-distribution, normality-assumption, r

Context

I am confused by the following post, where the accepted answer states that:

You can't really even compare the two since the Kolmogorov-Smirnov is
for a completely specified distribution (so if you're testing
normality, you must specify the mean and variance; they can't be
estimated from the data*), while the Shapiro-Wilk is for normality,
with unspecified mean and variance.

  • you also can't standardize by using estimated parameters and test for standard normal; that's actually the same thing.

Question

Imagine that I have a random sample of measurements X that I standardize using its sample mean and variance. May I use the Kolmogorov-Smirnov test as a goodness-of-fit (GOF) test to assess the normality of this random sample?

$$ H_0 : X_\text{scaled} \sim N(0,1) $$

Illustration

To illustrate my question, here is a code snippet in R:

# We wish to do a Goodness of Fit test that X is a random sample from a Normal Distribution N(mu,sigma^2)
X <- c(10.212, 10.103, 10.242, 10.106, 10.102, 10.095, 10.042, 10.093, 10.302, 10.111)
sample.mean <- mean(X)
sample.variance <- var(X)
# Or that standardized X (scaled.X) is a random sample from a standard normal distribution N(0,1)
scaled.X <- (X-sample.mean)/(sqrt(sample.variance))


# Kolmogorov-Smirnov test, H0 : scaled.X ~ N(0,1)
ks.test(scaled.X, "pnorm", alternative = "two.sided")
# Does not reject the null.

# Shapiro-Wilk test
shapiro.test(scaled.X)
# Rejects the null.

Note that the KS test and the Shapiro-Wilk test give contradictory results, hinting that the Shapiro-Wilk test is more powerful in this specific case. That is, however, not my main question, although any comments on it are welcome.

The specific question of interest is whether using the KS test on a random sample standardized with its own sample statistics is a sound way to evaluate the normality assumption.

Best Answer

Your approach is Procrustean: when you standardized the data, you forced them to look a little more like standard Normal values than they had. After all, part of detecting a difference in distribution involves comparing their means and variances, which you have forced to be the same.
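To see concretely why the comparison is forced, note that standardizing with the sample mean and standard deviation pins the first two sample moments exactly, no matter what the data look like (a minimal sketch; the particular sample drawn here is arbitrary):

```r
# Standardizing with the sample mean and SD fixes the first two sample
# moments by construction: they can no longer disagree with N(0,1).
x <- rnorm(10, mean = 5, sd = 2)    # any sample at all
z <- (x - mean(x)) / sd(x)
c(mean(z), sd(z))                   # always (0, 1), up to floating-point error
```

Two of the ways in which a sample can betray non-normality have thus been erased before the KS test ever sees the data.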

As a result, you are fooling the KS test. It turns out the p-values it returns are dramatically too large, as these results of 10,000 simulated datasets (of size $50$) attest. They summarize two p-values: one obtained by applying the KS test to an iid standard Normal sample and another obtained in exactly the same way, after standardizing that sample.

[Figure 1: side-by-side histograms of the simulated p-values, non-standardized vs. standardized]

The red lines plot the ideal null (uniform) distribution for reference.

One thought would be to correct the standardized p-value somehow. But sometimes the p-values are nearly the same because the original sample happened to be nearly standardized, anyway. On rare occasions the standardization makes the data look less like they were drawn from a standard Normal distribution: the KS test evaluates many other aspects of the distribution than its first two moments. But most often, standardization pulls the p-value up (making it harder to detect a departure from being standard Normal). Consequently, we cannot even predict the correct p-value from the incorrect one with acceptable accuracy. Here is the scatterplot of the pairs of p-values in the simulation.

[Figure 2: scatterplot of standardized p-values against non-standardized p-values]

These considerations are quite general: they appeal to no particular property of the KS test apart from its purpose, and they suggest that similar problems would attend the use of standardization with almost any distributional test.


Such simulations take little time (this requires less than a second to complete) and can be coded in minutes, so they often are worth doing when subtle questions of this kind arise. As an example of how little effort might be needed, here's R code to reproduce this simulation.

n.sim <- 1e4   # number of simulated datasets
n <- 50        # size of each dataset
set.seed(17)
X <- matrix(rnorm(n * n.sim), n)   # one dataset per column

f <- function(x) ks.test(x, "pnorm")$p.value
ks.1 <- apply(X, 2, f)          # KS p-values for the raw samples
ks.2 <- apply(scale(X), 2, f)   # KS p-values after standardizing each column

The rest of it is a matter of post-processing the arrays of p-values in ks.1 and ks.2. For the record, here's how I did that to make the figures.

# Figure 1: Histograms
par(mfrow=c(1,2))
b <- seq(0, 1, by=0.05)
hist(ks.1, breaks=b, freq=FALSE, col=gray(.9), main="Non-standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
hist(ks.2, breaks=b, freq=FALSE, col=gray(.9), main="Standardized", xlab="p-value")
abline(h=1, lwd=2, col=hsv(0,1,3/4))
par(mfrow=c(1,1))

# Figure 2: Scatterplot
plot(ks.1, ks.2, pch=21, bg=gray(0, alpha=.05), col=gray(0, alpha=.2), cex=.5,
     xlab="Non-standardized p-value", ylab="Standardized p-value", asp=1)
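A complementary numeric summary is the rejection rate at the nominal 5% level; under a correct null procedure it should be close to 0.05. Here is a self-contained version (a smaller rerun of the simulation above; the 2,000-dataset count and the seed are my choices for speed):

```r
# Rejection rates at the nominal 5% level, from a fresh simulation.
set.seed(42)
n.sim <- 2000; n <- 50
X <- matrix(rnorm(n * n.sim), n)
f <- function(x) ks.test(x, "pnorm")$p.value
p.raw    <- apply(X, 2, f)         # correct null: iid standard Normal
p.scaled <- apply(scale(X), 2, f)  # after standardizing each column
mean(p.raw < 0.05)     # close to the nominal 0.05
mean(p.scaled < 0.05)  # essentially 0: the test almost never rejects
```

The standardized version almost never rejects a true null at the nominal level, which is the flip side of its p-values being dramatically too large: the same miscalibration destroys its power against false nulls.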