Shapiro-Wilk Test – Why Compare It with Kolmogorov-Smirnov Test for Same Sample Size

Tags: distributions, kolmogorov-smirnov-test, normality-assumption, references, shapiro-wilk-test

I am trying to understand when to use the Shapiro-Wilk test and/or the Kolmogorov-Smirnov test for normality. Currently, I am following this website: How to Test for Normality in R (4 Methods). The paper Descriptive Statistics and Normality Tests for Statistical Data says:

The Shapiro–Wilk test is more appropriate method for small sample sizes (<50 samples) although it can also be handling on larger sample size while Kolmogorov–Smirnov test is used for n ≥ 50. For both of the above tests, null hypothesis states that data are taken from normal distributed population.

So the paper suggests that the Kolmogorov-Smirnov test is more appropriate for sample sizes larger than 50. Yet the website mentioned above applies both tests to the same sample size to check normality, and I don't understand why.

Also, from this question, What is the difference between the Shapiro–Wilk test of normality and the Kolmogorov–Smirnov test of normality?, and its answer, it seems that the two tests can't actually be compared.

So I was wondering why the website compares the two tests on the same sample size, and at what point it is no longer possible to use the Shapiro-Wilk test.

Best Answer

The main difference between the S-W and the K-S test for normality is that the S-W test can be used to assess the goodness of fit to a distribution whose parameters were fitted from the data, whereas the K-S test is only valid for testing against a fully prespecified distribution.

So if you estimate the parameters of your distribution from your data, then test the goodness of fit of your data against the distribution with the estimated mean and variance, you can't use the K-S test any more - its p values will be too optimistic about the goodness of fit. In this situation, use the S-W test. (Standardizing your data with a mean and variance estimated from the data amounts to the same thing in this context.) If, conversely, you know the precise distribution your data should come from (e.g., an N(5,1) distribution), you can use the K-S test.
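To make the distinction concrete, here is a minimal sketch in base R. The N(5,1) distribution and the sample size of 30 are arbitrary choices for illustration:

```r
set.seed(1)
x <- rnorm(30, mean = 5, sd = 1)

# Valid: K-S test against a fully prespecified distribution, here N(5, 1)
ks.test(x, "pnorm", mean = 5, sd = 1)

# Invalid: K-S test against a normal with parameters estimated from the
# same data -- the resulting p value is too optimistic
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Valid: the Shapiro-Wilk test accounts for the estimated parameters
shapiro.test(x)
```

The first and third calls are legitimate tests; the second is the problematic pattern discussed above.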

Thus, the choice between the two tests does not hinge on the sample size, at least as long as the distinction above is kept in mind. Once it is, you can start looking at power against specific alternatives.

There is a lot of statistical misinformation floating around the internet.

(Incidentally, there is much less need for testing normality than is commonly assumed; see the replies to these questions:

How to cope with non-normal ANOVA residuals?

Assumptions of linear models and what to do if the residuals are not normally distributed

Analysis of variance with not normally distributed residuals: how important is normality?)


Edit: John Madden asks a very interesting question:

Wouldn't large sample sizes allow for hand-waving away the fact that parameters had to be estimated by standard arguments (i.e. continuous mapping theorem?), and perhaps this is where the idea that KS is more appropriate for large samples comes from?

Let's take a look. We will simulate $x_1, \dots, x_n\sim N(0,1)$ for increasing sample sizes $n$. For each such vector $x$, we perform a Kolmogorov-Smirnov test against a normal distribution with mean and standard deviation equal to the mean and standard deviation of $x$, and store the $p$ value. For each $n$, we repeat this simulation exercise 10,000 times and plot the $p$ values in a histogram. If the K-S test were valid, this histogram would be uniform, and if John's question had a positive answer, the histograms would become more uniform as $n$ increases. So we run the exercise for $n\in\{10,100,1000,10000\}$. The histograms don't get any more uniform, so no, the K-S test against estimated parameters does not get better for large sample sizes:

[Figure: histograms of p values, one per sample size n = 10, 100, 1000, 10000]

R code:

exponents <- 1:4
par(mfrow=c(2,2), mai=c(.5,.1,.5,.1))
for ( ee in exponents ) {
    p_values <- rep(NA, 1e4)            # 10,000 replications per sample size
    for ( ii in seq_along(p_values) ) {
        sims <- rnorm(10^ee)            # simulate n = 10^ee standard normals
        # K-S test against a normal with parameters estimated from the data
        p_values[ii] <- ks.test(sims, pnorm, mean=mean(sims), sd=sd(sims))$p.value
    }
    hist(p_values, xlab="", yaxt="n",
        main=paste("Histogram of p values\nSample size:", 10^ee))
}

(It turns out that whuber already did this analysis almost a year ago.)

As various commenters correctly point out, there are ways of addressing this property of the K-S test, such as the Lilliefors modification or a parametric bootstrap of the test statistic obtained with estimated parameters.
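As a rough sketch of the bootstrap idea in base R (the Lilliefors modification itself is available as `lillie.test()` in the nortest package; the sample size of 50 and B = 1000 resamples below are arbitrary choices):

```r
set.seed(42)
x <- rnorm(50)

# Observed K-S statistic, computed with parameters estimated from x
d_obs <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic

# Parametric bootstrap: in each resample, re-estimate the parameters so the
# null distribution of the statistic reflects the estimation step
B <- 1000
d_boot <- replicate(B, {
    xb <- rnorm(length(x), mean = mean(x), sd = sd(x))
    ks.test(xb, "pnorm", mean = mean(xb), sd = sd(xb))$statistic
})

# Bootstrap p value: how often the resampled statistic exceeds the observed one
p_boot <- mean(d_boot >= d_obs)
p_boot
```

Unlike the naive K-S p value, this bootstrap p value is approximately uniform under the null, because the reference distribution of the statistic is built with the same estimation step applied to the data.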