KS Test Using R and Minitab – How to Perform and Interpret Results

minitab, r

I'm trying to figure out how Kolmogorov-Smirnov one-sample testing for normality is done in Minitab (or Systat, since the answers apparently match).

If this is my data vector:

abc <- c(0.0313, 0.0273, 0.0379, 0.0427, 0.0286, 0.0327, 0.0298, 0.0381, 0.0559, 0.0573,
0.0558, 0.113, 0.0464, 0.0442, 0.0579, 0.0495)

The boneheaded way of doing this in R would be:

ks.test(abc, pnorm, mean(abc), sd(abc))

Yes, I know that the ks.test help page says to not use the data to estimate the mean/sd of the comparison distribution. Hence, boneheaded. Sidenote – if I understand correctly, SAS is using this as a regular procedure? http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect037.htm

Anyway, the p-value R gives for this improper test is 0.3027, while apparently both Minitab and Systat provide a p-value of 0.029.

The project manager won't hear anything about using other means of testing for normality (or, heaven forbid, using plots of the data distribution). At this point I'm just trying to figure out what the other software packages are doing, so that I can explain the differences to myself…

Am I missing something? If people suggest using simulations instead of the direct test, like here (http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-Test-td3037232.html), would it be possible to include detailed code?

Thank you!

Best Answer

Here is some R code to do a simulation generating data from a normal with the same mean and sd, then doing the KS test using the sample (not the generating) statistics:

# Draw normal samples of the same size, then test each against a normal fitted to itself
out <- replicate(100000, {
    x <- rnorm(length(abc), mean(abc), sd(abc))
    ks.test(x, pnorm, mean(x), sd(x))$p.value })

hist(out)

mean(out <= ks.test(abc, pnorm, mean(abc), sd(abc))$p.value)

My estimated p-value from the simulation is 0.021 (more precision is available by running more simulations), which is closer to the Minitab/Systat values, though not identical. This suggests that the other programs may be adjusting in some way for the estimated parameter values. But there is still enough difference that I expect their adjustment is different from the simulation procedure.
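The same idea can be sketched by simulating the KS distance itself rather than the p-value. The two versions should rank samples the same way, since a larger distance goes with a smaller p-value. Testing against a normal whose mean and sd are estimated from the sample is usually called the Lilliefors variant of the test; whether Minitab or Systat do exactly this is an assumption, not something confirmed here. The helper names `ks_stat_fitted` and `sim_ks_pvalue` are made up for this illustration, and because the distance does not change under shifting or rescaling, the simulation draws from a standard normal.

```r
# Data from the question
abc <- c(0.0313, 0.0273, 0.0379, 0.0427, 0.0286, 0.0327, 0.0298, 0.0381,
         0.0559, 0.0573, 0.0558, 0.113, 0.0464, 0.0442, 0.0579, 0.0495)

# Hypothetical helper: KS distance between a sample and a normal
# fitted to that same sample (mean and sd estimated from x)
ks_stat_fitted <- function(x) {
  n <- length(x)
  p <- pnorm(sort((x - mean(x)) / sd(x)))
  i <- seq_len(n)
  # largest gap between the empirical CDF steps and the fitted CDF
  max(max(i / n - p), max(p - (i - 1) / n))
}

# Hypothetical helper: Monte Carlo p-value, re-estimating the
# parameters in every simulated sample
sim_ks_pvalue <- function(x, n_sim = 10000) {
  d_obs <- ks_stat_fitted(x)
  d_sim <- replicate(n_sim, ks_stat_fitted(rnorm(length(x))))
  mean(d_sim >= d_obs)
}

set.seed(1)                     # only for reproducibility
d_obs <- ks_stat_fitted(abc)    # same D that ks.test() reports
p_sim <- sim_ks_pvalue(abc)     # simulated p-value
p_sim
```

With more simulated samples the estimate settles down; it should land near the simulation value quoted above rather than near the 0.3027 from the naive call.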
