Solved – Value at $D_\max$ from Kolmogorov-Smirnov test in R

distributionskolmogorov-smirnov testr

I am comparing two distributions using the Kolmogorov-Smirnov test [ks.test()] in R and would like to know what the numerical value is where $D_\text{max}$ occurs. It's not in the output of ks.test() as far as I can see, so I was wondering if anyone had any ideas about how I could find that out. Thanks!

Best Answer

Something like that? Dmax occurs at the value max.at.

set.seed(12345)

x <- rnorm(10000, 5, 5)
y <- rnorm(10000, 7, 6.5)

# remove any missings from the data

x <- x[!is.na(x)]
y <- y[!is.na(y)]

ecdf.x <- ecdf(x)
ecdf.y <- ecdf(y)

plot(ecdf.x, xlim=c(min(c(x,y)), max(c(x,y))), verticals=T, cex.lab=1.2, cex.axis=1.3,
     las=1, col="skyblue4", lwd=2, main="")

plot(ecdf.y, verticals=T, add=T, do.points=FALSE, cex.lab=1.2,
     cex.axis=1.3, col="red", lwd=2)

n.x <- length(x)
n.y <- length(y)

n <- n.x * n.y/(n.x + n.y)
w <- c(x, y)

z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))

max(abs(z)) # Dmax
[1] 0.1664

ks.test(x,y)$statistic # the same
     D 
0.1664

max.at <- sort(w)[which(abs(z) == max(abs(z)))]
[1] 9.082877

# Draw vertical line

abline(v=max.at, lty=2)

lines(abs(z)~sort(w), col="purple", lwd=2)

legend("topleft", legend=c("x", "y", "|Distance|"), col=c("skyblue4", "red", "purple"), lwd=c(2,2,2), bty="n")

Dmax visualization