Solved – How to compare observed distribution to sampled distributions, using cumulative distibution function

cumulative distribution functionrsample

I have a distribution of observed measurements and I want to compare it to sampled distributions, using R.
I have a program that samples distributions, according to a certain low of probability / constraint.

Suppose I have:

  • $m$ observed measurements
  • two sets of $n$ samples of $m$ measurements

I'd like to compute:

  • the average cumulative distribution function (a-cdf) of the first set of samples (its value at each evaluation points is the mean value of the cdf of each sample at this evaluation point, and the evaluation points are the pooled $n\cdot m$ measurements)
  • the 95% envelope of the first set of samples
  • the spatial distribution index (sdi) of my observed measurements using the second set of $n$-samples. To compute the sdi, we compute all the maximum differences between the cdf of each new sample and the a-cdf, and the maximum difference between the observed cdf and a-cdf. That makes $n+1$ points, and the sdi is the rank of the observed difference among the $n+1$ differences.

I'd like to know if thoses function exists in R, and if not, what would be the smartest way to implement them.
Hope i'm clear enough.

Best Answer

You can use the ecdf, quantile and rank functions to do this.

# Sample data
n <- 10
m <- 50
observed <- rnorm(m)
set1 <- matrix(rnorm(n*m),nc=n)
set2 <- matrix(rnorm(n*m),nc=n)

# Average cumulated distribution function of the first set of samples
ecdfs <- apply(set1, 2, ecdf)
acdf <- function(u) mean(sapply(ecdfs, function(f) f(u)))
acdf <- Vectorize(acdf)

# 95% envelope
alpha <- .025
x <- unique(sort(set1))
y <- sapply(ecdfs, function(f) f(x))
y <- apply(y, 1, quantile, c(alpha, 1-alpha))
lower <- approxfun( x, y[1,], method="constant", rule=2 )
upper <- approxfun( x, y[2,], method="constant", rule=2 )
curve(upper(x), xlim=c(-3,3))
curve(lower(x), add=TRUE)
curve(acdf(x), add=TRUE, lty=3)

# Rank of the observed sample 
# among the maximum differences
difference <- function(u) {
  max( abs( ecdf(u)(c(x,u)) - acdf(c(x,u)) ) )
}
sdi <- apply(set2, 2, difference)
sdi <- c( difference(observed), sdi )
sdi <- rank(sdi)[1]

# If you were to write a function to do this,
# it would return the following
list(
  acdf  = acdf,
  upper = upper,
  lower = lower, 
  sdi = sdi
)
Related Question