Interpolating the empirical cumulative function

estimation, interpolation, random variable

The empirical cumulative distribution function of a random variable, given observations $x_{(k)} > x_{(k-1)}$, $k \in \mathbb N$, $k \le n$, is defined as $F_{emp}(x_{(k)} > X \ge x_{(k-1)}) = \frac{k}{n+1}$ and $F_{emp}(X \ge x_{(n)}) = 1$.

Why? As long as we're interpolating, wouldn't it make sense to use some interpolation method with less error? A simple nearest neighbour or piecewise average interpolant would be an improvement, and a cubic interpolant would get us a differentiable empirical density function, too.

The above definition won't even give you the piecewise infimum of the cdf, because the variable is random. It certainly approaches the true function as $n\to\infty$, but then so would any other interpolant. Surely at least linear interpolants were considered.

Best Answer

The EDF is the CDF of the population constituted by the data themselves. This is exactly what you need to describe and analyze any resampling process from the dataset, including nonparametric bootstrapping, jackknifing, cross-validation, etc. Not only that, it's perfectly general: any kind of interpolation would be invalid for discrete distributions.
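To make both points concrete, here is a minimal sketch in Python with NumPy. The helper `ecdf`, the die-roll data, and the linear interpolant `F_lin` are all made up for illustration, and the step function uses the usual $k/n$ convention (the fraction of observations $\le x$). Under those assumptions it shows that resampling the data with replacement reproduces the step-function EDF, while a linear interpolant assigns probability to values a discrete variable can never take.

```python
import numpy as np

def ecdf(sample):
    """Return F with F(x) = fraction of sample values <= x (a step function)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size

    def F(x):
        # side="right" counts how many sorted values are <= x
        return np.searchsorted(xs, x, side="right") / n

    return F

# Hypothetical discrete data: ten rolls of a die
rolls = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 6])
F = ecdf(rolls)

# The EDF is flat between 2 and 3: no roll can land at 2.5,
# so F(2.5) equals F(2.0)
print(F(2.0), F(2.5), F(3.0))

# A linear interpolant through the points (x, F(x)) at the distinct values
# (illustrative only) rises between 2 and 3, which implies mass there
support = np.unique(rolls)

def F_lin(x):
    return np.interp(x, support, F(support))

print(F_lin(2.5))

# Nonparametric bootstrap: draw from the data with replacement.
# The resampled proportion at or below 2.5 tracks the step EDF,
# not the interpolant.
rng = np.random.default_rng(0)
resample = rng.choice(rolls, size=100_000, replace=True)
print(np.mean(resample <= 2.5))
```

The bootstrap draw samples from exactly the distribution that `F` describes, which is why the step function, rather than a smoothed version, is the natural object for analyzing resampling procedures.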
