[Math] Determining the value of ECDF at a point using Matlab

MATLAB, statistics

I have data $X=[x_1,\dots,x_n]$.

In Matlab, I know that

[f,x]=ecdf(X);
plot(x,f)

gives the empirical distribution function based on $X$.

Now, given a point $x$, how do I find the value of the ECDF at that point?

Best Answer

You can use interpolation for this. In Matlab, interp1 performs a variety of interpolation methods on 1-D data. In your case, you might try nearest neighbor or possibly linear interpolation, though you could attempt higher-order schemes depending on your data. Nearest neighbor interpolation returns the ECDF value at the sample point in your data $X$ that is closest to a supplied query point $x$. Here's an example:

rng(1);           % Set random seed to make repeatable
Y = randn(1,100); % Normally distributed random data
[F,X] = ecdf(Y);  % Empirical CDF
stairs(X,F);      % Use stairstep plot to see actual shape
hold on;
X = X(2:end);     % Sample points, ECDF duplicates initial point, delete it
F = F(2:end);     % Sample values, ECDF duplicates initial point, delete it
x = [-1 0 1.5];   % Query points
y = interp1(X,F,x,'nearest'); % Nearest neighbor interpolation
plot(x,y,'ko');   % Plot interpolated points on ECDF

This produces a stairstep plot of the ECDF with the three interpolated points marked as black circles.

Note that in the code above I had to remove the first point from the values returned by ecdf. This is because interp1 requires the sample points (here X) to be strictly monotonically increasing or decreasing, and ecdf repeats the smallest data value at the start of X.
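As a cross-check on the interpolated values, the ECDF at a point $x$ is by definition the fraction of samples less than or equal to $x$, i.e. $\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}$, so it can also be evaluated directly without interpolation. The sketch below is not part of the original example: the names Fx and Fx_step are illustrative, it assumes a Matlab release whose interp1 supports the 'previous' method, and it patches query points outside the sample range by hand because interp1 returns NaN there.

```matlab
rng(1);                 % Same seed as in the example above
Y = randn(1,100);       % Normally distributed random data
x = [-1 0 1.5];         % Query points

% Direct evaluation: fraction of samples <= each query point
Fx = arrayfun(@(q) mean(Y <= q), x);

% Same values via a step lookup on the ecdf output
[F,X] = ecdf(Y);        % X(1) repeats the smallest sample, F(1) = 0
Fx_step = interp1(X(2:end), F(2:end), x, 'previous');
Fx_step(x < X(1))   = 0;   % Left of the smallest sample the ECDF is 0
Fx_step(x > X(end)) = 1;   % Right of the largest sample the ECDF is 1
```

Unlike 'nearest' or 'linear', the 'previous' rule follows the right-continuous staircase of the ECDF, so the two results should agree up to rounding.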
