I'm learning about the empirical cumulative distribution function, but I still don't understand:

- Why is it called "empirical"?
- Is there any difference between the empirical CDF and the CDF?
Best Answer
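Let $X$ be a random variable. Its CDF is $F(x) = P(X \leq x)$.

The distinction is which probability measure is used to compute that probability. The CDF uses the true distribution of $X$. The empirical CDF uses the probability measure defined by the frequency counts in an observed (empirical) sample: given observations $x_1, \dots, x_n$, each one receives mass $1/n$, so $$G(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \leq x\},$$ the fraction of observations less than or equal to $x$. It is called "empirical" because it is computed from data rather than from a model of the population.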
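A minimal Python sketch of this definition, using only the standard library. The helper name `ecdf` is illustrative, not a library API. Sorting the sample once lets `bisect_right` count the observations $\leq x$ in logarithmic time; repeated values are counted once per occurrence, so they carry mass $2/n$, $3/n$, and so on.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF G(x) = (number of observations <= x) / n."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def G(x):
        # bisect_right gives the count of sorted observations that are <= x
        return bisect_right(data, x) / n

    return G

# Hypothetical sample with a repeated value (0.4 appears twice)
G = ecdf([3.1, 0.4, 2.2, 0.4])
print(G(0.0), G(0.4), G(2.5), G(5.0))
```

Here $G(0.4) = 2/4$ because both copies of $0.4$ are counted.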
Simple example (coin flip):
Let $X$ be a random variable denoting the result of a single coin flip where $X=1$ denotes heads and $X=0$ denotes tails.
The CDF for a fair coin is given by: $$ F(x) = \left\{ \begin{array}{ll} 0 & \text{for } x < 0\\ \frac{1}{2} & \text{for } 0 \leq x < 1 \\1 & \text{for } 1 \leq x \end{array} \right. $$
If you flipped 2 heads and 1 tail, the empirical CDF would be: $$ G(x) = \left\{ \begin{array}{ll} 0 & \text{for } x < 0\\ \frac{1}{3} & \text{for } 0 \leq x < 1 \\1 & \text{for } 1 \leq x \end{array} \right. $$
The empirical CDF reflects your sample: $1/3$ of the flips were tails, which is the value of $G$ on $[0, 1)$, and $2/3$ were heads, which is the size of the jump at $x = 1$.
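A small sketch comparing the two functions for the coin example, assuming the sample `[1, 1, 0]` (two heads, one tail). Because $X = 0$ encodes tails, the empirical CDF on $0 \leq x < 1$ equals the fraction of tails in the sample. `Fraction` keeps the values exact; the function names are illustrative.

```python
from fractions import Fraction

def coin_cdf(x):
    """CDF of a fair coin, with X = 1 for heads and X = 0 for tails."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return Fraction(1, 2)
    return Fraction(1)

def coin_ecdf(flips, x):
    """Empirical CDF: fraction of observed flips that are <= x."""
    return Fraction(sum(1 for f in flips if f <= x), len(flips))

flips = [1, 1, 0]  # 2 heads (1) and 1 tail (0)
for x in (-0.5, 0, 0.5, 1, 2):
    # columns: x, true CDF F(x), empirical CDF G(x)
    print(x, coin_cdf(x), coin_ecdf(flips, x))
```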
Another example ($F$ is the CDF of a normal distribution):
Let $X$ be a normally distributed random variable with mean $0$ and standard deviation $1$.
The CDF is given by:
$$F(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} e^{\frac{-t^2}{2}} \, dt$$
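This integral has no elementary closed form, but it can be expressed with the error function as $F(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. A short sketch using Python's `math.erf`, cross-checked against `statistics.NormalDist` (Python 3.8+); the name `std_normal_cdf` is illustrative.

```python
import math
from statistics import NormalDist

def std_normal_cdf(x):
    """Standard normal CDF: F(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-1.96, 0.0, 1.0):
    # columns: x, erf-based value, standard-library value
    print(x, round(std_normal_cdf(x), 4), round(NormalDist(0.0, 1.0).cdf(x), 4))
```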
Suppose you take 3 IID draws and, after sorting, obtain the values $x_1 < x_2 < x_3$. The empirical CDF is a step function that jumps by $1/3$ at each observed value: $$ G(y) = \left\{ \begin{array}{ll} 0 & \text{for } y < x_1\\ \frac{1}{3} & \text{for } x_1 \leq y < x_2 \\\frac{2}{3} & \text{for } x_2 \leq y < x_3 \\1 & \text{for } x_3 \leq y \end{array} \right. $$
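A sketch that draws three standard normal values and evaluates $G$ on each piece of the step function. It assumes `random.gauss` as the sampler and a fixed seed for reproducibility; with continuous draws, ties occur with probability zero, so the sorted values are distinct.

```python
import random
from bisect import bisect_right

random.seed(0)  # fixed seed so the run is reproducible
draws = sorted(random.gauss(0.0, 1.0) for _ in range(3))  # x1 < x2 < x3

def G(y):
    """Empirical CDF of the three draws: jumps by 1/3 at each observed value."""
    return bisect_right(draws, y) / len(draws)

x1, x2, x3 = draws
# one point from each piece: y < x1, y = x1, x1 < y < x2, y = x2, y = x3, y > x3
print(G(x1 - 1), G(x1), G((x1 + x2) / 2), G(x2), G(x3), G(x3 + 1))
```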
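As the number of IID draws $n$ grows, the empirical CDF $G_n$ converges to the underlying CDF $F$ of the population. The Glivenko–Cantelli theorem makes this precise: $\sup_x |G_n(x) - F(x)| \to 0$ almost surely, with no conditions on $F$ beyond the draws being IID.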
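A small simulation of this convergence, assuming standard normal draws from `random.gauss` and a fixed seed; the helper `sup_distance` is illustrative. It computes $\sup_x |G_n(x) - F(x)|$, which is the Kolmogorov–Smirnov statistic. Because $G_n$ is a step function, the supremum is reached at an observed point or just to its left, so checking both sides of every jump is enough. The printed distances should shrink as $n$ grows, roughly in proportion to $1/\sqrt{n}$, though the exact numbers depend on the seed.

```python
import random
from statistics import NormalDist

def sup_distance(sample, cdf):
    """Return sup_x |G_n(x) - F(x)|, where G_n is the empirical CDF of `sample`."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        fx = cdf(x)
        # assuming no ties: i/n is G_n at x, (i - 1)/n is its limit from the left
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

random.seed(1)  # fixed seed for reproducibility
F = NormalDist(mu=0.0, sigma=1.0).cdf
distances = {}
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    distances[n] = sup_distance(sample, F)
    print(n, round(distances[n], 4))
```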