[Math] Understanding the definition of the empirical measure

measure-theory, probability-theory

I'm reading Kosorok's Introduction to Empirical Processes and Semiparametric Inference and I'm stuck on an important definition.

We define the empirical measure to be $P_n=n^{-1}\sum_{i=1}^n\delta_{X_i}$, where $\delta_x$ is the measure that assigns mass 1 at $x$ and zero elsewhere. For a measurable function $f : X \rightarrow \mathbb{R}$, we denote $P_nf=n^{-1}\sum_{i=1}^nf(X_i)$.
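Both formulas can be evaluated directly on one fixed sample. This is a minimal sketch that is not taken from Kosorok's book. A set $A$ is represented by its membership test, the helper names `dirac`, `empirical_measure` and `empirical_mean` are hypothetical, and exact `Fraction` arithmetic is an assumption chosen so that the printed values are exact.

```python
from fractions import Fraction
from typing import Callable, Sequence


def dirac(x: float, A: Callable[[float], bool]) -> int:
    """delta_x(A): mass 1 if x lies in A, zero otherwise (A given as a membership test)."""
    return 1 if A(x) else 0


def empirical_measure(sample: Sequence[float], A: Callable[[float], bool]) -> Fraction:
    """P_n(A) = n^{-1} * sum_i delta_{X_i}(A), i.e. the fraction of sample points in A."""
    n = len(sample)
    return Fraction(sum(dirac(x, A) for x in sample), n)


def empirical_mean(sample: Sequence[float], f: Callable[[float], float]) -> Fraction:
    """P_n f = n^{-1} * sum_i f(X_i), computed exactly with Fractions."""
    n = len(sample)
    return sum(Fraction(f(x)) for x in sample) / n


sample = [2, 5, 5, 7]  # one realization X_1, ..., X_4
print(empirical_measure(sample, lambda x: x > 4))   # 3/4
print(empirical_measure(sample, lambda x: x == 5))  # 1/2: the repeated point 5 counts twice
print(empirical_mean(sample, lambda x: x ** 2))     # 103/4
```

Here $P_n$ takes sets as input and $P_n f$ takes functions, which is the distinction the question below is asking about.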

First, let me take a crack at $P_n$: is it the same as $n^{-1}\delta_{\bigcup_{i}X_i}$, where I use $\bigcup_{i}X_i$ to denote the union of the points $X_i$ over $i$? Does $P_n$ inherently only work on functions?

And then, does $P_nf$ follow directly from the definition of $P_n$? I'm confused by the wording "we denote". Is this a new definition?

Best Answer

For each $\omega$ in the underlying probability space $\Omega$ on which the $X_i$ are defined, $P_n(\omega)=\frac1n \sum\limits_{i=1}^n \delta_{X_i(\omega)}$ is a measure on $X$. More specifically, for every measurable $A\subseteq X$,
$$ P_n(\omega)(A)=\frac1n \sum_{i=1}^n\delta_{X_i(\omega)}(A)=\frac{\#\{1\leq i\leq n\mid X_i(\omega)\in A\}}{n}. $$

If $f:X\to\mathbb{R}$ is measurable, then $P_n f:\Omega\to\mathbb{R}$ is just the integral of $f$ with respect to $P_n$, taken $\omega$ by $\omega$. In other words, for fixed $\omega$, using linearity of the integral in the measure and $\int_X f\,\mathrm d\delta_x=f(x)$,
$$ P_n f(\omega):=\int_X f\,\mathrm dP_n(\omega)=\frac1n\sum_{i=1}^n\int_X f\,\mathrm d\delta_{X_i(\omega)}=\frac 1n\sum_{i=1}^n f(X_i(\omega)). $$
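The $\omega$-by-$\omega$ description can be checked numerically. This is an illustration under stated assumptions. The outcome $\omega$ is modelled by the seed of Python's `random.Random`, the $X_i$ are taken to be fair die rolls, and `draw_sample`, `empirical_atoms` and `integrate` are hypothetical helpers. For a fixed seed, the sketch builds $P_n(\omega)$ as a finitely supported measure and confirms that $\int_X f\,\mathrm dP_n(\omega)$ equals $\frac1n\sum_i f(X_i(\omega))$.

```python
import random
from collections import Counter
from fractions import Fraction
from typing import Callable, Dict, List


def draw_sample(omega: int, n: int) -> List[int]:
    """Realization X_1(omega), ..., X_n(omega): n fair die rolls.
    The integer seed `omega` stands in for the outcome omega (an illustrative assumption)."""
    rng = random.Random(omega)
    return [rng.randint(1, 6) for _ in range(n)]


def empirical_atoms(sample: List[int]) -> Dict[int, Fraction]:
    """P_n(omega) as a discrete measure: atom x gets mass #{i : X_i(omega) = x} / n."""
    n = len(sample)
    return {x: Fraction(k, n) for x, k in Counter(sample).items()}


def integrate(f: Callable[[int], int], atoms: Dict[int, Fraction]) -> Fraction:
    """Integral of f against a finitely supported measure: sum over atoms of f(x) * mass(x)."""
    return sum((f(x) * w for x, w in atoms.items()), Fraction(0))


def square(x: int) -> int:
    return x * x


omega, n = 42, 10
sample = draw_sample(omega, n)
atoms = empirical_atoms(sample)

lhs = integrate(square, atoms)                     # integral of f dP_n(omega)
rhs = Fraction(sum(square(x) for x in sample), n)  # (1/n) * sum_i f(X_i(omega))
assert lhs == rhs
assert sum(atoms.values()) == 1                    # P_n(omega) is a probability measure
print(sample, lhs)
```

Repeated values in the sample are merged into a single atom whose mass is the number of repeats divided by $n$. So $P_n(\omega)$ records multiplicities, not just the set of observed points.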