Why is the empirical measure $L_n=\frac1 n \sum_{k=1}^n\delta_{X_k}$ a random variable with values in $\mathscr M_1(\{0,1\})$

measure-theory probability-theory

According to the following example,

[image of the example, not reproduced]

the empirical measure $L_n=\frac1 n\sum_{k=1}^n\delta_{X_k}$ is defined as a random variable with values in $\mathscr M_1(\{0,1\})$.

1) How can that be? Why is it a random variable and at the same time a measure? And, most importantly, how can its image be $\mathscr M_1(\{0,1\})$? As far as I know, a measure is a real-valued function, but in the example they say it takes values in $\mathscr M_1(\{0,1\})$.

My book defines:

[image of the definition, not reproduced]

so $\mathscr M_1(\{0,1\})$ should be the space of Borel measures over $S=\{0,1\}$, and the Borel $\sigma$-algebra should be just $\mathscr B(S)=\{\emptyset, \{0\},\{1\},\{0,1\}\}$.

2) Additionally, to get the expressions for $L_n$ and $\nu_n$ in the Bernoulli case, I tried doing this:

For $\nu_n$ :

Let $A \subseteq \mathscr Y$ be a Borel set

By definition of distribution of a random variable

$\nu_n$ is the distribution of $L_n$ so $\nu_n(A)=\mathbb P(L_n\in A)$

$\mu_n$ is the distribution of $\frac{S_n}{n}$ so $\mu_n(A)=\mathbb P(\frac{S_n}{n}\in A)$

$\mu_n\circ f^{-1}(A)=\mu_n( f^{-1}(A))=\mathbb P(\frac{S_n}{n}\in f^{-1}(A))$

$=\mathbb P(f(\frac{S_n}{n})\in A)=\mathbb P(\frac{S_n}{n}\delta_1+(1-\frac{S_n}{n})\delta_0\in A)=$
$\mathbb P(L_n \in A)=\nu_n(A)$

But I am not sure if the following step was correct:

$\mathbb P(\frac{S_n}{n}\in f^{-1}(A))=\mathbb P(f(\frac{S_n}{n})\in A)$

because, if I remember correctly, only $f(f^{-1}(A))\subseteq A$ holds for a generic $f$.

If it was correct, why? Is there a name for the property used in that step? If not, how do I prove it? And how do I get the expression for $L_n$?

Best Answer

1) Recall that an $\mathcal{S}$-valued random variable, where $\mathcal{S}$ is a measurable space, is simply a measurable function from the ambient probability space $(\Omega, \mathcal{F}, \mathbf{P})$ to $\mathcal{S}$.

In OP's case, $L_n$ takes two inputs, a sample $\omega \in \Omega$ and a Borel subset $B$ of $\{0,1\}$, and gives the value

$$ L_n(\omega, B) = \frac{S_n(\omega)}{n} \delta_1(B) + \left(1 - \frac{S_n(\omega)}{n}\right) \delta_0(B). $$

By partially evaluating $L_n$ at each given $\omega$ but leaving $B$ free, we can regard $L_n$ as a function

\begin{align*} L_n(\omega) &= \left[ B \mapsto \frac{S_n(\omega)}{n} \delta_1(B) + \left(1 - \frac{S_n(\omega)}{n}\right) \delta_0(B) \right] \\ &= \frac{S_n(\omega)}{n} \delta_1 + \left(1 - \frac{S_n(\omega)}{n}\right) \delta_0. \end{align*}

Each of these values is a probability measure on $\{0, 1\}$, and so, $L_n$ is an $\mathscr{M}_1(\{0,1\})$-valued random variable, provided it is measurable. This requires us to endow $\mathscr{M}_1(\{0,1\})$ with a $\sigma$-algebra so that it becomes a measurable space. This $\sigma$-algebra is usually chosen as the smallest one for which the map

$$ I_B : \mathscr{M}_1(\{0,1\}) \to \mathbb{R}, \qquad I_B(\mu) = \mu(B) $$

is measurable for each Borel subset $B$ of $\{0, 1\}$. An immediate consequence of this definition is that the following are equivalent for a function $Y : \Omega \to \mathscr{M}_1(\{0,1\})$:

  • $Y$ is measurable (hence an $\mathscr{M}_1(\{0,1\})$-valued random variable);
  • $ I_B \circ Y : \Omega \to \mathbb{R} $ is measurable (hence a real-valued random variable) for each Borel subset $B$ of $\{ 0, 1\}$.

In OP's case,

$$ I_B \circ L_n = \frac{S_n}{n} \mathbf{1}_{\{1 \in B\}} + \left(1 - \frac{S_n}{n}\right) \mathbf{1}_{\{0 \in B\}} $$

is a real-valued random variable since $S_n$ is. Therefore $L_n$ is also an ($\mathscr{M}_1(\{0,1\})$-valued) random variable.
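To make the "partial evaluation" picture concrete, here is a minimal Python sketch. It assumes a hypothetical outcome $\omega$ given as a list of 0/1 values (the names `empirical_measure`, `borel_sets`, and the sample itself are illustrative, not from the book). Fixing the sample returns a function $B \mapsto L_n(\omega, B)$, which is a probability measure on $\{0,1\}$.

```python
from fractions import Fraction

def empirical_measure(sample):
    """Fix omega (a list of 0/1 values) and return the map B -> L_n(omega, B)."""
    n = len(sample)
    s_n = sum(sample)  # S_n(omega): number of ones in the sample

    def measure(B):
        # (S_n/n) * delta_1(B) + (1 - S_n/n) * delta_0(B)
        return Fraction(s_n, n) * (1 in B) + (1 - Fraction(s_n, n)) * (0 in B)

    return measure

# All four Borel subsets of {0, 1}
borel_sets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

# Hypothetical outcome omega with n = 4 and S_n = 3
L = empirical_measure([1, 0, 1, 1])

# For each fixed B, the number L(B) plays the role of (I_B o L_n)(omega)
values = {B: L(B) for B in borel_sets}
```

Here `values` assigns mass 3/4 to $\{1\}$, 1/4 to $\{0\}$, 1 to $\{0,1\}$ and 0 to $\emptyset$, so the object returned for this $\omega$ is the measure $\tfrac34\delta_1 + \tfrac14\delta_0$.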

2) The logical relation $$ x\in f^{-1}(A) \qquad\iff\qquad f(x)\in A $$ holds for any function $f:X\to Y$, $x \in X$, and $A \subseteq Y$. This is because the right-hand side is the very defining property for the set $f^{-1}(A)$: $$f^{-1}(A) := \{x \in X : f(x) \in A\}$$
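As a quick sanity check of this equivalence, the following Python sketch uses a small finite example (the function `f`, the domain, and the set `A` are arbitrary choices made for illustration). It also shows that the weaker statement $f(f^{-1}(A)) \subseteq A$ can be a strict inclusion while the pointwise equivalence still holds.

```python
def preimage(f, domain, A):
    """Return f^{-1}(A) = {x in domain : f(x) in A}."""
    return {x for x in domain if f(x) in A}

domain = range(-3, 4)      # X = {-3, ..., 3}
f = lambda x: x * x        # neither injective nor onto A
A = {1, 2, 4}              # 2 is never attained by f on this domain

pre = preimage(f, domain, A)

# x in f^{-1}(A)  <=>  f(x) in A, checked for every x in the domain
equivalence_holds = all((x in pre) == (f(x) in A) for x in domain)

# f(f^{-1}(A)) is contained in A, possibly strictly
image_of_preimage = {f(x) for x in pre}
```

In this example `pre` is $\{-2,-1,1,2\}$ and `image_of_preimage` is $\{1,4\}$, a proper subset of $A$. The events $\{S_n/n \in f^{-1}(A)\}$ and $\{f(S_n/n) \in A\}$ are nevertheless the same set of outcomes.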

As for determining the law of $L_n$, note that $L_n$ can only take one of the values

$$ \text{Ber}(\tfrac{k}{n}) = \tfrac{k}{n}\delta_1 + \tfrac{n-k}{n}\delta_0, \qquad k=0,1,\ldots,n. $$

Moreover, for each $k = 0, 1, \ldots, n$, the probability of the event $\{L_n = \text{Ber}(\tfrac{k}{n})\}$ is computed by

$$ \nu_n(\{\text{Ber}(\tfrac{k}{n})\}) = \mathbf{P}(L_n = \text{Ber}(\tfrac{k}{n})) = \mathbf{P}(S_n=k) = \binom{n}{k}p^k(1-p)^{n-k}.$$

This tells us that $\nu_n$ is a discrete measure given by

$$ \nu_n = \sum_{k=0}^{n} \mathbf{P}(L_n = \text{Ber}(\tfrac{k}{n})) \delta_{\text{Ber}(k/n)} = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \delta_{\text{Ber}(k/n)}. $$
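The formula for $\nu_n$ can be checked by brute force for small $n$. The Python sketch below assumes the $X_k$ are i.i.d. Bernoulli($p$), and picks $n = 4$ and $p = 1/3$ purely for illustration. Each measure $\text{Ber}(k/n)$ is encoded as the pair (mass at 0, mass at 1), which is a representation chosen here, not something from the book.

```python
from fractions import Fraction
from itertools import product
from math import comb

n, p = 4, Fraction(1, 3)  # illustrative values, not from the source

def ber(k, n):
    """Encode Ber(k/n) as the pair (mass at 0, mass at 1)."""
    return (Fraction(n - k, n), Fraction(k, n))

# Law of L_n by enumerating all 2^n outcomes (x_1, ..., x_n)
nu_enum = {}
for omega in product([0, 1], repeat=n):
    k = sum(omega)                        # S_n(omega)
    weight = p**k * (1 - p)**(n - k)      # P({omega}) under i.i.d. Bernoulli(p)
    key = ber(k, n)                       # the value L_n(omega)
    nu_enum[key] = nu_enum.get(key, 0) + weight

# Law of L_n from the binomial formula
nu_formula = {
    ber(k, n): comb(n, k) * p**k * (1 - p)**(n - k)
    for k in range(n + 1)
}
```

Exact rational arithmetic makes the comparison `nu_enum == nu_formula` meaningful without rounding issues. Both dictionaries have $n+1$ atoms whose weights sum to 1.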
