Law of the unconscious statistician proof

measure-theory, probability-theory

I want to show a well-known result from probability theory, which is the following:
We consider a probability space $(\Omega,\mathcal{F},\mathbb{P})$, a random variable $X$ with values in $(E,\mathcal{E})$, and a measurable function $f$ from $(E,\mathcal{E})$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Then we have $\mathbb{E}[f(X)]=\int_{E}f(x)\,d\mathbb{P}_{X}(x)$.
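To fix ideas, here is how I picture the statement in a familiar special case (my own illustration, so please correct me if it is off): when $E=\mathbb{R}$ and $\mathbb{P}_{X}$ has a density $p$ with respect to the Lebesgue measure, the formula reads

$$\mathbb{E}[f(X)]=\int_{\mathbb{R}}f(x)\,p(x)\,dx.$$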

Here is what I proposed, using what I know from measure theory: we start with an indicator function $f(x)=\mathbb{1}_A(x)$ where $A\in\mathcal{E}$. So we have
$\mathbb{E}[f(X)]=\mathbb{E}[\mathbb{1}_A(X)]=\mathbb{P}(X\in A)=\mathbb{P}_{X}(A)=\int_{A}\,d\mathbb{P}_{X}(x)=\int_{E}\mathbb{1}_A(x)\,d\mathbb{P}_{X}(x)=\int_{E}f(x)\,d\mathbb{P}_{X}(x)$.

But I have a problem here: since $\mathbb{P}_{X}(A)=\mathbb{P}(X^{-1}(A))=\mathbb{P}(X\in A)$, I don't understand why we write $d\mathbb{P}_{X}(x)$ and not $d\mathbb{P}_{X}(A)$.

If we consider $\mathbb{P}_{X}(x)=\mathbb{P}(X^{-1}(x))$, there is a problem, since we are not sure that $X$ is a bijection, no? So what this $d\mathbb{P}_{X}(x)$ really means is not that straightforward for me.

I know that I'm wrong somewhere, but I don't know where or what is confusing me, so thank you for your help!

EDIT: I don't know if this is the right place to put it, but the answer section did not seem suitable. Thank you.

Okay, thank you all for your help. If I understand the comment of @Leander Tilsted Kristensen correctly, I have to be careful about how I interpret the $d\mathbb{P}_{X}(x)$. Indeed, I was thinking of it as the $dx$ from the Riemann integral, but the only thing they have in common is that each indicates the measure used on the set of integration. So I was wondering: how should this $d\mathbb{P}_{X}(x)$ be understood?
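My tentative reading (please correct me if this is wrong): $\mathbb{P}_{X}$ is a measure, so it is only ever evaluated on sets, and the $x$ in $d\mathbb{P}_{X}(x)$ is just the name of the integration variable; $\int_{E}f(x)\,d\mathbb{P}_{X}(x)$ means the same thing as $\int_{E}f\,d\mathbb{P}_{X}$. For instance, if $X$ is discrete with values $x_1,x_2,\dots$, this would give

$$\int_{E}f(x)\,d\mathbb{P}_{X}(x)=\sum_{i}f(x_i)\,\mathbb{P}_{X}(\{x_i\})=\sum_{i}f(x_i)\,\mathbb{P}(X=x_i),$$

and no bijectivity of $X$ is needed anywhere, since $X^{-1}(A)$ denotes the preimage $\{\omega\in\Omega : X(\omega)\in A\}$, which makes sense for any function.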

Now, concerning the rest of the proof: we take a simple function $f(x)=\sum_{i=1}^{n}f_i\mathbb{1}_{\mathcal{E}_i}(x)$, where the $\mathcal{E}_i\in\mathcal{E}$. Applying the indicator case above to each term, it follows that
$\mathbb{E}[f(X)]=\mathbb{E}\left[\sum_{i=1}^{n}f_i\mathbb{1}_{\mathcal{E}_i}(X)\right]=\sum_{i=1}^{n}f_i\,\mathbb{E}[\mathbb{1}_{\mathcal{E}_i}(X)]=\sum_{i=1}^{n}f_i\int_{E}\mathbb{1}_{\mathcal{E}_i}(x)\,d\mathbb{P}_{X}(x)=\int_{E}\sum_{i=1}^{n}f_i\mathbb{1}_{\mathcal{E}_i}(x)\,d\mathbb{P}_{X}(x)=\int_{E}f(x)\,d\mathbb{P}_{X}(x).$

I have chosen $x$ as the input of the function $f$ since $X:\Omega\to E$, so that each $\omega\in\Omega$ is mapped to $X(\omega)=x$.

I know why I'm using the pushforward measure (since we work on $(E,\mathcal{E})$), but the role of the $x$ is still unclear to me. To extend the proof to positive measurable functions I need to use the monotone convergence theorem; I will only sketch that step below, since, as has been said, I'm not confident in my understanding of the notation, so I prefer to wait for your comments before writing it out in full.
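For completeness, here is the rough outline I have in mind for that last step (only a sketch, so I may be missing details): for a measurable $f\geq 0$ there exists a sequence of simple functions $f_n\uparrow f$ pointwise; then $f_n(X)\uparrow f(X)$, and applying the monotone convergence theorem on both sides of the identity already proved for simple functions gives

$$\mathbb{E}[f(X)]=\lim_{n\to\infty}\mathbb{E}[f_n(X)]=\lim_{n\to\infty}\int_{E}f_n(x)\,d\mathbb{P}_{X}(x)=\int_{E}f(x)\,d\mathbb{P}_{X}(x).$$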

Thank you very much.

Please bear with me, and feel free to correct me.

Best Answer

$P_X$ is the probability measure on $E$ defined by $P_X(A) = P(X \in A)$ for $A \in \mathcal{E}$. The LOTUS says that $E(f(X)) = \int_{E}f \, dP_X$ for all measurable $f \geq 0$. You've already proved it for indicators $f$. Linearity extends it to simple $f$, and the monotone convergence theorem extends it to arbitrary nonnegative $f$.
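As a quick numerical sanity check of the statement (an illustrative sketch only; the choice of distribution and of $f$ below is arbitrary), one can compare a Monte Carlo estimate of $E(f(X))$ with the integral of $f$ against the density of $P_X$:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)

# Arbitrary illustrative choices: X ~ Exponential(1) and f(x) = x^2,
# so that E[f(X)] = E[X^2] = 2.
f = lambda x: x ** 2
samples = stats.expon.rvs(size=1_000_000, random_state=rng)

# Left-hand side: Monte Carlo estimate of E[f(X)].
lhs = f(samples).mean()

# Right-hand side: integral of f against the density of P_X,
# which here is p(x) = exp(-x) on [0, infinity).
rhs, _ = integrate.quad(lambda x: f(x) * stats.expon.pdf(x), 0, np.inf)

print(lhs, rhs)  # both should be close to 2.0
```

The two numbers agree up to Monte Carlo error, which is exactly what the LOTUS predicts.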
