[Math] Difference between conditional expectation and conditional probability

conditional-expectation, markov-chains, probability, probability-theory

These are known definitions:
We have a probability space $(\Omega, \mathcal{A}, P)$.

Conditional probability is defined through $P(A|B) = \frac{P(A \cap B)}{P(B)}$ for $P(B) > 0$. This is a real number.
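As a quick sanity check (a standard example, not specific to this question): roll a fair die and let $A=\{2,4,6\}$, $B=\{4,5,6\}$; then
$$P(A\mid B)=\frac{P(\{4,6\})}{P(\{4,5,6\})}=\frac{2/6}{3/6}=\frac{2}{3},$$
a fixed real number.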

There is also the conditional expectation $E[X|\mathcal{A}_0]$ with $\mathcal{A}_0$ being some sub-$\sigma$-algebra. This is a random variable. In the special case where $\mathcal{A}_0$ is generated by a random variable $Y$ we also write $E[X|\sigma(Y)] = E[X|Y]$, and using the factorization theorem one can write $E[X|Y]$ as a measurable function of $Y$, whose value at $y$ is then denoted $E[X|Y=y]$. For $X$ being an indicator function $1_A$ we sometimes also write $E[1_A | Y=y] =: P(A | Y = y)$, which confuses me very much, because this is now the evaluation of a random variable that is only defined a.s.
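For concreteness (a sketch of the standard characterization, in the notation above): $E[X\mid\mathcal{A}_0]$ is any $\mathcal{A}_0$-measurable random variable $Z$ with
$$\int_{A} Z \, dP = \int_{A} X \, dP \quad \text{for all } A \in \mathcal{A}_0,$$
unique only up to $P$-null sets. By the factorization lemma every $\sigma(Y)$-measurable $Z$ has the form $Z = g(Y)$ for some measurable $g$, and $E[X\mid Y=y]$ is shorthand for $g(y)$; since $g$ may be changed on any $P_Y$-null set, the value at a single $y$ is not canonically defined.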

I am particularly confused by the proof I state in this question.

Since the probability of $\{X_1 = i_1, \dots, X_{n-1} = i_{n-1}\}$ is $0$ because the $X_i$ are real valued, the proof can't be talking about conditional probabilities. Thus I think the proof uses the sloppy notation for conditional expectation $E[1_B(X_n) \mid X_1 = i_1, \dots, X_{n-1} = i_{n-1}] =: P(X_n \in B \mid X_1 = i_1, \dots, X_{n-1} = i_{n-1})$. In this setting I don't understand why the single steps in the proof are true. They are all obviously true when $P(\cdot\mid\cdot)$ is a conditional probability, but to my eye not trivial when $P(\cdot\mid\cdot)$ is a conditional expectation.
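To pin down what this sloppy notation means (a sketch of the standard characterization, with $Y := (X_1, \dots, X_{n-1})$): the function $y \mapsto P(X_n \in B \mid Y = y)$ is any measurable function satisfying
$$P(\{X_n \in B\} \cap \{Y \in C\}) = \int_C P(X_n \in B \mid Y = y)\, P_Y(dy) \quad \text{for all measurable } C,$$
so any identity between such expressions can only hold for $P_Y$-almost every $y$, not pointwise, and this is where my difficulty lies.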

Best Answer

Let $X, Y$ be random variables with joint density $f_{X,Y}$; then the conditional expectation is given by $$ E(Y\mid X=x)=\int y\,f_{Y\mid X}(y\mid x)\,dy=\int y\,\frac{f_{X,Y}(x,y)}{f_{X}(x)}\,dy=\int y\,\frac{f_{X,Y}(x,y)}{\int f_{X,Y}(x,t)\,dt}\,dy, $$ where $x$ is any value in the range of $X$. Therefore $E(Y\mid X=x)$ is a function of $x$. Here the inner term $f_{Y\mid X}(y\mid x)$ comes precisely from the conditional distribution $P(A\mid B)=\frac{P(A\cap B)}{P(B)}$. One way to think about it is to consider the discrete case, where $P(Y=y\mid X=x)$ is given by the quotient $\frac{P(Y=y,\, X=x)}{P(X=x)}$, and $P(X=x)=\sum_{y}P(X=x,\, Y=y)$.
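As a worked instance of this formula (a made-up example, not from the question): take $f_{X,Y}(x,y) = x + y$ on $[0,1]^2$. Then $f_X(x) = \int_0^1 (x+y)\,dy = x + \tfrac{1}{2}$, and
$$E(Y\mid X=x) = \int_0^1 y\,\frac{x+y}{x+\tfrac{1}{2}}\,dy = \frac{\tfrac{x}{2}+\tfrac{1}{3}}{x+\tfrac{1}{2}} = \frac{3x+2}{6x+3},$$
which is visibly a function of $x$ alone.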

For what you asked in the second part, I think it is important to note that for Markov chains, the transition probability from one state to the next is independent of the previous states. You can think about this as tossing a fat coin on the real line. Suppose the coin is at position $x$ after $n$ steps; whether the coin goes to $x\pm 1$ in the next step is totally independent of its past. In other words, knowing the coin has undergone the sequence $HTHTTT\cdots$ does not change the expectation at all, given that at the $n$-th step it is at position $x$. The coin does not remember that. To me this is what makes Markov chains so nice and useful.
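In formulas (a sketch, matching the coin picture): the Markov property says
$$P(X_{n+1}\in B \mid X_1,\dots,X_n) = P(X_{n+1}\in B \mid X_n) \quad \text{a.s.},$$
so for the coin at position $x$ the one-step probabilities $P(X_{n+1}=x\pm 1 \mid X_n = x)$ are the same no matter which $HTHTTT\cdots$ history led there.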
