Gambling is a good starting point for probability. Just as we need addition and multiplication on numbers before we can compute, we need a $\sigma$-field as a structure on events before we can assign probabilities; the $\sigma$-field plays the same role for our calculations that the completeness of the real numbers plays for analysis.
I hope the following gambling example helps you understand filtrations and conditional expectation.
Suppose two people, player A and player B, bet on the results of two coin tosses (H = head, T = tail).
At time $0$, A and B know nothing about the result except that one of the outcomes in $\Omega=\{HH,HT,TH,TT\}$ will occur. Hence the information they both have at time $0$ is $\mathcal{F}_0=\{\emptyset,\Omega\}$.
At time $1$, the coin has been tossed once, and the events they can decide are those in the $\sigma$-field $\mathcal{F}_1=\{\emptyset, \Omega, \{HH,HT\},\{TH,TT\}\}\supset \mathcal{F}_0$.
At time $2$, the coin has been tossed twice, and the events they can decide form the $\sigma$-field $\mathcal{F}_2=2^\Omega$, the collection of all $16$ subsets of $\Omega$ (note that the four singletons together with their unions, complements, $\emptyset$ and $\Omega$ must all be included for $\mathcal{F}_2$ to be a $\sigma$-field), with $\mathcal{F}_2\supset\mathcal{F}_1$. In other words, they now know everything about the gambling results.
Please notice the evolution of information captured by the filtration $\mathcal{F}_0\subset\mathcal{F}_1\subset\mathcal{F}_2$. As time passes, the unknown world $\Omega$ is partitioned more and more finely, somewhat like water flowing through a branching system of pipes.
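This refinement of information can be sketched in a few lines of Python. The grouping-by-prefix helper below is purely illustrative: knowing which atom of $\mathcal{F}_t$ contains the true outcome is exactly what the players know after $t$ tosses.

```python
# Sample space for two coin tosses.
omega = ["HH", "HT", "TH", "TT"]

def atoms_at_time(t):
    """Group outcomes by their first t tosses: the atoms of F_t.

    At t = 0 there is one atom (the whole space); at t = 2 every
    outcome is its own atom, i.e. the result is fully known.
    """
    groups = {}
    for w in omega:
        groups.setdefault(w[:t], []).append(w)
    return sorted(groups.values())

print(atoms_at_time(0))  # [['HH', 'HT', 'TH', 'TT']]
print(atoms_at_time(1))  # [['HH', 'HT'], ['TH', 'TT']]
print(atoms_at_time(2))  # [['HH'], ['HT'], ['TH'], ['TT']]
```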
Suppose they bet on the following payoff and the coin is fair.
$$X(\omega)=\left\{ \begin{array}{ll}
2, & \omega=HH\ \mbox{(first toss H, second toss H)}\\
1, & \omega=HT\ \mbox{(first toss H, second toss T)}\\
1, & \omega=TH\ \mbox{(first toss T, second toss H)}\\
0, & \omega=TT\ \mbox{(first toss T, second toss T)}
\end{array} \right.$$
Then, we have
$$E[X|\mathcal{F}_0](\omega)=E[X]=1\qquad\text{for every}\ \omega $$
$$E[X|\mathcal{F}_2](\omega)=X(\omega)\qquad\text{for every}\ \omega $$
$$\begin{aligned}
E[X|\{HH,HT\}]&=2P(HH|\{HH,HT\})+1P(HT|\{HH,HT\})\\
&\quad+1P(TH|\{HH,HT\})+0P(TT|\{HH,HT\})=\tfrac{3}{2},\\
E[X|\{TH,TT\}]&=2P(HH|\{TH,TT\})+1P(HT|\{TH,TT\})\\
&\quad+1P(TH|\{TH,TT\})+0P(TT|\{TH,TT\})=\tfrac{1}{2}.
\end{aligned}$$
$$E[X|\mathcal{F}_1](\omega)=\left\{ \begin{array}{ll}
\frac{3}{2}, & \omega\in \{HH,HT\}\\
\frac{1}{2}, & \omega \in \{TH,TT\}
\end{array} \right.
$$
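These values can be double-checked with a small Python sketch. The helper below (the names are made up for illustration) computes $E[X\mid\mathcal G]$ for a $\sigma$-field generated by a finite partition by averaging $X$ over each atom:

```python
# X and P from the two-toss example; the coin is fair and tosses
# are independent, so every outcome has probability 1/4.
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}
P = {w: 0.25 for w in X}

def cond_exp(X, P, atoms):
    """E[X | G] for the sigma-field G generated by a finite partition:
    on each atom, the value is the probability-weighted average of X."""
    out = {}
    for atom in atoms:
        p_atom = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / p_atom
        for w in atom:
            out[w] = avg
    return out

# Atoms of F_0, F_1, F_2 respectively.
F0 = [["HH", "HT", "TH", "TT"]]
F1 = [["HH", "HT"], ["TH", "TT"]]
F2 = [["HH"], ["HT"], ["TH"], ["TT"]]

print(cond_exp(X, P, F0))  # constant 1.0 = E[X]
print(cond_exp(X, P, F1))  # 1.5 on {HH,HT}, 0.5 on {TH,TT}
print(cond_exp(X, P, F2))  # equals X itself
```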
I hope this helps.
The concept of a filtration is needed to give a formal definition of conditional expectation. In particular, conditional expectation given a $\sigma$-algebra is itself a random variable, measurable with respect to that $\sigma$-algebra.
A filtration is a way to encode the information contained in the history of a stochastic process: if a process is adapted to the filtration, then all information about the process up to each time is contained in it.
Of course, any process is trivially adapted to its natural filtration; more interesting is the case where a process $\{Y_n\}$ is adapted to the natural filtration of another process $\{X_n\}$. In that case,
$$Y_n = f_n(X_1,\ldots, X_n)$$
for some measurable functions $f_n$.
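As a toy illustration (the running-heads process below is a hypothetical example, not anything from the text above): the number of heads seen so far depends only on the tosses observed so far, so it is adapted to their natural filtration.

```python
tosses = list("HTTHHHTT")  # one fixed path of the toss process X_n

def heads_so_far(xs, n):
    """Y_n = number of heads among the first n tosses. It depends only
    on (X_1, ..., X_n), so {Y_n} is adapted to the natural filtration
    of the tosses."""
    return sum(1 for x in xs[:n] if x == "H")

Y = [heads_so_far(tosses, n) for n in range(1, len(tosses) + 1)]
print(Y)  # [1, 1, 1, 2, 3, 4, 4, 4]
```

A counterexample would be $Y_n = X_{n+1}$: it peeks one toss into the future, so it is not adapted.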
The natural setting for filtrations, however, is martingale theory. A martingale is a process whose expected future value, given the information available now, equals its current value:
$$\mathbf E[M_{n+1}|\mathcal F_n] = M_n.$$
Equivalently, a martingale is a process whose increments have zero conditional mean given the history: $\mathbf E[M_{n+1}-M_n\mid\mathcal F_n]=0$. (The increments need not be independent of the history.)
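The martingale property can be verified exactly for the classic example of a fair $\pm1$ random walk on a small sample space. The brute-force check below (illustrative only) conditions on every prefix of the path, i.e. on every atom of $\mathcal F_n$:

```python
from itertools import product

# All equally likely paths of three fair +/-1 steps;
# M_n = S_1 + ... + S_n is the classic martingale.
steps = list(product([1, -1], repeat=3))

def walk(path, n):
    """M_n along the given path."""
    return sum(path[:n])

# Check E[M_{n+1} | F_n] = M_n: condition on each prefix (an atom of
# F_n) and average M_{n+1} over the paths sharing that prefix.
for n in range(3):
    for pre in {p[:n] for p in steps}:
        paths = [p for p in steps if p[:n] == pre]
        avg_next = sum(walk(p, n + 1) for p in paths) / len(paths)
        assert avg_next == sum(pre)  # sum(pre) is M_n on this atom
print("martingale property verified for n = 0, 1, 2")
```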
Best Answer
An integrable random variable $X$ is measurable with respect to the $\sigma$-algebra $\mathfrak F$ if and only if $X=\mathbb E(X\mid\mathfrak F)$ almost surely.
One can understand this in a few steps:
$\mathbb E(X\mid \mathbf 1_A)$ means conditioning on the indicator random variable $Y=\mathbf 1_A$, where $\mathbf 1_A(\omega)$ is $1$ if $\omega\in A$ and $0$ otherwise. This is the random variable that returns $\mathbb E(X\mid A)$ if $\omega\in A$, and $\mathbb E(X\mid A^c)$ if $\omega\not\in A$;
$\mathbb E(X\mid \mathfrak F)$, where $\mathfrak F=\{\varnothing, \Omega, A, A^c\}$, is the same as $\mathbb E(X\mid 1_A)$;
$\mathbb E(X\mid \mathfrak F)$, where $\mathfrak F=\{\varnothing, \Omega, A, A^c, B, B^c, A\cup B, A\cup B^c,\dots\}$ ($2^{2^2}=16$ elements), is something we could call $\mathbb E(X\mid \mathbf 1_A, \mathbf 1_B)$; it returns $\mathbb E(X\mid A\cap B)$, $\mathbb E(X\mid A\cap B^c)$, $\mathbb E(X\mid A^c\cap B)$, or $\mathbb E(X\mid A^c\cap B^c)$, depending on which of these atoms contains $\omega$. It is somewhat superfluous to list $A\cup B$ etc. in $\mathfrak F$; a generating set would suffice, but since the generating set need not be unique it is safest to list all of $\mathfrak F$;
$\mathbb E(X\mid \mathfrak F)$, where $\mathfrak F=\mathfrak F_t$ is a $\sigma$-algebra corresponding to what's known at time $t$, is an infinite version of the previous step. It's a random variable that returns our best estimate of $X$, given answers to all the questions "$\omega\in A$?" for $A\in\mathfrak F$. If that estimate always coincides with $X$ itself, then $X$ is $\mathfrak F$-measurable, i.e. "known at time $t$".
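On a finite sample space the two-set case is easy to make concrete. The sketch below uses a made-up example ($\Omega=\{1,\dots,8\}$ uniform, $A=\{1,2,3,4\}$, $B=\{3,4,5,6\}$, $X(\omega)=\omega$; none of these sets appear in the answer above) and averages $X$ over the atom determined by the answers to "$\omega\in A$?" and "$\omega\in B$?":

```python
from fractions import Fraction

# Hypothetical finite example: uniform Omega = {1,...,8},
# A = {1,2,3,4}, B = {3,4,5,6}, and X(w) = w.
omega = range(1, 9)
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def cond_exp_given_indicators(w):
    """E(X | 1_A, 1_B)(w): average X over the atom
    {w' : 1_A(w') = 1_A(w) and 1_B(w') = 1_B(w)} containing w."""
    atom = [v for v in omega if (v in A) == (w in A) and (v in B) == (w in B)]
    return Fraction(sum(atom), len(atom))

# The four atoms are {1,2}, {3,4}, {5,6}, {7,8}, so the conditional
# expectation is constant on each of them.
for w in omega:
    print(w, cond_exp_given_indicators(w))  # 3/2, 3/2, 7/2, 7/2, 11/2, ...
```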