Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. Let $(X,Y)$ be a random vector with probability density function $g_{(X,Y)}$. Finally, let $f$ be any Borel function such that $\mathbb E[|f(X,Y)|] < \infty$. Then it holds that $\mathbb E[f(X,Y)|Y] = h(Y)$, where:
$$ h(y) = \frac{\int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx} $$ when $\int_{\mathbb R} g_{(X,Y)}(x,y)dx \neq 0$, and $h(y) = 0$ otherwise.
First, we may set $h$ to $0$ in the second case because the set $S=\{ \omega \in \Omega : \int_{\mathbb R} g_{(X,Y)}(x,Y(\omega)) dx = 0 \}$ has measure $0$. Clearly $\mathbb P(S) = \mathbb P(Y \in S_Y)$, where $S_Y = \{ y \in \mathbb R: \int_{\mathbb R} g_{(X,Y)}(x,y)dx = 0 \}$. Then $\mathbb P(Y \in S_Y) = \int_{S_Y} g_Y(y) dy$, where $g_Y$ is the marginal density of $Y$ (it exists by Tonelli's theorem, since $(X,Y)$ has a joint density). But note that $g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)dx$, so we are integrating a function that vanishes on $S_Y$, hence $\mathbb P(S) = 0$. This, together with the fact that conditional expectation is determined only up to sets of measure $0$, allows us to ignore the case $g_Y(y) = 0$.
So, we have to prove two things:
1) $h(Y)$ is $\sigma(Y)$-measurable. By the Fubini–Tonelli theorem, the maps $y \mapsto \int_{\mathbb R} g_{(X,Y)}(x,y) dx$ and $y \mapsto \int_{\mathbb R} f(x,y) g_{(X,Y)}(x,y) dx$ are Borel measurable (the first because $g_{(X,Y)} \geq 0$, the second because $\mathbb E[|f(X,Y)|] < \infty$). Hence $h$ is a Borel function, and $h(Y)$ is $\sigma(Y)$-measurable.
2) For any $A \in \sigma(Y)$ we have to show $\int_A f(X,Y) d\mathbb P = \int_A h(Y) d\mathbb P$. Note that $A$ is of the form $Y^{-1}(B)$, where $B \in \mathcal B(\mathbb R)$ is a Borel set.
Note that $\int_A f(X,Y) d\mathbb P = \mathbb E[ f(X,Y) \chi_{\{Y \in B\}} ]$ and $\int_A h(Y) d\mathbb P = \mathbb E[ h(Y) \chi_{\{Y \in B\}} ]$.
We'll use the fact that if a random variable/vector $V$ in $\mathbb R^n$ has density function $g_V$, then for any Borel function $\phi: \mathbb R^n \to \mathbb R$ with $\mathbb E[|\phi(V)|] < \infty$, we have $\mathbb E[\phi(V)] = \int_{\mathbb R^n} \phi(v) g_V(v) d\lambda_n(v)$.
Then: $$\mathbb E[ f(X,Y) \chi_{\{Y \in B\}} ] = \int_{\mathbb R^2} f(x,y)\chi_{B}(y) g_{(X,Y)}(x,y) d\lambda_2(x,y) = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dxdy $$
The last splitting of the double integral is justified by Fubini's theorem (the integrand is integrable by our assumption on $f$).
And now, arguing similarly:
$$ \mathbb E[ h(Y) \chi_{\{Y \in B\}} ] = \int_{B} h(y) \left(\int_{\mathbb R} g_{(X,Y)}(x,y)dx\right)dy$$
Now, by our convention for $h$ (we may disregard the case where the denominator is $0$, since it occurs only on a set of measure $0$), we have:
$$ \int_{B} h(y) \left(\int_{\mathbb R} g_{(X,Y)}(x,y)dx\right) dy = \int_{B} \left(\frac{\int_{\mathbb R} g_{(X,Y)}(x,y)f(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx}\right) \left(\int_{\mathbb R} g_{(X,Y)}(x,y)dx \right)dy$$
After simplification we get $\mathbb E[ h(Y) \chi_{\{Y \in B\}} ] = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dxdy = \mathbb E[ f(X,Y) \chi_{\{Y \in B\}} ]$, which is what we wanted to prove.
Now your "definition $1$" follows when you take $f(x,y) = x$. Then $ h(y) = \mathbb E[X|Y=y] $
I assume that you're familiar with the usual way to answer these questions (by developing conditional distributions using Bayes' rule, as covered in a standard non-measure-theoretic probability class), and want to do this systematically to develop understanding of the formal definition of conditional expectation. This is definitely a useful exercise to do at least once. However, on the off chance that you are not familiar with these basic techniques, I'd strongly recommend reviewing them from an undergraduate probability book first, because the following is overly cumbersome for the task at hand.
So, let's work out $Z =\mathbb{E}[X|\mathcal{G}]$ carefully. Since the state space is finite, this amounts to finding the six values $Z(\omega)$ for $\omega \in [1:6]$. Now, we know that $Z$ has to be $\mathcal{G}$-measurable. So, for any value $v$, the set $\{\omega: Z(\omega) = v\}$ must lie in $\mathcal{G}$. [1]
We first use this measurability condition to determine the structure of $Z$. For the sake of contradiction, suppose that $Z(1) \neq Z(3)$. Then the set $\{\omega: Z(\omega) = Z(1)\}$ does not contain $3$, but contains $1$. But any set in $\mathcal{G}$ that has a $1$ also has a $3$, giving a contradiction. This means that $Z(1) = Z(3)$. Similarly, we have $Z(4) = Z(6)$.
(More generally - the point is that $\mathcal{G}$ is generated by the four sets $G_1 = \{1,3\}, G_2 = \{2\}, G_3 = \{4,6\}, G_4 = \{5\}$. So, any random variable measurable with respect to it must be a simple function of the form $\sum a_i \mathbf{1}_{G_i}.$ You should try to show this.)
Now, taking various choices of $G \in \mathcal{G},$ we can further resolve the values of $Z$ using the expectation properties. For instance, note that for $G = \{2\},$ $$ \mathbb{E}[Z\mathbf{1}_{\{2\}}] = \mathbb{E}[X\mathbf{1}_{\{2\}}] \iff Z(2) P(2) = X(2) P(2) \iff Z(2) = X(2).$$
Similarly, taking $G = \{1,3\},$ we find that $$ Z(1)P(1) + Z(3)P(3) = Z(1)(P(1) + P(3)) = X(1)P(1) + X(3) P(3) \\ \iff Z(1) = \frac{X(1) P(1) + X(3) P(3)}{P(1) + P(3)}, $$ where we have used our previous observation that $Z(1) = Z(3)$.
As an exercise, set up similar calculations to determine $Z(4), Z(5), Z(6)$.
Finally, if we evaluate the numbers by setting $X(\omega) = \omega, P(\omega) = 1/6,$ we'll find that $$ Z(1) = Z(2) = Z(3) = 2,\\ Z(4) = Z(5) = Z(6) = 5.$$
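Here is a short sketch (the names and structure are my own) that reproduces these numbers directly from the atoms $G_i$ above, using $Z = \sum a_i \mathbf{1}_{G_i}$ with $a_i = \mathbb{E}[X\mathbf{1}_{G_i}]/P(G_i)$; it also serves as a check for the exercise on $Z(4), Z(5), Z(6)$:

```python
# Compute Z = E[X | G] on Omega = {1,...,6} from the atoms of G.
atoms = [{1, 3}, {2}, {4, 6}, {5}]   # G_1, ..., G_4 from above
X = {w: w for w in range(1, 7)}      # X(omega) = omega
P = {w: 1 / 6 for w in range(1, 7)}  # fair die

Z = {}
for G in atoms:
    a = sum(X[w] * P[w] for w in G) / sum(P[w] for w in G)  # a_i
    for w in G:
        Z[w] = a

print(Z)  # {1: 2.0, 3: 2.0, 2: 2.0, 4: 5.0, 6: 5.0, 5: 5.0}
```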
[1] More generally, we would work with Borel sets and consider $Z^{-1}(B) = \{\omega: Z(\omega) \in B\}$, but in this case the discreteness makes things easier.
(1) For $B\in\mathscr{B}(\mathbb{R})$, define $$\Lambda(B):=\int_{Y^{-1}(B)}XdP=\int_{Y^{-1}(B)}E(X|Y)dP.$$
This is a (signed) measure on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$ that is absolutely continuous with respect to $P\circ Y^{-1}$, so by the Radon–Nikodym theorem it has a density $\lambda:=d\Lambda/d(P\circ Y^{-1})$.
Consider $\lambda\circ Y$. Every $G\in\mathcal{G}$ is of the form $G=Y^{-1}(B)$ with $B\in\mathscr{B}(\mathbb{R})$, so $$\int_{G}\lambda\circ YdP=\int_{B}\lambda d(P\circ Y^{-1})=\int_{B}d\Lambda=\int_GE(X|Y)dP.$$ Since $E(X|Y)$ and $\lambda\circ Y$ are both $\mathcal{G}$-measurable, $\lambda\circ Y\overset{a.s.}{=}E(X|Y).$
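For intuition (an added special case, not needed for the argument): if $P(Y=y)>0$, so that $\{y\}$ is an atom of $P\circ Y^{-1}$, the Radon–Nikodym derivative at $y$ is just a ratio of point masses, $$\lambda(y)=\frac{\Lambda(\{y\})}{P(Y=y)}=\frac{\int_{\{Y=y\}}XdP}{P(Y=y)},$$ which is the elementary conditional expectation $E(X|Y=y)$.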
(2) $$E_Y(E(X|Y))=\int_{-\infty}^{+\infty}\lambda(y) [P\circ Y^{-1}](dy)=\int_\Omega \lambda\circ YdP=\int_\Omega E(X|Y)dP=\int_\Omega XdP.$$ If you see $E(X|Y)$ as a function of $Y$, and $Y$ has a density function, then
$$\int_{-\infty}^{+\infty}E(X|Y=y)f_Y(y)dy=\int_{-\infty}^{+\infty}\lambda(y)f_Y(y)dy=\int_{-\infty}^{+\infty}\lambda(y) [P\circ Y^{-1}](dy)=E(X).$$
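As a concrete check of this last identity (an illustrative example of my own choosing): take $Y\sim N(0,1)$ and $X=Y^2$, so that $E(X|Y=y)=y^2$. Then $$\int_{-\infty}^{+\infty}E(X|Y=y)f_Y(y)dy=\int_{-\infty}^{+\infty}y^2\frac{e^{-y^2/2}}{\sqrt{2\pi}}dy=1=E(X),$$ as expected.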