(1)
$$\Lambda(t):=\int_{\{\omega:Y(\omega)\leq t\}}XdP=\int_{\{\omega:Y(\omega)\leq t\}}E(X|Y)dP$$
is absolutely continuous with respect to $P\circ Y^{-1}$ on $(\mathbb{R},\mathscr{B}(\mathbb{R}))$, so by the Radon–Nikodym theorem we may take its derivative $\lambda:=d\Lambda/d(P\circ Y^{-1})$.
Consider $\lambda\circ Y$. Every $G\in\mathcal{G}=\sigma(Y)$ has the form $G=Y^{-1}(B)$ for some $B\in\mathscr{B}(\mathbb{R})$, and
$$\int_{G}\lambda\circ Y\,dP=\int_{B}\lambda\, d(P\circ Y^{-1})=\int_{B}d\Lambda=\int_GE(X|Y)\,dP.$$
Since $E(X|Y)$ and $\lambda\circ Y$ are both $\mathcal{G}$-measurable and have the same integral over every $G\in\mathcal{G}$, $\lambda\circ Y\overset{a.s.}{=}E(X|Y).$
(2)
$$E_Y(E(X|Y))=\int_{-\infty}^{+\infty}\lambda(y) [P\circ Y^{-1}](dy)=\int_\Omega \lambda\circ YdP=\int_\Omega E(X|Y)dP=\int_\Omega XdP.$$
If you view $E(X|Y)$ as a function of $Y$, and $Y$ has a density function $f_Y$, then
$$\int_{-\infty}^{+\infty}E(X|Y=y)f_Y(y)dy=\int_{-\infty}^{+\infty}\lambda(y)f_Y(y)dy=\int_{-\infty}^{+\infty}\lambda(y) [P\circ Y^{-1}](dy)=E(X).$$
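As a quick numerical sanity check of this identity (an illustrative sketch, not part of the proof; the particular variables are my choice): take $Y\sim N(0,1)$ and $X=Y^2+\varepsilon$ with $\varepsilon\sim N(0,1)$ independent of $Y$, so $\lambda(y)=E(X|Y=y)=y^2$ and both sides equal $E[Y^2]=1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Hypothetical example: Y ~ N(0,1), X = Y^2 + noise with independent N(0,1) noise.
# Then E(X | Y = y) = y^2, so lambda(y) = y^2 and E(X) = E[lambda(Y)] = E[Y^2] = 1.
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)

lhs = np.mean(y**2)   # Monte Carlo estimate of E[E(X|Y)] = E[lambda(Y)]
rhs = np.mean(x)      # Monte Carlo estimate of E(X)

print(lhs, rhs)       # both close to 1
```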
Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. Let $(X,Y)$ be a random vector with probability density function $g_{(X,Y)}$. Finally, let $f$ be any Borel function such that $\mathbb E[|f(X,Y)|] < \infty$. Then $\mathbb E[f(X,Y)|Y] = h(Y)$, where:
$$ h(y) = \frac{\int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx} $$when $\int_{\mathbb R} g_{(X,Y)}(x,y)dx \neq 0$, and $h(y) = 0$ otherwise.
Firstly, we can set $h(y)=0$ in the second case because the set $S=\{ \omega \in \Omega : \int_{\mathbb R} g_{(X,Y)}(x,Y(\omega))\, dx = 0 \}$ has measure $0$. Indeed, $\mathbb P(S) = \mathbb P(Y \in S_Y)$, where $S_Y = \{ y \in \mathbb R: \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx = 0 \}$, and $\mathbb P(Y \in S_Y) = \int_{S_Y} g_Y(y)\, dy$, where $g_Y$ is the marginal density of $Y$ (it exists by Fubini's theorem together with the existence of the joint density of $(X,Y)$). But $g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)\,dx$, which vanishes on $S_Y$, so we are integrating the zero function and $\mathbb P(S) = 0$. This, together with the fact that conditional expectation is only defined up to sets of measure $0$, allows us to ignore the case $g_Y(y) = 0$.
So, we have to prove two things:
1) $h(Y)$ is $\sigma(Y)$-measurable. Both $\int_{\mathbb R} g_{(X,Y)}(x,Y)\, dx$ and $\int_{\mathbb R} g_{(X,Y)}(x,Y) f(x,Y)\, dx$ are $\sigma(Y)$-measurable by the Fubini–Tonelli theorem: partial integrals of jointly measurable functions are measurable in the remaining variable. (Here we use the integrability of $g_{(X,Y)}$ and the finiteness of $\mathbb E[|f(X,Y)|]$ to apply Fubini's theorem.)
2) For any $A \in \sigma(Y)$ we have to show $\int_A f(X,Y)\, d\mathbb P = \int_A h(Y)\, d\mathbb P$. Note that $A$ is of the form $Y^{-1}(B)$, where $B \in \mathcal B(\mathbb R)$ is a Borel set.
Note that $\int_A f(X,Y)\, d\mathbb P = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$ and $\int_A h(Y)\, d\mathbb P = \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ]$.
We'll use the fact that if a random variable/vector (in $\mathbb R^n$) $V$ has density function $g_V$, then for any Borel function $\phi: \mathbb R^n \to \mathbb R$, we have $\mathbb E[\phi(V)] = \int_{\mathbb R^n} \phi(v) g_V(v)\, d\lambda_n(v)$.
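This change-of-variables fact can be illustrated numerically (a hypothetical example of my own, not part of the proof): with $V\sim\mathrm{Uniform}(0,1)$, density $g_V\equiv 1$ on $[0,1]$, and $\phi(v)=v^2$, both sides equal $\int_0^1 v^2\,dv = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(1)

# V ~ Uniform(0,1) has density g_V = 1 on [0,1]; take phi(v) = v^2.
# Then E[phi(V)] = int_0^1 v^2 * 1 dv = 1/3.
v = rng.uniform(0.0, 1.0, size=10**6)
mc = np.mean(v**2)                       # Monte Carlo estimate of E[phi(V)]

dx = 1.0 / 10_000
mid = (np.arange(10_000) + 0.5) * dx     # midpoints of subintervals of [0,1]
integral = np.sum(mid**2) * dx           # midpoint rule for int phi(v) g_V(v) dv

print(mc, integral)                      # both close to 1/3
```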
Then: $$\mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ] = \int_{\mathbb R^2} f(x,y)\chi_{_{B}}(y)\, g_{(X,Y)}(x,y)\, d\lambda_2(x,y) = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx\,dy $$
The last splitting of the integral is justified by Fubini's theorem (the integrand is integrable by our assumption on $f$).
And now, similarly to the beginning:
$$ \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} h(y) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx)dy$$
Now, by our definition of $h$ (which lets us disregard the case where the denominator is $0$, since that happens only on a set of measure $0$), we have:
$$ \int_{B} (h(y)) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx) dy = \int_{B} (\frac{\int_{\mathbb R} g_{(X,Y)}(x,y)f(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx}) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx )dy$$
After simplification we get $\mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)\,dx\,dy = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$, which is what we wanted to prove.
Now your "definition $1$" follows when you take $f(x,y) = x$; then $h(y) = \mathbb E[X|Y=y]$.
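The formula for $h$ can be checked numerically on a concrete example (an illustrative sketch with parameters of my choosing): for a standard bivariate normal $(X,Y)$ with correlation $\rho$, the classical result is $\mathbb E[X|Y=y]=\rho y$, and the integral ratio defining $h$ reproduces it.

```python
import numpy as np

rho = 0.6   # assumed correlation for this illustration

def g_xy(x, y):
    # standard bivariate normal density with correlation rho
    z = (x**2 - 2*rho*x*y + y**2) / (1 - rho**2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

def h(y, grid=np.linspace(-10, 10, 20_001)):
    # h(y) = int x g(x,y) dx / int g(x,y) dx, via a Riemann sum;
    # the common grid spacing dx cancels in the ratio
    dens = g_xy(grid, y)
    return np.sum(grid * dens) / np.sum(dens)

for y in [-1.5, 0.0, 2.0]:
    print(y, h(y), rho * y)   # h(y) matches rho * y
```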
You may be more familiar with conditional probability $$ \mathbb P(A\mid B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)},\qquad (\star) $$ which is a fundamental concept in statistics and informal probability (and applied in many other situations outside of mathematics and in everyday life...)
Conditional expectation is a vast generalization of conditional probability, where now the set $B$ is replaced by a sigma field (corresponding to your $\mathcal D$) and $A$ is interpreted as an indicator random variable $1_A$, which is then generalized to be an arbitrary random variable $X$. So both $A$ and $B$ are replaced with vastly more general objects.
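To make this generalization concrete (a small hypothetical example: the events $A$ and $B$ below are my own choices on $([0,1],\text{Lebesgue})$): conditioning the indicator $1_A$ on the two-cell partition $\{B, B^c\}$ produces a random variable whose value on $B$ is exactly the classical $\mathbb P(A\mid B)$.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(0, 1, size=10**6)

# Events on ([0,1], Lebesgue): A = [0, 0.3), B = [0.2, 0.7).
a = u < 0.3
b = (u >= 0.2) & (u < 0.7)

# Classical conditional probability P(A|B) = P(A and B) / P(B) = 0.1 / 0.5 = 0.2.
p_a_given_b = np.mean(a & b) / np.mean(b)

# The same number as a conditional expectation: the value of E(1_A | sigma(B))
# on the cell B is the average of 1_A over B.
cond = np.mean(a[b])

print(p_a_given_b, cond)   # both close to 0.2
```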
Now since measure-theoretic probability is mathematically rigorous, you have to worry about "pedantic" situations like when $\mathbb P(B)=0$ and the conditional probability formula $(\star)$ becomes undefined. In fact, this seemingly simple problem is the source of the subtleties in the definition of conditional expectation. This is what causes the uncertainty up to sets of probability $0$.
Okay with that preamble out of the way, I can now answer your questions in a better context.
What we learn from $X$ by replacing it with the random variable $\mathbb E(X\mid \mathcal D)$ is its "coarse grained" behavior when averaged over the sets in $\mathcal D$. For instance, in the extreme cases when $\mathcal D=\mathcal F$ there is no extra averaging and $X=\mathbb E(X\mid \mathcal F)$ up to null sets, whereas when $\mathcal D=\{\emptyset,\Omega\}$ the conditional expectation $\mathbb E(X\mid \mathcal D)$ becomes equal to the constant $\mathbb EX$, up to a null set. In between these two extremes, you can imagine the sets in $\mathcal D$ as being unions of sets in a partition of the probability space, and the value of the conditional expectation on each "part" of the partition is its expectation when restricted to that "part".
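The partition picture can be sketched on a toy finite probability space (a hypothetical setup of my choosing: twelve equally likely points and a three-cell partition generating $\mathcal D$): $\mathbb E(X\mid\mathcal D)$ is constant on each cell, equal to the average of $X$ there, and integrating it over any union of cells recovers the integral of $X$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12

# Toy discrete probability space: Omega = {0,...,11} with uniform P.
# D is generated by the partition {0..3}, {4..7}, {8..11}.
x = rng.uniform(0, 1, size=n)                  # a random variable X on Omega
parts = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

cond_exp = np.empty(n)
for part in parts:
    cond_exp[part] = x[part].mean()            # average of X over each cell

# Integrating E(X|D) over any set in D recovers the integral of X there.
A = np.concatenate([parts[0], parts[2]])       # a union of cells, hence in D
print(cond_exp[A].mean(), x[A].mean())         # equal
print(cond_exp.mean(), x.mean())               # equal (the tower property)
```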
To compute $\mathbb E(X\mid \mathcal D)$ we need to know $X$ up to $\mathbb P$-null sets, since otherwise the right side of your equation (2) cannot be computed. That's the answer to the literal interpretation of your question, but I think more in the spirit of what you are asking is to understand when $\mathbb E(X\mid \mathcal D)=\mathbb E(Y\mid \mathcal D)$ up to null sets. By subtracting the two sides and writing $Z=X-Y$, this is equivalent to asking when $\mathbb E(Z\mid \mathcal D)=0$, and the answer is that this happens exactly when $Z$ has mean $0$ when restricted to any set in $\mathcal D$, i.e. $\int_D Z\,d\mathbb P=0$ for every $D\in\mathcal D$.
A "realization" means, in this context, a representative of an equivalence class of measurable functions that are equal up to null sets. In this case, the definition of conditional expectation does not actually identify a unique random variable $\mathbb E(X\mid \mathcal D)$, but it gives conditions on such a random variable. It turns out with a little work, one can show that while there are many random variables satisfying these conditions, they all belong to the same equivalence class. Thus the equivalence class is uniquely defined, and a "realization" (also called a "version") is any element of this equivalence class.
I remember when I first learned measure-theoretic probability, conditional expectation was the hardest concept for me to understand. Even after I understood the definition well, it still took me some time to gain a good intuition for it. The book I learned it from was PTE (Durrett's *Probability: Theory and Examples*), and the example that finally made things "click" for me was Example 4.1.5 on page 208 (page numbers accurate as of Version 5, January 11, 2019).