Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. Let $(X,Y)$ be a random vector with probability density function $g_{(X,Y)}$. Finally, let $f$ be any Borel function such that $\mathbb E[|f(X,Y)|] < \infty$. Then $\mathbb E[f(X,Y)|Y] = h(Y)$, where:

$$ h(y) = \frac{\int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx} $$

when $\int_{\mathbb R} g_{(X,Y)}(x,y)dx \neq 0$, and $h(y) = 0$ otherwise.

First, we may set $h(y) = 0$ in the second case, because the set $S=\{ \omega \in \Omega : \int_{\mathbb R} g_{(X,Y)}(x,Y(\omega)) dx = 0 \}$ has probability $0$. Indeed, $\mathbb P(S) = \mathbb P(Y \in S_Y)$, where $S_Y = \{ y \in \mathbb R: \int_{\mathbb R} g_{(X,Y)}(x,y)dx = 0 \}$, and $\mathbb P(Y \in S_Y) = \int_{S_Y} g_Y(y) dy $, where $g_Y$ is the marginal density of $Y$ (it exists by Fubini's theorem and the existence of the joint density of $(X,Y)$). But $g_Y(y) = \int_{\mathbb R} g_{(X,Y)}(x,y)dx $, which is $0$ on $S_Y$ by definition, so we are integrating the zero function and $\mathbb P(S) = 0$. Since conditional expectation is only determined up to sets of measure $0$, we can ignore the case $g_Y(y) = 0$.

So, we have to prove two things:

1) $h(Y)$ is $\sigma(Y)$-measurable. By the Fubini-Tonelli theorem, the functions $y \mapsto \int_{\mathbb R} g_{(X,Y)}(x,y) dx$ and $y \mapsto \int_{\mathbb R} g_{(X,Y)}(x,y) f(x,y) dx$ are Borel measurable. For the first we use that $g_{(X,Y)} \geq 0$; for the second we use that $\mathbb E[|f(X,Y)|] = \int_{\mathbb R^2} |f(x,y)| g_{(X,Y)}(x,y) d\lambda_2(x,y) < \infty$, so the inner integral is finite for almost every $y$. Hence $h$ is a Borel function, and $h(Y)$ is $\sigma(Y)$-measurable as the composition of a Borel function with $Y$.

2) For any $A \in \sigma(Y)$ we have to show $\int_A f(X,Y) d\mathbb P = \int_A h(Y) d\mathbb P$. Note that $A$ is of the form $Y^{-1}(B)$, where $B \in \mathcal B(\mathbb R)$ is a Borel set.

Note that $\int_A f(X,Y) d\mathbb P = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$ and $\int_A h(Y) d\mathbb P = \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ]$.

We'll use the fact that if a random variable/vector $V$ (in $\mathbb R^n$) has density function $g_V$, then for any Borel function $\phi: \mathbb R^n \to \mathbb R$ with $\mathbb E[|\phi(V)|] < \infty$, we have $\mathbb E[\phi(V)] = \int_{\mathbb R^n} \phi(v) g_V(v) d\lambda_n(v)$.

Then: $$\mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ] = \int_{\mathbb R^2} f(x,y)\chi_{_{B}}(y) g_{(X,Y)}(x,y) d\lambda_2(x,y) = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dxdy $$

The last equality splits the integral using Fubini's theorem (the integrand is integrable by our assumption on $f$).

Similarly, as at the beginning:

$$ \mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} h(y) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx)dy$$

Now, by the definition of $h$ (the case where the denominator is $0$ can be ignored, since it occurs only on a set of measure $0$), we have:

$$ \int_{B} (h(y)) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx) dy = \int_{B} (\frac{\int_{\mathbb R} g_{(X,Y)}(x,y)f(x,y)dx}{\int_{\mathbb R} g_{(X,Y)}(x,y)dx}) (\int_{\mathbb R} g_{(X,Y)}(x,y)dx )dy$$

After simplification we get $\mathbb E[ h(Y) \cdot \chi_{_{Y \in B}} ] = \int_{B} \int_{\mathbb R} f(x,y)g_{(X,Y)}(x,y)dxdy = \mathbb E[ f(X,Y) \cdot \chi_{_{Y \in B}} ]$, which is what we wanted to prove.

Now your "definition $1$" follows by taking $f(x,y) = x$ (assuming $\mathbb E[|X|] < \infty$). Then $ h(y) = \mathbb E[X|Y=y] $.
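To make the formula for $h$ concrete, here is a minimal numerical sketch. It assumes a hypothetical joint density $g(x,y) = x + y$ on the unit square (not taken from the question) and $f(x,y) = x$; the helper names `g`, `midpoint_integral`, `h` and `closed_form` are illustrative only. For this density a hand computation gives $\mathbb E[X|Y=y] = \frac{2+3y}{3+6y}$ for $y \in [0,1]$, and the ratio of integrals should reproduce it up to quadrature error.

```python
def g(x, y):
    """Hypothetical joint density: g(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) else 0.0


def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


def h(y, f, a=0.0, b=1.0):
    """Ratio of integrals from the statement; returns 0 where the marginal vanishes.

    The interval [a, b] must contain the x-support of g(., y); here that is [0, 1].
    """
    denominator = midpoint_integral(lambda x: g(x, y), a, b)
    if denominator == 0.0:
        return 0.0  # the "otherwise" case: y lies outside the support of Y
    numerator = midpoint_integral(lambda x: f(x, y) * g(x, y), a, b)
    return numerator / denominator


def closed_form(y):
    """E[X | Y = y] for this particular density, computed by hand."""
    return (2.0 + 3.0 * y) / (3.0 + 6.0 * y)


if __name__ == "__main__":
    for y in (0.0, 0.25, 0.5, 1.0):
        # Numerical h(y) next to the hand-computed value.
        print(y, h(y, lambda x, y: x), closed_form(y))
```

Calling `h(2.0, ...)` exercises the second case of the definition: the marginal density is zero there, so the function returns $0$.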

## Best Answer

Your approach was headed in the right direction, as others have pointed out. The key is to keep track of the support of the distributions. In this regard, it helps to use indicator functions or something similar while writing densities.

Here's another way of looking at the problem:

Observe that the joint density of $(X,Y)$ can be factored as

$$f_{X,Y}(x,y)=e^{-y}\mathbf1_{0<x<y}=\underbrace{e^{-(y-x)}\mathbf1_{y>x}}_{f_{Y\mid X}(y\mid x)}\cdot \underbrace{e^{-x}\mathbf1_{x>0}}_{f_X(x)}$$

From the conditional density $f_{Y\mid X}(y\mid x)$, it is clear that $Y-x$ given $X=x$ has a standard exponential distribution. As this conditional distribution is free of $x$, the (unconditional) distribution of $Y-X$ is also standard exponential. In other words, $Y-X$ and $X$ are independent and identically distributed.

Therefore, $$E\left[X+Y\mid Y-X\right]=E\left[X\mid Y-X\right]+E\left[Y\mid Y-X\right]=E\left[X\right]+E\left[Y\mid Y-X\right]$$

So we only need to find the conditional distribution of $Y$ given $Y-X$.

For the transformation $(x,y)\mapsto (y,y-x)=(u,v)$, the absolute value of the Jacobian is unity. Hence the joint density of $(Y,Y-X)$ is

$$f_{Y,Y-X}(u,v)=f_{X,Y}(u-v,u)=e^{-u}\mathbf1_{0<u-v<u}$$

So the conditional density of $Y$ given $Y-X$ is

$$f_{Y\mid Y-X}(u\mid v)=\frac{f_{Y,Y-X}(u,v)}{f_{Y-X}(v)}=\frac{e^{-u}\mathbf1_{0<u-v<u}}{e^{-v}\mathbf1_{v>0}}=e^{-(u-v)}\mathbf1_{u>v>0}$$

This is a shifted exponential distribution with shift $v$, so that for every $v>0$, $$E\left[Y\mid Y-X=v \right]=1+v$$
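The conditional mean $1+v$ can be checked by integrating $u\, f_{Y\mid Y-X}(u\mid v)$ numerically. The sketch below is only an illustration: the function names, the truncation of the integral at $v + 50$ and the midpoint rule are choices made here, not part of the argument.

```python
import math


def conditional_density(u, v):
    """f_{Y | Y-X}(u | v) = exp(-(u - v)) for u > v > 0, and 0 otherwise."""
    return math.exp(-(u - v)) if u > v > 0.0 else 0.0


def conditional_mean(v, tail=50.0, n=200_000):
    """Midpoint-rule approximation of the integral of u * f(u | v) over (v, v + tail).

    The upper limit is truncated at v + tail (an assumption of this sketch);
    the neglected mass is of order exp(-tail).
    """
    step = tail / n
    total = 0.0
    for i in range(n):
        u = v + (i + 0.5) * step
        total += u * conditional_density(u, v)
    return total * step


if __name__ == "__main__":
    for v in (0.5, 1.0, 3.0):
        # Numerical conditional mean next to the claimed value 1 + v.
        print(v, conditional_mean(v), 1.0 + v)
```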

Finally, $$E\left[X+Y\mid Y-X\right]=2+Y-X \quad,\text{ a.e. }$$

Edit:

Actually a clever manipulation uses the independence of $Y-X$ and $X$ to good effect. Thanks to @r.e.s for spotting this:

\begin{align} E\left[X+Y\mid Y-X\right]&=E\left[2X+Y-X\mid Y-X \right] \\&=2E\left[X \mid Y-X\right]+E\left[Y-X\mid Y-X\right] \\&=2E\left[X\right]+Y-X \\&=2+Y-X \end{align}
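As a sanity check, the result $E[X+Y\mid Y-X] = 2 + Y - X$ can be tested by simulation. The sketch below uses the factorisation above to sample $X$ and $Y-X$ as independent standard exponentials; the function names, seed, sample size and bin edges are arbitrary choices for illustration. Since one cannot condition on an exact value of $Y-X$ in a finite sample, it compares averages over bins of $D = Y-X$: by the tower property, the mean of $X+Y$ within a bin should be close to $2$ plus the mean of $D$ within that bin.

```python
import random


def simulate(n, seed=0):
    """Draw n pairs (D, X + Y) with D = Y - X, sampling X and D independently."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.expovariate(1.0)  # X ~ Exp(1)
        d = rng.expovariate(1.0)  # Y - X ~ Exp(1), independent of X
        y = x + d
        samples.append((d, x + y))
    return samples


def binned_comparison(samples, edges):
    """For each bin [lo, hi) of D, return (mean of X + Y, 2 + mean of D)."""
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [(d, s) for d, s in samples if lo <= d < hi]
        mean_sum = sum(s for _, s in in_bin) / len(in_bin)
        mean_d = sum(d for d, _ in in_bin) / len(in_bin)
        results.append((mean_sum, 2.0 + mean_d))
    return results


if __name__ == "__main__":
    data = simulate(200_000)
    for empirical, predicted in binned_comparison(data, [0.0, 0.5, 1.0, 2.0]):
        # The two numbers should agree up to Monte Carlo error.
        print(empirical, predicted)
```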