Suppose the dataset is $\mathcal{D}_n = \{X_1, \dots, X_n\}$, where each data point $X_i$ is drawn i.i.d. from some distribution $f_X$. The true risk is:
$$R(h) = E_{X \sim f_X}[\mathcal{L}(X, h(X))]$$
Show that $E_{\mathcal{D}_n}[R_e(h)] = R(h)$ for a fixed hypothesis $h$ that does not depend on the dataset:
- Start with the LHS:
$$E_{\mathcal{D}_n}[R_e(h)]$$
- Plug in the expression for the empirical risk $R_e(h)$:
$$= E_{\mathcal{D}_n} \left [
\frac{1}{n} \sum_{i=1}^n \mathcal{L}(X_i, h(X_i))
\right ]$$
- By linearity of expectation:
$$= \frac{1}{n} \sum_{i=1}^n E_{\mathcal{D}_n}[\mathcal{L}(X_i, h(X_i))]$$
- Because $\mathcal{L}(X_i, h(X_i))$ only depends on $X_i$, the joint expectation (over datasets) is equal to the marginal expectation (over data point $X_i$):
$$= \frac{1}{n} \sum_{i=1}^n E_{X_i}[\mathcal{L}(X_i, h(X_i))]$$
- The expected value is the same for all $X_i$ because they're identically distributed. So, we can replace $X_i$ with a generic variable $X$ drawn from the same distribution $f_X$:
$$= \frac{1}{n} \sum_{i=1}^n E_{X \sim f_X}[\mathcal{L}(X, h(X))]$$
- Simplify:
$$= E_{X \sim f_X}[\mathcal{L}(X, h(X))]$$
This is equal to the true risk $R(h)$.
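The derivation above can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a standard normal $f_X$, a hypothetical fixed hypothesis `h(x) = 0.5 * x`, and a hypothetical squared-error loss, for which the true risk works out to $0.25$. Averaging the empirical risk over many independently drawn datasets of size $n = 5$ should land close to that value.

```python
import random


def loss(x, prediction):
    # Hypothetical squared-error loss (an assumption made for this sketch).
    return (x - prediction) ** 2


def h(x):
    # Hypothetical fixed hypothesis, chosen before any data is seen.
    return 0.5 * x


def empirical_risk(dataset):
    # R_e(h) = (1/n) * sum_i L(X_i, h(X_i))
    return sum(loss(x, h(x)) for x in dataset) / len(dataset)


def average_empirical_risk(n, num_datasets, rng):
    # Monte Carlo approximation of E_{D_n}[R_e(h)]: draw many independent
    # datasets of size n and average their empirical risks.
    total = 0.0
    for _ in range(num_datasets):
        dataset = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += empirical_risk(dataset)
    return total / num_datasets


# For X ~ N(0, 1): L(X, h(X)) = (X - 0.5 X)^2 = 0.25 X^2,
# so R(h) = 0.25 * E[X^2] = 0.25.
TRUE_RISK = 0.25

rng = random.Random(0)
estimate = average_empirical_risk(n=5, num_datasets=20000, rng=rng)
print(f"average empirical risk: {estimate:.4f}, true risk: {TRUE_RISK}")
```

The dataset size is deliberately small here: the agreement comes from averaging over many datasets, not from making $n$ large.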
Alternative
Here's an equivalent way of proceeding, starting after the linearity-of-expectation step (the third step) above.
Explicitly write out the expected value over datasets. Because the data points are independent, the joint distribution of the dataset is equal to the product of the marginal distributions of the data points.
$$= \frac{1}{n} \sum_{i=1}^n \int \cdots \int
\left ( \prod_{j=1}^n f_X(x_j) \right )
\mathcal{L}(x_i, h(x_i))
\ dx_1 \cdots dx_n$$
Reorder the integrals (see Fubini's theorem) and pull terms involving $x_i$ to the outside:
$$= \frac{1}{n} \sum_{i=1}^n
\int f_X(x_i) \mathcal{L}(x_i, h(x_i)) \left [
\int \cdots \int
\left ( \prod_{j \ne i} f_X(x_j) \right )
\ dx_1 \cdots dx_{i-1} \ dx_{i+1} \cdots dx_n
\right ] dx_i$$
The expression inside the brackets integrates a joint probability density over its entire domain, so it's equal to one:
$$= \frac{1}{n} \sum_{i=1}^n
\int f_X(x_i) \mathcal{L}(x_i, h(x_i)) dx_i$$
The integral is the expected value of $\mathcal{L}(\cdots)$ with respect to $f_X$:
$$= \frac{1}{n} \sum_{i=1}^n
E_{X \sim f_X}[\mathcal{L}(X, h(X))]$$
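This is the same as the result of the fifth step above (the one that uses the fact that the $X_i$ are identically distributed), so proceed to the final simplification step.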
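The same computation can be carried out exactly when the distribution is discrete, with sums taking the place of the integrals. The sketch below is illustrative only: the three-point distribution `f_X`, the constant hypothesis `h`, and the squared-error loss are all assumptions made for the example. It enumerates every possible dataset of size $n$, weights its empirical risk by the product of the marginal probabilities, and compares the total with the true risk using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete distribution f_X over three values (sums to 1).
f_X = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}


def h(x):
    # Hypothetical fixed hypothesis: always predicts 1.
    return 1


def loss(x, prediction):
    # Hypothetical squared-error loss, kept as an exact fraction.
    return Fraction((x - prediction) ** 2)


def true_risk():
    # R(h) = sum_x f_X(x) * L(x, h(x))
    return sum(p * loss(x, h(x)) for x, p in f_X.items())


def expected_empirical_risk(n):
    # E_{D_n}[R_e(h)] written out over all datasets (x_1, ..., x_n).
    total = Fraction(0)
    for dataset in product(f_X, repeat=n):
        # Independence: the joint probability is the product of the marginals.
        joint = Fraction(1)
        for x in dataset:
            joint *= f_X[x]
        r_e = sum(loss(x, h(x)) for x in dataset) / n
        total += joint * r_e
    return total


for n in range(1, 5):
    print(n, expected_empirical_risk(n), true_risk())
```

Because the arithmetic is exact, the two printed values agree for every $n$ tried, including $n = 1$.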
Best Answer
You don't need to assume $n$ to be arbitrarily large. It's just linearity of expectation: $\DeclareMathOperator{\E}{\mathbb E}$
\begin{align}
\E[\text{empirical risk}]
&= \E\left[ \frac1n \sum_{i=1}^n l(f(x_i), z_i) \right] \\
&= \frac1n \sum_{i=1}^n \E\left[ l(f(x_i), z_i) \right] \\
&= \frac1n \sum_{i=1}^n \int l(f(x_i), z_i) \,\mathrm{d}P(x_i, z_i) \\
&= \frac1n \sum_{i=1}^n \text{true risk} \\
&= \text{true risk} .
\end{align}