Notation of expectation with conditional in subscript

conditional-expectation, expected-value, probability

In the book "The Elements of Statistical Learning", I came across the following notation (Ex. 2.7):

$$E_{\mathcal{Y|X}}\big(f(x_0) - \hat{f}(x_0)\big)^2$$
where $\mathcal{X}$ and $\mathcal{Y}$ are two random variables representing the training data.

I would like to know how to formally define $E_{\mathcal{Y|X}}$. I know what a conditional expectation is, and that some authors use $E_{Y|X}[Y|X]$ to denote the conditional expectation of $Y$ given $X$. However, we don't have a conditional expectation in this example, do we?

A second, quite common notation is something like $E_X[Y]$. Here, the expectation is computed with respect to the probability distribution $\mathbb{P}_X$:
$$E_X[Y] = \int Y \, d\mathbb{P}_X$$
However, $\mathcal{Y|X}$ is not a random variable, so I really don't know how to define $E_{\mathcal{Y|X}}$.

This question is related to an answer given by Dilip Sarwate to the question "pipes vs commas in expectation notation subscript". That answer does not give a definition of $E_{\mathcal{Y|X}}$.

P.S. Is there a guide to the notation commonly used in ML? I have a fairly heavy math background, and I am often confused by the way probability theory is applied.

Best Answer

The calligraphic "$\mathcal{X}$" denotes the collection of all the $x_i$-values in your dataset, and the calligraphic "$\mathcal{Y}$" all the $y_i$-values. So, while both $\mathcal{X} = (x_1,\ldots,x_N)^t$ and $\mathcal{Y} = (y_1,\ldots, y_N)^t$ are random vectors, $\mathcal{X}$ is treated as fixed, and for $\mathcal{Y}$ the conditional PDF $p(\mathcal Y | \mathcal X)$ is used. This conditional PDF is determined by the model stated in the exercise: $y_i = f(x_i) + \varepsilon_i$.
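To make this concrete, assume, for illustration only, that the $\varepsilon_i$ are i.i.d. $N(0,\sigma^2)$ and independent of $\mathcal X$ (the Gaussian form is an assumption, not part of the statement above). The subscript then just names the distribution that is integrated against, so $E_{\mathcal Y|\mathcal X}$ is the ordinary conditional expectation given $\mathcal X$, written here with $\mathbf y = (y_1,\ldots,y_N)^t$ as the integration variable:

$$
p(\mathcal Y \mid \mathcal X) = \prod_{i=1}^N \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - f(x_i))^2}{2\sigma^2}\right),
$$

$$
E_{\mathcal Y|\mathcal X}\, g(\mathcal X, \mathcal Y) = \int_{\mathbb R^N} g(\mathcal X, \mathbf y)\, p(\mathbf y \mid \mathcal X)\, d\mathbf y = E\big[g(\mathcal X, \mathcal Y) \mid \mathcal X\big].
$$

The integral form does not depend on the Gaussian choice; any noise distribution with a density gives the same construction, the Gaussian only makes $p(\mathcal Y \mid \mathcal X)$ explicit.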

And since $\hat f(x_0)$ is a function of $\mathcal Y$: $$ \hat f(x_0) = \sum_{i=1}^N \ell_i(x_0; \mathcal X)y_i, $$ the expectation $E_{\mathcal{Y|X}}(f(x_0) - \hat{f}(x_0))^2$ is well defined: it is taken over $\mathcal Y$ with respect to $p(\mathcal Y | \mathcal X)$, with $\mathcal X$ and $x_0$ held fixed. Note that $x_0$ is not necessarily an element of $\mathcal X$.
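A small simulation can make "$\mathcal X$ fixed, $\mathcal Y$ random" tangible. The sketch below draws the inputs once, then repeatedly redraws only the noise, and averages $(f(x_0) - \hat f(x_0))^2$. It assumes a hypothetical $f(x) = \sin(2\pi x)$, Gaussian noise with $\sigma = 0.3$, and a $k$-nearest-neighbour smoother as one illustrative choice of weights $\ell_i(x_0; \mathcal X)$; none of these come from the exercise. Because $E[y_i \mid \mathcal X] = f(x_i)$ and the $y_i$ are uncorrelated with variance $\sigma^2$, the estimate should match the closed form $\big(f(x_0) - \sum_i \ell_i f(x_i)\big)^2 + \sigma^2 \sum_i \ell_i^2$.

```python
import numpy as np

def f(x):
    # Hypothetical "true" regression function (an assumption for illustration).
    return np.sin(2 * np.pi * x)

def knn_weights(x0, X, k):
    # Weights l_i(x0; X) of a k-nearest-neighbour smoother: 1/k on the k
    # training inputs closest to x0, 0 elsewhere. They depend on X only, not on Y.
    idx = np.argsort(np.abs(X - x0))[:k]
    w = np.zeros(len(X))
    w[idx] = 1.0 / k
    return w

def mc_conditional_mse(x0, X, k, sigma, n_rep, rng):
    # Monte Carlo estimate of E_{Y|X}(f(x0) - fhat(x0))^2:
    # X stays fixed; only the noise, and hence Y, is redrawn.
    w = knn_weights(x0, X, k)
    eps = rng.normal(0.0, sigma, size=(n_rep, len(X)))
    Y = f(X) + eps          # each row is one draw of Y given X
    fhat = Y @ w            # fhat(x0) = sum_i l_i(x0; X) * y_i, one value per row
    return np.mean((f(x0) - fhat) ** 2)

def exact_conditional_mse(x0, X, k, sigma):
    # Closed form for a linear smoother: squared bias plus variance.
    w = knn_weights(x0, X, k)
    bias = f(x0) - w @ f(X)
    return bias ** 2 + sigma ** 2 * np.sum(w ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=50)   # training inputs: drawn once, then held fixed
x0 = 0.37                            # query point; it need not be an element of X
est = mc_conditional_mse(x0, X, k=5, sigma=0.3, n_rep=50_000, rng=rng)
exact = exact_conditional_mse(x0, X, k=5, sigma=0.3)
print(f"Monte Carlo: {est:.5f}   closed form: {exact:.5f}")
```

Redrawing `X` inside the replication loop would instead average over both $\mathcal X$ and $\mathcal Y$, which estimates the unconditional expected prediction error rather than $E_{\mathcal Y|\mathcal X}$.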