In an expression where more than one random variable is involved, the symbol $E$ alone does not clarify with respect to which random variable the expected value is "taken". For example
$$E[h(X,Y)] =\text{?} \int_{-\infty}^{\infty} h(x,y) f_X(x)\,dx$$
or
$$E[h(X,Y)] = \text{?} \int_{-\infty}^\infty h(x,y) f_Y(y)\,dy$$
Neither. When more than one random variable is involved and the $E$ symbol carries no subscript, the expected value is taken with respect to their joint distribution:
$$E[h(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y) f_{XY}(x,y) \, dx \, dy$$
When a subscript is present... in some cases it tells us on which variable we should condition. So
$$E_X[h(X,Y)] = E[h(X,Y)\mid X] = \int_{-\infty}^\infty h(x,y) f_{Y\mid X}(y\mid x)\,dy $$
Here, we "integrate out" the $Y$ variable, and we are left with a function of $X$.
...But in other cases, it tells us which marginal density to use for the "averaging"
$$E_X[h(X,Y)] = \int_{-\infty}^\infty h(x,y) f_{X}(x) \, dx $$
Here, we "average over" the $X$ variable, and we are left with a function of $Y$.
Rather confusing, I would say, but who said that scientific notation is entirely free of ambiguity or multiple use? You should check how each author defines the use of such symbols.
The formal definition of conditional expectation is that $E[Y|\mathcal{F}_1]$ is any random variable measurable with respect to $\mathcal{F}_1$ having the property that
$$\int_F E[Y|\mathcal{F}_1](\omega)d\mathbb{P}(\omega) = \int_F Y(\omega) d\mathbb{P}(\omega)$$
for all $\mathcal{F}_1$-measurable sets $F$.
In the present case, this definition invites us to inspect the $\mathcal{F}_1$-measurable subsets $F$, which you already computed in the first problem. The trick is to begin with the smallest, most basic $\mathcal{F}_1$-measurable sets (apart from the empty set), which are $\{HH, HT\}$ and $\{TH, TT\}$. Although we don't yet know $E[Y|\mathcal{F}_1]$, we can use the right-hand side to compute its integrals. Because neither of these events can be decomposed (nontrivially) into smaller ones, the conditional expectation must take a constant value on each of them. For example, writing
$$E[Y|\mathcal{F}_1](HH) =E[Y|\mathcal{F}_1](HT) = z,$$
the definition gives
$$\eqalign{
zp &= zp^2 + zp(1-p) \\
&=E[Y|\mathcal{F}_1](HH) \mathbb{P}(HH) +E[Y|\mathcal{F}_1](HT) \mathbb{P}(HT)\\
&=\int_{\{HH, HT\}} E[Y|\mathcal{F}_1](\omega)d\mathbb{P}(\omega)\\
&= \int_{\{HH, HT\}} Y(\omega) d\mathbb{P}(\omega)\\
&= Y(HH)\mathbb{P}(HH) + Y(HT)\mathbb{P}(HT) \\
&= 2\cdot p^2 + 1\cdot p(1-p)= p+p^2,
}$$
whence we deduce
$$z = \frac{p+p^2}{p} = 1 + p.$$
A similar calculation for $F = \{TH, TT\}$ (do it!) establishes that
$$E[Y|\mathcal{F}_1](TH) =E[Y|\mathcal{F}_1](TT) = p.$$
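If it helps, the whole calculation can be reproduced by brute-force enumeration. Here is a short sketch in plain Python (the value of $p$ is an arbitrary illustrative choice) that builds $E[Y|\mathcal{F}_1]$ atom by atom, exactly as above:

```python
from fractions import Fraction

# Two independent flips of a p-coin; Y counts the heads.  The value of p
# is an illustrative choice; using a Fraction keeps everything exact.
p = Fraction(1, 3)
P = {'HH': p*p, 'HT': p*(1-p), 'TH': (1-p)*p, 'TT': (1-p)*(1-p)}
Y = {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}

# The atoms of F1 are determined by the first flip alone.
atoms = [('HH', 'HT'), ('TH', 'TT')]

# On each atom, E[Y|F1] is the constant z satisfying
# z * P(atom) = integral of Y over the atom.
E_Y_given_F1 = {}
for atom in atoms:
    z = sum(Y[w] * P[w] for w in atom) / sum(P[w] for w in atom)
    for w in atom:
        E_Y_given_F1[w] = z

assert E_Y_given_F1['HH'] == E_Y_given_F1['HT'] == 1 + p
assert E_Y_given_F1['TH'] == E_Y_given_F1['TT'] == p
```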
There is a simple intuition to support these abstract calculations: $\mathcal{F}_1$ records the information available after flipping the coin the first time. If it comes up heads, the possible events (which have only partially occurred!) are $HH$ and $HT$. We already have one head, and there is a chance $p$ that the second flip will be a head. Thus, at this stage, our expectation of $Y$ equals $1$ (for what has happened) plus $1\times p$ (for what could happen yet), summing to $1+p$. If instead the first flip is tails, we have seen no heads yet, but there is still a chance $p$ of seeing a head on the second flip: the expectation of $Y$ is just $0 + 1\times p = p$ in that case.
As a check, we can compute
$$\eqalign{
E[E[Y|\mathcal{F}_1]] &= \int_\Omega E[Y|\mathcal{F}_1](\omega)d\mathbb{P}(\omega) \\
&= (1+p)\mathbb{P}(E[Y|\mathcal{F}_1]=1+p) + (p)\mathbb{P}(E[Y|\mathcal{F}_1]=p)\\
& = (1+p)\mathbb{P}(\{HH, HT\}) + (p) \mathbb{P}(\{TH, TT\})\\
&= (1+p)(p^2 + p(1-p)) + p((1-p)p + (1-p)^2)\\
&= 2p,
}$$
exactly as before.
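The same check can be done by enumeration; here is a self-contained sketch (again with an illustrative $p$):

```python
from fractions import Fraction

# Same setup: two flips of a p-coin, Y = number of heads, and the
# values of E[Y|F1] derived above.
p = Fraction(1, 3)
P = {'HH': p*p, 'HT': p*(1-p), 'TH': (1-p)*p, 'TT': (1-p)*(1-p)}
Y = {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
E_Y_given_F1 = {'HH': 1 + p, 'HT': 1 + p, 'TH': p, 'TT': p}

# Integrating E[Y|F1] over all of Omega recovers E[Y] = 2p.
assert sum(E_Y_given_F1[w] * P[w] for w in P) \
    == sum(Y[w] * P[w] for w in P) == 2*p
```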
It should be clear that this is just a laborious way of expressing the idea that at the outset there is a chance $p$ of heads (which has a conditional expectation of $1+p$) and a chance $1-p$ of tails (which has a conditional expectation of $p$): everything reduces to the elementary calculation of conditional expectations, which needs no sigma-fields or integrals. The point of this exercise is to build on that intuition to develop an understanding that will hold up when the sigma-algebras get much, much more complicated.
The calligraphic "$\mathcal{X}$" describes the collection of all the $x_i$-values in your dataset, and the calligraphic "$\mathcal{Y}$" all the $y_i$-values. So, while both $\mathcal{X} = (x_1,\ldots,x_N)^t$ and $\mathcal{Y} = (y_1,\ldots, y_N)^t$ are random vectors, the $\mathcal{X}$ are considered fixed and for the $\mathcal{Y}$ the conditional PDF $p(\mathcal Y | \mathcal X)$ is considered. And this conditional PDF is given by the equation in the exercise: $y_i = f(x_i) + \varepsilon_i$.
And since $\hat f(x_0)$ is a function of $\mathcal Y$, $$ \hat f(x_0) = \sum_{i=1}^N \ell_i(x_0; \mathcal X)\,y_i, $$ the expectation $E_{\mathcal{Y}\mid\mathcal{X}}\big[(f(x_0) - \hat{f}(x_0))^2\big]$ is well defined. Note that $x_0$ is not necessarily an element of $\mathcal X$.
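As a sketch of what this conditional expectation means in practice, the following Python snippet holds $\mathcal X$ fixed, redraws only the noise, and estimates $E_{\mathcal{Y}\mid\mathcal{X}}\big[(f(x_0)-\hat f(x_0))^2\big]$ at a point $x_0$ by Monte Carlo. The choices of $f$, $N$, $\sigma$, $x_0$, and the $k$-nearest-neighbour weights for $\ell_i(x_0;\mathcal X)$ are all illustrative, not taken from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design: the x_i are drawn once and then held fixed across all
# simulations.  That is the conditioning on X.
N, sigma, k, x0 = 50, 0.3, 7, 0.5
x = np.sort(rng.uniform(0.0, 1.0, N))
f = lambda t: np.sin(2.0 * np.pi * t)

# A linear smoother: k-nearest-neighbour weights l_i(x0; X), which
# depend on X only, never on Y.
ell = np.zeros(N)
ell[np.argsort(np.abs(x - x0))[:k]] = 1.0 / k

# Monte Carlo estimate of E_{Y|X}[(f(x0) - fhat(x0))^2]: only the noise
# is redrawn; x never changes.
n_sim = 100_000
y = f(x) + sigma * rng.standard_normal((n_sim, N))  # one dataset per row
fhat_x0 = y @ ell                                   # smoother at x0, per dataset
mse = np.mean((f(x0) - fhat_x0) ** 2)
print(mse)   # approximates the conditional MSE at x0
```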