$EPE = \int\int {(y-g(x))^2f(x,y)dxdy}$
Factoring the joint density as $f(x,y)=f(y\,|\,x)\,f(x)$ (the definition of the conditional density), we have:
$EPE = \int\int {(y-g(x))^2f(y\,|\,x)\,f(x)dxdy}$
Rearranging gives:
$EPE = \int f(x)\;\left(\,\int (y-g(x))^2f(y\,|\,x)dy\,\right)\;dx$
Using the definition of $E_X$ we get:
$EPE = E_X\left(\,\int (y-g(x))^2f(y\,|\,x)\,dy\,\right)$
Using the definition of $E_{Y\,|\,X}$ we get:
$EPE = E_X\left(\,E_{Y\,|\,X}\left(\,(Y-g(X))^2\,|\,X\,\right)\,\right)$
Or in shorter notation:
$EPE = E_X E_{Y\,|\,X}\left(\,[Y-g(X)]^2\,|\,X\,\right)$
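The derivation above can be checked numerically. Here is a Monte Carlo sketch under a hypothetical model with $E[Y\,|\,X=x]=\sin x$ (all names and numbers are made up for illustration): the conditional mean attains a smaller estimated EPE than another choice of $g$, and its EPE equals the noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: Y = sin(X) + eps, eps ~ N(0, 0.5^2),
# so E[Y | X = x] = sin(x) and the irreducible error is 0.25.
n = 200_000
x = rng.uniform(0, 3, n)
y = np.sin(x) + rng.normal(0, 0.5, n)

def epe(g):
    """Monte Carlo estimate of EPE = E[(Y - g(X))^2]."""
    return np.mean((y - g(x)) ** 2)

print(epe(np.sin))            # conditional mean: close to 0.25
print(epe(lambda t: 0.5 * t)) # any other g gives a larger EPE
```

The gap between the two printed values is exactly the extra term $E_X[(g(X)-E[Y\,|\,X])^2]$ that a suboptimal $g$ pays on top of the irreducible error.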
The notation in that discussion is thoroughly confusing. In the derivation of $EPE(x_0)$ it's unclear what $E_{y_0|x_0}E_{\cal T}$ means. I don't see it as representing a calculation like $E(Z)=E_W(E(Z\mid W))$ by first conditioning on $W$, unless there is some inconsistency in notation somewhere.
IMHO a less confusing development would be to write
$$
PE:=(y_0-\hat y_0)^2 = [(y_0-x_0^T\beta) + (x_0^T\beta - \hat y_0)]^2=:(A-B)^2\tag1,
$$
say, and then take expectations conditional on the test point $x_0$ and the training data $\cal T$. Under this conditioning, the terms $A$ and $B$ are independent ($A$ depends only on the error $\varepsilon$ of the new observation $y_0$, while $B$ depends only on the errors in the training data) and $A$ has expectation zero (since $\varepsilon$ has mean zero). So the cross term $AB$ has expectation zero and the (conditional) EPE is then
$$
E(PE\mid x_0,{\cal T}) = E(A^2\mid x_0,{\cal T}) + E(B^2\mid x_0,{\cal T}).\tag2$$
We note that the first term on the RHS is $\sigma^2$ and the second is $V(\hat y_0\mid x_0,{\cal T})$, since $\hat y_0$ is an unbiased estimator of $x_0^T\beta$; so (2) becomes
$$
E(PE\mid x_0,{\cal T})=\sigma^2+x_0^T(X^TX)^{-1}x_0\sigma^2.\tag3
$$
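As a sanity check on (3), the following Monte Carlo sketch (hypothetical numbers throughout) holds the training inputs $X$ fixed, redraws the training and test errors many times, refits OLS each time, and compares the empirical prediction error at $x_0$ with $\sigma^2+x_0^T(X^TX)^{-1}x_0\,\sigma^2$. Conditioning on $\cal T$ is interpreted here as conditioning on the training inputs, with the training errors averaged over.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: fixed training inputs X, true beta, noise level sigma.
n, p, sigma = 50, 3, 1.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
x0 = np.array([0.3, -0.1, 0.7])

# Redraw the training errors many times with X held fixed and refit OLS;
# also draw a fresh test observation y0 = x0' beta + eps for each fit.
reps = 200_000
y = X @ beta + rng.normal(0, sigma, size=(reps, n))
beta_hat = np.linalg.solve(X.T @ X, X.T @ y.T)   # shape (p, reps)
y0_hat = x0 @ beta_hat                           # shape (reps,)
y0 = x0 @ beta + rng.normal(0, sigma, size=reps)

pe = np.mean((y0 - y0_hat) ** 2)                 # empirical prediction error
theory = sigma**2 + sigma**2 * x0 @ np.linalg.solve(X.T @ X, x0)
print(pe, theory)  # the two should agree closely
```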
Then take the expectation of (3) over the training data $\cal T$, still conditioning on $x_0$, to get
$$
E(PE\mid x_0)=\sigma^2+x_0^TE_{\cal T}[(X^TX)^{-1}]x_0\sigma^2,
$$
which is the final formula in (2.27).
Best Answer
Given $X=x$, the value $f(x)$ can be chosen freely at each point $x$; the function need not even be differentiable at $x$. That's why the author replaced $f(x)$ with $c$ and minimized the expression over $c$, obtaining the value of $f(x)$ that minimizes the loss when $X=x$.
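In outline, that pointwise minimization is just a quadratic in $c$ (a sketch, using the notation above):
$$
E[(Y-c)^2\mid X=x]=E[Y^2\mid X=x]-2c\,E[Y\mid X=x]+c^2,
$$
which is minimized where the derivative $-2\,E[Y\mid X=x]+2c$ vanishes, i.e. at $c=E[Y\mid X=x]=f(x)$.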
Please comment if you want me to show in full how the authors derived the optimal value $f(x) = E(Y\,|\,X=x)$ from the minimization in the second expression.