The former integral is a Stieltjes integral. See this, for example, in particular the section "Application to probability theory".
In brief, the integral $I_1 = \int {xdF(x)} $ is a generalization of $I_2 = \int {xf(x)dx} $. $I_2$ can be used only when the distribution is absolutely continuous, that is, when it has a probability density function. $I_1$ can be used for any distribution, even a discrete or singular continuous one (provided the expectation is well defined, that is, $\int {|x|dF(x)} < \infty $). Note that if $F$ is differentiable with $F'(x) = f(x)$, then, heuristically, $dF(x) = f(x)dx$, so $I_1$ reduces to $I_2$.
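For instance, if $X$ is standard exponential, then $F(x) = 1 - e^{-x}$ and $dF(x) = e^{-x}dx$ for $x > 0$, so
$$
\int {xdF(x)} = \int_0^\infty {xe^{ - x} dx} = 1,
$$
in agreement with the familiar density-based computation.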
For a thorough account of this topic, see, for example, this.
EDIT:
An example. Suppose that a random variable $X$ has CDF $F$ such that $F'(x) = f_1 (x)$ for $x < a$, $F'(x) = f_2 (x)$ for $x > a$, and $F(a) - F(a-) = p$, where $f_1$ and $f_2$ are nonnegative continuous functions, and $0 < p \leq 1$. Note that $F$ is differentiable everywhere except at $x=a$, where it has a jump discontinuity of size $p$. In particular, $X$ does not have a PDF, so you cannot compute ${\rm E}(X)$ as the integral $I_2$. However, ${\rm E}(X)$ can be computed as follows:
$$
{\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = \int_{ - \infty }^a {xf_1 (x)dx} + a[F(a) - F(a - )] + \int_a^\infty {xf_2 (x)dx} .
$$
In case $f_1$ and $f_2$ are identically zero and, hence, $p=1$, this gives
$$
{\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = a[F(a) - F(a - )] = a,
$$
which is obvious since $X$ has the $\delta_a$ distribution (that is, ${\rm P}(X=a)=1$).
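If you want a quick numerical sanity check of the formula above, here is a small Python simulation of one convenient instance (the particular choices $a = 0$, $p = 1/2$, $f_1 \equiv 0$ and $f_2(x) = (1-p)e^{-x}$ are mine, just for illustration):
```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# One concrete instance: jump of size p at a = 0, density f2(x) = (1-p) e^{-x} for x > 0.
a, p = 0.0, 0.5
atom = rng.random(n) < p                        # with probability p, X = a
x = np.where(atom, a, rng.exponential(1.0, n))  # otherwise X is Exp(1)

# Stieltjes decomposition: a*[F(a)-F(a-)] + int_a^inf x f2(x) dx = 0*p + (1-p)*1
exact = a * p + (1 - p) * 1.0
print(x.mean(), exact)  # the simulated mean should be close to 0.5
```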
Exercise 1. Suppose that $X$ takes finitely many values $x_1,\ldots,x_n$, with probabilities $p_1,\ldots,p_n$, respectively. Conclude that ${\rm E}(X) = \int_{ - \infty }^\infty {xdF(x)} = \sum\nolimits_{k = 1}^n {x_k p_k } $.
Exercise 2. Suppose that with probability $0 < p < 1$ a random variable $X$ takes the value $a$, and with probability $1-p$ it is uniformly distributed on $[0,1]$. Find the CDF $F$ of $X$, and compute its expectation (note that $X$ is neither a discrete nor a continuous random variable).
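If you want to check your answer to Exercise 2 numerically, a simulation along the following lines may help (the values of $a$ and $p$ below are arbitrary illustrative choices):
```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
a, p = 0.3, 0.4                       # arbitrary illustrative values

atom = rng.random(n) < p              # with probability p, X = a
x = np.where(atom, a, rng.random(n))  # with probability 1 - p, X is uniform on [0, 1]

print(x.mean())  # compare with the expectation you compute from the CDF F
```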
Note: The integral $I_1$ is, of course, a special case of $\int {h(x)dF(x)} $ (say, for $h$ a continuous function). In particular, letting $h=1$, we have $\int {dF(x)} = 1$ (which is a generalization of $\int {f(x)dx} = 1$).
EDIT: Further details.
Note that if $h$ is continuous, then ${\rm E}[h(X)]$, if it exists, is given by
$$
{\rm E}[h(X)] = \int {h(x)dF(x)}.
$$
In particular, if the $n$-th moment exists, it is given by
$$
{\rm E}[X^n] = \int {x^n dF(x)}.
$$
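For instance, if $X$ is standard exponential (so that $dF(x) = e^{-x}dx$ on $(0,\infty)$), then
$$
{\rm E}[X^n] = \int_0^\infty {x^n e^{ - x} dx} = n!.
$$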
In principle, the limits of integration range from $-\infty$ to $\infty$. In this context, consider the following important example. Suppose that $X$ is any nonnegative random variable. Then its Laplace transform is
$$
{\rm E}[e^{ - sX} ] = \int {e^{ - sx} dF(x)} = \int_{ 0^- }^\infty {e^{ - sx} dF(x)}, \;\; s \geq 0,
$$
where $0^-$ can be replaced by $-\varepsilon$ for any $\varepsilon > 0$. While the $n$-th moment of (the nonnegative) $X$ is given, for any $n \geq 1$, by
$$
{\rm E}[X^n] = \int_{ 0^- }^\infty {x^n dF(x)} = \int_0^\infty {x^n dF(x)},
$$
it is not true in general that also
$$
{\rm E}[e^{ - sX} ] = \int_{0 }^\infty {e^{ - sx} dF(x)}.
$$
Indeed, following the definition of the integral,
$$
{\rm E}[e^{ - sX} ] = \int_{ 0^- }^\infty {e^{ - sx} dF(x)} = e^{ - s \cdot 0}[F(0)-F(0-)] + \int_{ 0 }^\infty {e^{ - sx} dF(x)},
$$
hence the jump of $F$ at zero should be added (if positive, of course). In the $n$-th moment case, on the other hand, the corresponding term is $0^n[F(0)-F(0-)] = 0$, hence a jump of $F$ at zero does not affect the overall integral.
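To see the difference numerically, here is a small Python check for one particular choice, namely $X = 0$ with probability $p$ and $X \sim {\rm Exp}(1)$ otherwise (the values of $s$ and $p$ are arbitrary illustrative choices):
```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
s, p = 2.0, 0.5                                   # arbitrary illustrative values
atom = rng.random(n) < p                          # with probability p, X = 0
x = np.where(atom, 0.0, rng.exponential(1.0, n))  # otherwise X is Exp(1)

mc = np.exp(-s * x).mean()             # Monte Carlo estimate of E[exp(-sX)]
with_jump = p + (1 - p) / (1 + s)      # integral from 0^-: includes the jump at zero
without_jump = (1 - p) / (1 + s)       # integral from 0 only: misses the jump
print(mc, with_jump, without_jump)     # mc should agree with with_jump, not without_jump
```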
The short answer is that you take the answer for each individual choice of the function $f$ and then average over them all. How does the long answer flesh that out? Well, it depends on which choices of $f$ are possible. (It might help if you specified which paper you read.)
For example, if you know $f$ up to a finite number of real-valued parameters, $E_x[f]$ becomes a function of those parameters, which can then be averaged over their distribution. That's well within the purview of what you've learned (unless you haven't yet learned multiple integrals such as $\int_{\mathbb{R}^2}g(u,v)\,du\,dv$, but even then you could think through the $1$-parameter case).
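In symbols: if the admissible functions are $f_\theta$ with $\theta$ a parameter vector distributed with density $p(\theta)$, say, then
$$
\mathbb{E}_\theta \big[ E_x[f_\theta] \big] = \int E_x[f_\theta]\, p(\theta)\, d\theta .
$$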
If, on the other hand, $f$ can roam through a family of functions with infinitely many degrees of freedom, you need something called functional integration. As a branch of mathematics, it's actually still not very well understood except for some (admittedly very useful) special cases. However, if you try learning it, you'll find most of the relevant intuitions have counterparts in the "normal" calculus you've already covered. For example, you can think of a function as an infinite-dimensional vector, with one component per value of its argument(s). Once you learn how to integrate $\exp(-ax^2+Jx)$ on $\mathbb{R}$, and $\exp(-x^T Ax+J^T x)$ on $\mathbb{R}^n$ with a matrix $A$ and vector $J$, the functional equivalent promotes $x,\,A,\,J$ to functions with $1,\,2,\,1$ arguments respectively, where the "dot product" is an integral.
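For reference, the two Gaussian integrals mentioned above are (for $a > 0$ and $A$ symmetric positive definite)
$$
\int_{\mathbb{R}} e^{-ax^2 + Jx}\, dx = \sqrt{\frac{\pi}{a}}\, e^{J^2/(4a)},
\qquad
\int_{\mathbb{R}^n} e^{-x^T A x + J^T x}\, dx = \sqrt{\frac{\pi^n}{\det A}}\, e^{\frac{1}{4} J^T A^{-1} J},
$$
and the functional ("path integral") versions are formal analogues of these formulas.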
Best Answer
The proof is most likely beyond the scope of most undergraduate mathematical statistics/probability courses.
Formally, a random variable is a measurable function from $ \Omega $ to $ E $ where $ (\Omega, \mathcal{F}, P) $ is a probability space and $ (E, \mathcal{E}) $ is a measurable space.
A well-known result in measure-theoretic probability is that if $ X : \Omega \to E $ is measurable and $ f : E \to E $ is continuous (here $E$ is a topological space, say $\mathbb{R}$, and $\mathcal{E}$ its Borel $\sigma$-algebra), then $ f \circ X $ is also measurable with respect to the same probability space, that is, it is again a random variable. This is the result to which the authors are referring.
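A sketch of the standard argument, in the case $E = \mathbb{R}$ with its Borel $\sigma$-algebra: since $f$ is continuous, the preimage $f^{-1}(U)$ of any open set $U$ is open, hence Borel, so $f$ is Borel measurable (the sets whose preimage under $f$ is Borel form a $\sigma$-algebra containing the open sets). Then, for any Borel set $B$, $(f \circ X)^{-1}(B) = X^{-1}\big(f^{-1}(B)\big) \in \mathcal{F}$ because $X$ is measurable, so $f \circ X$ is again a random variable.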