The expectation of a random variable $X$ is a function of $X$. Now, functions of random variables are themselves random variables. So is the expectation of a random variable itself a random variable?
[Math] Is the expectation of a random variable itself a random variable
expected value, probability theory
Related Solutions
Here's a careful derivation of the formula in Gautam Shenoy's answer:
If $X$ is a non-negative random variable, the well-known result $$ E(X)=\int_0^{+\infty}P(X\gt t)\,dt=\int_0^{+\infty}P(X\geqslant t)\,dt\tag1 $$ expresses the expectation of $X$ in terms of its CDF: $$ E(X)=\int_0^{+\infty}[1 - F(t)]\,dt.\tag2 $$

To extend (2) to the general case where $X$ may take negative values, we can write $$E(X)=E(X^+)-E(X^-)\tag3$$ where the positive and negative parts of $X$ are defined by $$ X^+:=\begin{cases} X& \text{if $X>0$}\\ 0&\text{otherwise}\\ \end{cases}\tag4 $$ and $$ X^-:=\begin{cases} -X& \text{if $X<0$}\\ 0&\text{otherwise.}\\ \end{cases}\tag5 $$

Since both $X^+$ and $X^-$ are non-negative, we can apply (1). Observe that for every $t>0$ $$ P(X^+>t)=P(X>t)=1-F(t)\tag6 $$ and $$P(X^-\ge t)=P(X\le -t)=F(-t).\tag7$$

Plugging these into (1) and using (3) gives $$ E(X)=\int_0^\infty[1-F(t)]\,dt-\int_0^\infty F(-t)\,dt.\tag8 $$ After the change of variable $t\mapsto -t$ in the second integral we obtain the equivalent form $$ E(X)=\int_0^\infty[1-F(t)]\,dt-\int_{-\infty}^0 F(t)\,dt.\tag9 $$
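For a concrete sanity check of (9), here is a minimal numerical sketch (assuming `numpy` and `scipy` are available); the normal distribution with mean $0.5$ is just an arbitrary choice of a distribution taking both positive and negative values:

```python
# Quick numerical check of formula (9) for an arbitrary example distribution.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

X = norm(loc=0.5, scale=2.0)   # E(X) should be 0.5
F = X.cdf

upper, _ = quad(lambda t: 1.0 - F(t), 0.0, np.inf)   # integral of 1 - F(t) over (0, inf)
lower, _ = quad(lambda t: F(t), -np.inf, 0.0)        # integral of F(t) over (-inf, 0)

print(upper - lower)   # approximately 0.5, matching E(X)
```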
The short answer is: you take the answer for each individual choice of the function $f$, then average over them all. How does the long answer flesh that out? Well, it depends on which functions $f$ are allowed. (It might help if you specified which paper you read.)
For example, if you know $f$ up to a finite number of real-valued parameters, $E_x[f]$ becomes a function of such parameters, which can be averaged over their distribution. That's well within the purview of what you've learned (unless you haven't yet learned multiple integrals such as $\int_{\mathbb{R}^2}g(u,\,v)dudv$, but even then you could think through the $1$-parameter case).
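As an entirely made-up illustration of the finite-parameter case, here is a sketch (assuming `numpy`) in which the family is $f(x; a, b) = a x^2 + b$ with $x\sim N(0,1)$, and the parameters $(a,b)$ have their own distribution; $E_x[f]$ is computed first as a function of $(a,b)$, then averaged over that distribution:

```python
# Toy finite-parameter example: f(x; a, b) = a*x**2 + b is a made-up family,
# x ~ N(0, 1), and (a, b) are themselves random.
import numpy as np

rng = np.random.default_rng(0)

def inner_expectation(a, b, n=10_000):
    """Monte Carlo estimate of E_x[f(x; a, b)] with x ~ N(0, 1); exact value is a + b."""
    x = rng.standard_normal(n)
    return np.mean(a * x**2 + b)

# Outer average over the parameter distribution, here a ~ U(0, 1), b ~ U(-1, 1).
a_samples = rng.uniform(0.0, 1.0, 1_000)
b_samples = rng.uniform(-1.0, 1.0, 1_000)
outer = np.mean([inner_expectation(a, b) for a, b in zip(a_samples, b_samples)])

print(outer)   # approximately E[a]*E[x^2] + E[b] = 0.5*1 + 0 = 0.5
```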
If on the other hand $f$ can roam through a family of functions with infinitely many degrees of freedom, you need something called functional integration. As a branch of mathematics, it's actually still not very well understood except for some (admittedly very useful) special cases. However, if you try learning it you'll find most of the relevant intuitions have counterparts in the "normal" calculus you've already covered. For example, you can think of a function as an infinite-dimensional vector, with one component per value of its argument(s). Once you learn how to integrate $\exp(-ax^2+Jx)$ on $\mathbb{R}$, and $\exp(-x^TAx+J^Tx)$ on $\mathbb{R}^n$ with a matrix $A$ and vector $J$, the functional equivalent promotes $x,\,A,\,J$ to functions with $1,\,2,\,1$ arguments respectively, where the "dot product" is an integral.
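The one-dimensional Gaussian integral mentioned above has the standard closed form $\int_{\mathbb R}\exp(-ax^2+Jx)\,dx=\sqrt{\pi/a}\,\exp\!\left(J^2/4a\right)$ for $a>0$, which is the prototype that functional integration generalises. A quick numerical check of that prototype (assuming `scipy`), with arbitrary test values of $a$ and $J$:

```python
# Verify the Gaussian integral exp(-a*x**2 + J*x) over the real line against
# its closed form sqrt(pi/a) * exp(J**2 / (4*a)), valid for a > 0.
import numpy as np
from scipy.integrate import quad

a, J = 1.3, 0.7   # arbitrary test values with a > 0

numeric, _ = quad(lambda x: np.exp(-a * x**2 + J * x), -np.inf, np.inf)
closed_form = np.sqrt(np.pi / a) * np.exp(J**2 / (4 * a))

print(numeric, closed_form)   # the two agree to quadrature precision
```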
Best Answer
Given a sample space $\Omega$, a random variable (in typical formulations of probability theory) is basically just a function $\Omega\to\mathbb R$. (Really, we have a lot more flexibility than just $\mathbb R$.) That is, $X:\Omega\to\mathbb R$. The reason $f(X)$ is a random variable for $f:\mathbb R\to\mathbb R$ is because this is interpreted simply as $f\circ X:\Omega\to\mathbb R$. (Yes, the notation is ambiguous.)

The expectation operation, however, is not a function of real numbers but a function of the random variables themselves. That is, the expectation doesn't take the value of a random variable but the random variable itself. In the discrete case, $\mathbb E[X]=\sum_{\omega\in\Omega}X(\omega)P(\omega)$. In the continuous case this becomes an integral. The expectation of a random variable is a real number, not a random variable. However, every real number induces a random variable, namely a constant function on $\Omega$. This is the content of Kavi Rama Murthy's comment, but it is perhaps a bit misleading.
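To make the discrete case concrete, here is a tiny Python sketch with a made-up three-point sample space: $X$ is just a function on $\Omega$, $f(X)$ is the composition $f\circ X$, and $\mathbb E[X]$ is a single number (or, if you like, the constant random variable with that value):

```python
# A tiny discrete example: Omega has three outcomes, X is a function Omega -> R,
# f(X) is the composition f o X, and E[X] is a single real number.
Omega = ["a", "b", "c"]
P = {"a": 0.5, "b": 0.3, "c": 0.2}    # probabilities on Omega
X = {"a": 1.0, "b": -2.0, "c": 4.0}   # X : Omega -> R

def expectation(rv):
    return sum(rv[w] * P[w] for w in Omega)

f = lambda x: x**2
fX = {w: f(X[w]) for w in Omega}      # f o X, again a function on Omega

print(expectation(X))    # E[X]    = 0.5*1 - 0.3*2 + 0.2*4  = 0.7
print(expectation(fX))   # E[f(X)] = 0.5*1 + 0.3*4 + 0.2*16 = 4.9

EX_as_rv = {w: expectation(X) for w in Omega}   # E[X] viewed as a constant random variable
```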
Conditional expectations can lead to non-trivial (i.e. non-constant) random variables. For example, say that $\Omega=\Omega_1\times\Omega_2$. For concreteness, say that $\Omega_1=\Omega_2=[-10,10]$ (or the integers in that interval if we want to keep things discrete). The projection $\pi_1 :\Omega_1\times\Omega_2\to\Omega_1$ is a random variable on $\Omega=\Omega_1\times\Omega_2$. We can then talk about the conditional expectation $\mathbb E[X\mid \pi_1 = 3]$, say, which means to consider the expectation of $X$ on the subspace $\{\omega\mid \pi_1(\omega)=3\}\subset\Omega$. This is, again, a constant, but we could now consider the function $\omega_1\mapsto\mathbb E[X\mid \pi_1=\omega_1]:\Omega_1\to\mathbb R$, which would then be a random variable on $\Omega_1$. Often this will be written simply as $\mathbb E[X\mid\pi_1]$ or, in general, $\mathbb E[X\mid Y]$ for an arbitrary random variable $Y$.
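Here is a small Python sketch of that construction, with $\Omega_1=\Omega_2$ shrunk to $\{-1,0,1\}$, the uniform distribution, and an arbitrary made-up $X$, just to show that $\mathbb E[X\mid\pi_1]$ is genuinely a non-constant random variable on $\Omega_1$:

```python
# Conditional expectation on a product space, with a made-up X and a small
# uniform sample space so the sums stay readable.
from itertools import product

Omega1 = Omega2 = [-1, 0, 1]
Omega = list(product(Omega1, Omega2))
P = {w: 1 / len(Omega) for w in Omega}   # uniform probability on Omega

X = lambda w: (w[0] + w[1])**2           # some random variable X : Omega -> R
pi1 = lambda w: w[0]                     # projection onto the first factor

def cond_exp_given_pi1(value):
    """E[X | pi1 = value]: average X over the slice {w : pi1(w) = value}."""
    slice_ = [w for w in Omega if pi1(w) == value]
    total = sum(P[w] for w in slice_)
    return sum(X(w) * P[w] for w in slice_) / total

# E[X | pi1] is itself a random variable: a function of omega_1.
E_X_given_pi1 = {w1: cond_exp_given_pi1(w1) for w1 in Omega1}
print(E_X_given_pi1)   # {-1: 1.67, 0: 0.67, 1: 1.67}, a non-constant function of omega_1
```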
Personally, I find that a lot of the notation here gets pretty ambiguous. (Consider conditional expectations conditioned on the value of a conditional expectation.) The intentional conflation of a random variable with its "value" is also pretty confusing when you don't yet know what it all actually means.