Expected Value – Taking the Expectation of Taylor Series (Especially the Remainder)

Tags: expected-value, mathematical-statistics, self-study

My question concerns trying to justify a widely-used method, namely taking the expected value of Taylor Series. Assume we have a random variable $X$ with positive mean $\mu$ and variance $\sigma^2$. Additionally, we have a function, say, $\log(x)$.

Doing Taylor Expansion of $\log X$ around the mean, we get
$$
\log X = \log\mu + \frac{X - \mu}{\mu} - \frac12 \frac{(X-\mu)^2}{\mu^2} + \frac13 \frac{(X - \mu)^3}{\xi_X^3},
$$
where, as usual, $\xi_X$ is such that $|\xi_X - \mu| < |X - \mu|$.

If we take the expectation, we get an approximate equation that people usually treat as if it were self-evident (see the $\approx$ sign in the first equation here):
$$
\mathbb{E}\log X \approx \log \mu - \frac12 \frac{\sigma^2}{\mu^2}
$$
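
(As a quick sanity check of this approximation, here is a small Monte Carlo sketch. The Gamma distribution, its parameters, and the sample size are arbitrary choices of mine, not part of the question; any sufficiently concentrated positive distribution would do.)

```python
import numpy as np

# Quick numerical sanity check of  E[log X] ~ log(mu) - sigma^2 / (2 mu^2).
# The Gamma distribution below is an arbitrary choice of a "nice" positive X;
# the question itself does not fix a distribution.
rng = np.random.default_rng(0)

mu, sigma2 = 2.0, 0.1
shape, scale = mu**2 / sigma2, sigma2 / mu      # Gamma(shape, scale): mean mu, variance sigma2

x = rng.gamma(shape, scale, size=10**7)
mc_estimate = np.log(x).mean()                  # Monte Carlo estimate of E[log X]
taylor_approx = np.log(mu) - sigma2 / (2 * mu**2)

print(f"Monte Carlo  E[log X] = {mc_estimate:.5f}")
print(f"Taylor approximation  = {taylor_approx:.5f}")
```

For this choice the two numbers come out very close; the discrepancy is on the order of the neglected higher-order terms, which is exactly what the question asks to justify.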

QUESTION: I'm interested in how to prove that the expected value of the remainder term is actually negligible, i.e.
$$
\mathbb{E}\left[\frac{(X - \mu)^3}{\xi_X^3}\right] = o(\sigma^2)
$$
(or, in other words, $\mathbb{E}\bigl[o\bigl((X-\mu)^2\bigr)\bigr] = o\bigl(\mathbb{E}\bigl[(X-\mu)^2\bigr]\bigr)$).

What I tried to do: assuming that $\sigma^2 \to 0$ (which, in turn, means $X \to \mu$ in probability), I tried to split the integral into two parts, surrounding $\mu$ with some $\varepsilon$-neighborhood $N_\varepsilon$:
$$
\int_\mathbb{R} p(x)\frac{(x-\mu)^3}{\xi_x^3} \,dx = \int_{x \in N_\varepsilon} \ldots dx + \int_{x \notin N_\varepsilon} \ldots dx
$$

The first integral can be bounded thanks to the fact that $0 \notin N_\varepsilon$, so the factor $1/\xi_x^3$ causes no trouble. But for the second one we have two competing facts: on the one hand,
$$
\mathbb{P}(|X - \mu| > \varepsilon) \to 0
$$
(as $\sigma^2 \to 0$), but on the other hand, we don't know what to do with $1/\xi_x^3$.

Another possibility could be to try using Fatou's lemma, but I can't figure out how.

I will appreciate any help or hints. I realize this is a rather technical question, but I need to work through it in order to trust this "Taylor-expectation" method. Thanks!

P.S. I checked out here, but it seems to be about something slightly different.

Best Answer

You are right to be skeptical of this approach. The Taylor series method does not work in general, although the heuristic contains a kernel of truth. To summarize the technical discussion below,

  • Strong concentration implies that the Taylor series method works for nice functions
  • Things can and will go dramatically wrong for heavy-tailed distributions or not-so-nice functions

As Alecos's answer indicates, this suggests that the Taylor-series method should be scrapped if your data might have heavy tails. (Finance professionals, I'm looking at you.)

As Elvis noted, the key problem is that the variance does not control higher moments. To see why, let's simplify your question as much as possible to get to the main idea.

Suppose we have a sequence of random variables $X_n$ with $\sigma(X_n)\to 0$ as $n\to \infty$.

Q: Can we guarantee that $\mathbb{E}[|X_n-\mu|^3] = o(\sigma^2(X_n))$ as $n\to \infty?$

Since there are random variables with finite second moments and infinite third moments, the answer is emphatically no. Therefore, in general, the Taylor series method fails even for 3rd degree polynomials. Iterating this argument shows you cannot expect the Taylor series method to provide accurate results, even for polynomials, unless all moments of your random variable are well controlled.
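
To make this concrete, here is a minimal sketch. The symmetrized Pareto distribution with tail index 3 is my own choice; it has a finite second moment but an infinite third absolute moment, which is all the counterexample needs.

```python
import numpy as np

# A concrete instance of "finite second moment, infinite third moment":
# Z has density proportional to |z|^{-4} for |z| >= 1 (a symmetrized Pareto
# with tail index 3), so E[Z^2] = 3 is finite while E[|Z|^3] = infinity.
# Then X_n = mu + Z/n has sigma(X_n) -> 0, yet E[|X_n - mu|^3] is infinite
# for every n, so it is certainly not o(sigma^2(X_n)).
rng = np.random.default_rng(0)

for m in [10**4, 10**5, 10**6, 10**7]:
    z = (1.0 + rng.pareto(3.0, size=m)) * rng.choice([-1.0, 1.0], size=m)
    print(f"samples = {m:>8}:  mean(Z^2) = {np.mean(z**2):6.2f}   "
          f"mean(|Z|^3) = {np.mean(np.abs(z)**3):8.1f}")

# mean(Z^2) settles near 3, while mean(|Z|^3) drifts upward and fluctuates
# wildly as the sample grows -- the empirical signature of an infinite moment.
```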

What, then, are we to do? Certainly the method works for bounded random variables whose support converges to a point, but this class is far too small to be interesting. Suppose instead that the sequence $X_n$ comes from some highly concentrated family that satisfies (say)

$$\mathbb{P}\left\{ |X_n-\mu|> t\right\} \le \mathrm{e}^{- C n t^2} \tag{1}$$

for every $t>0$ and some $C>0$. Such random variables are surprisingly common. For example, when $X_n$ is the empirical mean

$$ X_n := \frac{1}{n} \sum_{i=1}^n Y_i$$

of nice random variables $Y_i$ (e.g., iid and bounded), various concentration inequalities imply that $X_n$ satisfies (1). A standard argument (see p. 10 here) bounds the $p$th moments for such random variables:

$$ \mathbb{E}[|X_n-\mu|^p] \le \left(\frac{p}{2 C n}\right)^{p/2}.$$
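
Here is a quick empirical check of this moment bound. The Uniform(0,1) summands are my choice, and $C=2$ comes from Hoeffding's inequality for $[0,1]$-valued variables (ignoring its leading constant), so treat the sketch as illustrative rather than a proof.

```python
import numpy as np

# Empirical check of  E|X_n - mu|^p <= (p / (2 C n))^(p/2)  for X_n the mean of
# n iid Uniform(0,1) variables.  Hoeffding's inequality gives a tail bound of the
# form (1) with C = 2 for [0,1]-valued summands (ignoring the leading constant),
# so C = 2 is used below -- an assumption of this sketch, not part of the answer.
rng = np.random.default_rng(0)

C, n, reps = 2.0, 50, 10**5
x_n = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)   # 10^5 independent copies of X_n
dev = np.abs(x_n - 0.5)                                    # |X_n - mu| with mu = 1/2

for p in [2, 3, 4, 6]:
    empirical = np.mean(dev**p)
    bound = (p / (2 * C * n)) ** (p / 2)
    print(f"p = {p}:  empirical moment = {empirical:.3e}   bound = {bound:.3e}")
```

The empirical moments sit comfortably below the bound, as they should.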

Therefore, for any "sufficiently nice" analytic function $f$ (see below), we can bound the error $\mathcal{E}_m$ of the $m$-term Taylor series approximation using the triangle inequality

$$ \mathcal{E}_m:=\left|\mathbb{E}[f(X_n)] - \sum_{p=0}^m \frac{f^{(p)}(\mu)}{p!} \mathbb{E}(X_n-\mu)^p\right|\le \tfrac{1}{(2 C n)^{(m+1)/2}} \sum_{p=m+1}^\infty |f^{(p)}(\mu)| \frac{p^{p/2}}{p!}$$

when $2Cn \ge 1$. Since the sum on the right no longer depends on $n$, the error of the truncated Taylor series satisfies

$$ \mathcal{E}_m = O(n^{-(m+1)/2}) \text{ as } n\to \infty\quad \text{whenever} \quad \sum_{p=0}^\infty \frac{p^{p/2}}{p!}\,|f^{(p)}(\mu)| < \infty \tag{2}.$$

Hence, when $X_n$ is strongly concentrated and $f$ is sufficiently nice, the Taylor series approximation is indeed accurate. The condition appearing in (2) implies that $f^{(p)}(\mu)/p! = O(p^{-p/2})$, so the Taylor series of $f$ at $\mu$ has infinite radius of convergence; in particular, our condition requires that $f$ be entire. This makes sense, because (1) does not impose any boundedness assumptions on $X_n$.
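
As an illustration of how well this works for an entire function, here is a small sketch with $f=\exp$ and $X_n$ the mean of $n$ iid Uniform(0,1) variables (both are my choices, not part of the answer); in this case $\mathbb{E}[\exp(X_n)]$ has a closed form, so the error of the $m=2$ truncation can be computed exactly.

```python
import numpy as np

# Error of the m = 2 Taylor approximation  E[f(X_n)] ~ f(mu) + f''(mu) Var(X_n) / 2
# for f = exp and X_n the mean of n iid Uniform(0,1) variables (both choices are
# mine, purely for illustration).  Here E[exp(X_n)] = (n (e^{1/n} - 1))^n exactly,
# so the approximation error can be computed without any simulation.
mu = 0.5

for n in [10, 100, 1000]:
    exact = (n * np.expm1(1.0 / n)) ** n            # closed form for E[exp(X_n)]
    var_n = 1.0 / (12.0 * n)                        # Var(X_n) for Uniform(0,1) summands
    taylor2 = np.exp(mu) * (1.0 + var_n / 2.0)      # f(mu) + f'(mu)*0 + f''(mu)*var_n/2
    print(f"n = {n:>4}:  |exact - Taylor| = {abs(exact - taylor2):.2e}")

# The error shrinks quickly with n -- here even faster than the O(n^{-(m+1)/2})
# guarantee, because the third central moment of X_n happens to vanish.
```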

Let's see what can go wrong when $f$ has a singularity (following whuber's comment). Suppose we choose $f(x)=1/x$. If we take $X_n$ from the $\mathrm{Normal}(1,1/n)$ distribution truncated to $(0,2)$, then $X_n$ is sufficiently concentrated, but $\mathbb{E}[f(X_n)] = \infty$ for every $n$. In other words, we have a highly concentrated, bounded random variable, and the Taylor series method still fails when the function has just one singularity.
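
Here is a numerical sketch of that divergence (the value $n=10$ and the sequence of cutoffs are my own choices): the integral of $f(x)\,p(x)$ over $(\varepsilon, 2)$ keeps growing as $\varepsilon \to 0$, while the naive second-order Taylor value $1/\mu + \sigma^2/\mu^3 = 1 + 1/n$ stays put.

```python
import math
import numpy as np

# Numerical sketch of the divergence: f(x) = 1/x with X_n ~ Normal(1, 1/n)
# truncated to (0, 2).  The partial integrals of f(x) p(x) over (eps, 2) keep
# growing as eps -> 0 (roughly like log(1/eps)), while the naive Taylor value
# 1/mu + sigma^2/mu^3 = 1 + 1/n stays finite.  The value n = 10 and the cutoffs
# below are my own choices for the demonstration.
n = 10
sigma = 1.0 / math.sqrt(n)

# Normalizing constant of the Normal(1, sigma^2) density truncated to (0, 2)
norm_const = 0.5 * (math.erf((2.0 - 1.0) / (sigma * math.sqrt(2.0)))
                    - math.erf((0.0 - 1.0) / (sigma * math.sqrt(2.0))))

def density(x):
    # Truncated-normal density p(x) on (0, 2)
    return np.exp(-0.5 * ((x - 1.0) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi) * norm_const)

for eps in [1e-2, 1e-6, 1e-10, 1e-14]:
    x = np.geomspace(eps, 2.0, 200_000)             # log-spaced grid resolves 1/x near zero
    f_vals = density(x) / x
    partial = np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(x))   # trapezoidal rule
    print(f"eps = {eps:.0e}:  integral of f*p over (eps, 2) = {partial:.4f}")

print(f"Naive Taylor approximation 1/mu + sigma^2/mu^3 = {1.0 + 1.0 / n:.4f}")
```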

A few words on rigor. I find it nicer to present the condition appearing in (2) as derived, rather than as a deus ex machina required by a rigorous theorem/proof format. In order to make the argument completely rigorous, first note that the condition in (2) implies that

$$\mathbb{E}[|f(X_n)|] \le \sum_{p=0}^\infty \frac{|f^{(p)}(\mu)|}{p!} \mathbb{E}[|X_n-\mu|^p]< \infty$$

by the sub-Gaussian moment bound established above. Thus, Fubini's theorem gives

$$ \mathbb{E}[f(X_n)] = \sum_{p=0}^\infty \frac{f^{(p)}(\mu)}{p!} \mathbb{E}[(X_n-\mu)^p]. $$

The rest of the proof proceeds as above.