To simplify, let's discuss first what happens near $x_0 = 0$.
The way the local linear approximation works is that we are looking for the line that "best approximates" the function near $0$. This means that we want a function $y=a+bx$ with the property that the error between $y=a+bx$ and $y=f(x)$ goes to $0$ faster than $x$ goes to $0$. That is, we want
$$\lim_{x\to 0}\frac{f(x) - (a+bx)}{x} = 0.$$
A bit of work shows that the only $a$ and $b$ that can work are $a=f(0)$ and $b=f'(0)$:
$$\begin{align*}
\lim_{x\to 0}\frac{f(x)-(a+bx)}{x} &= \lim_{x\to 0}\frac{f(x)-f(0)+f(0)-(a+bx)}{x}\\
&= \lim_{x\to 0}\frac{f(x)-f(0)}{x-0} + \lim_{x\to 0}\frac{f(0)-(a+bx)}{x}\\
&= \lim_{x\to 0}\frac{f(x)-f(0)}{x-0} + \lim_{x\to 0}\frac{f(0)-a}{x} - \lim_{x\to 0}\frac{bx}{x}.
\end{align*}$$
If the function is differentiable, then the first limit is $f'(0)$; the last limit is $b$; and if we want the limit in the middle to exist, we need $f(0)=a$. So the only way this limit is equal to $0$ is if $b=f'(0)$ and $a=f(0)$.
This tells us that the line of "best approximation" to $f(x)$ near zero is $y = f(0) + f'(0)x$, which is precisely the local linear approximation.
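(If you want to see this in action for a specific function, here is a small symbolic check, assuming the Python library sympy is available, with the concrete choice $f(x)=e^x$: the limit above is $0$ exactly when $a=f(0)=1$ and $b=f'(0)=1$.)

```python
import sympy as sp

x, a, b = sp.symbols('x a b')
f = sp.exp(x)                      # a concrete choice of f

rel_err = (f - (a + b*x)) / x      # the relative error of the line y = a + b*x

# with a = f(0) = 1 and b = f'(0) = 1 the limit is 0 ...
print(sp.limit(rel_err.subs({a: 1, b: 1}), x, 0))   # -> 0
# ... with the wrong slope it is a nonzero number ...
print(sp.limit(rel_err.subs({a: 1, b: 2}), x, 0))   # -> -1
# ... and with a != f(0) it blows up (limit taken from the right)
print(sp.limit(rel_err.subs({a: 0, b: 1}), x, 0))   # -> oo
```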
(The fraction $\frac{f(x)-(a+bx)}{x}$ is the "relative error": the numerator measures the error, and dividing by $x$ takes into account how large the quantity itself is. If I tell you that I measured a distance and was off by 2 miles, you don't know whether that was a very good or a very bad approximation: it would be very good if I were measuring the distance to the moon, and very bad if I were measuring the length of my desk. The denominator "normalizes" the measurement so that it tells us how big the error is relative to how big the thing we are measuring is.)
Now, the advantage of the local linear approximation is that it is very simple; the disadvantage is that it can get bad pretty quickly; for example, the local linear approximation of $y=\cos x$ at $0$ is $y=1$, which gets "bad" very quickly.
So we may want to approximate with something else which, while still easy, has a better chance of being a good approximation.
One possibility is to go from linear to quadratic: let's try to approximate $y=f(x)$ with a quadratic function, $y = a+bx+cx^2$; again, in order for the approximation to be very good, we want the error to go to zero faster than $x$ goes to $0$. And since now we have a parabola, we can require that the curvature of the parabola at $0$ be the same as the curvature of $y=f(x)$ at $0$ (this is what messes us up with $y=\cos x$: it has big curvature, but the local linear approximation is flat).
Now, for this parabola to approximate $y=f(x)$ well near $0$, we are going to want their local linear approximations to be the same (if they have different local linear approximations, then we can't expect $y=a+bx+cx^2$ to be "close to" $f(x)$). The local linear approximation to $y=a+bx+cx^2$ at $x=0$ is given by $a+bx$, so we want $a+bx = f(0)+f'(0)x$; that is, $a=f(0)$ and $b=f'(0)$.
To get the same curvature, we want $f''(0) = y''(0)$. Since $y''(0) = 2c$, we want $c = \frac{1}{2}f''(0)$. So our $y$ will be
$$y = f(0) + f'(0)x + \frac{f''(0)}{2}x^2.$$
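For $f(x)=\cos x$ this quadratic is $1 - \frac{x^2}{2}$, and a quick numerical check (the sample points below are arbitrary choices of mine) shows how much better it does than the flat linear approximation $y=1$:

```python
import math

for x in (0.5, 0.25, 0.1):
    exact = math.cos(x)
    linear_err = abs(exact - 1.0)                 # error of the flat approximation y = 1
    quad_err = abs(exact - (1.0 - x**2 / 2))      # error of y = 1 - x^2/2
    print(f"x = {x}: linear error = {linear_err:.2e}, quadratic error = {quad_err:.2e}")
```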
In fact, this is indeed a very good approximation: it has the same value as $f(x)$ at $x=0$; the relative error goes to zero, since
$$\begin{align*}
\lim_{x\to 0}\frac{f(x) - (f(0)+f'(0)x + \frac{1}{2}f''(0)x^2)}{x} &=
\lim_{x\to 0}\frac{f(x) - (f(0)+f'(0)x)}{x} - \lim_{x\to 0}\frac{f''(0)x^2}{2x}\\
&= 0 - \lim_{x\to 0}\frac{1}{2}f''(0)x = 0.
\end{align*}$$
But more than that: the relative error between the derivatives goes to $0$:
$$\begin{align*}
\lim_{x\to 0}\frac{f'(x) - (f'(0)+f''(0)x)}{x} &= \lim_{x\to 0}\frac{f'(x)-f'(0)}{x}- f''(0)\\
&= f''(0)-f''(0) = 0.
\end{align*}$$
So not only does $y=f(0) + f'(0)x + \frac{1}{2}f''(0)x^2$ have the same value as $f$ at $0$ and the best possible relative error; its derivative is also the best possible approximation to the derivative of $f(x)$, so the graph will have a very similar shape near $0$.
If we then go on to a degree $3$ approximation, $y=a+bx+cx^2+dx^3$, and we want the relative error to go to $0$, we are going to need $a=f(0)$ and $b=f'(0)$. If we want the relative error in the derivative to go to $0$, we are going to need $c=\frac{1}{2}f''(0)$ as before. What if we want the relative error of the second derivative to go to $0$ as well? The second derivative of $y=a+bx+cx^2+dx^3$ is $2c+6dx = f''(0) + 6dx$. So we have:
$$\begin{align*}
\lim_{x\to 0}\frac{f''(x) - (f''(0)+6dx)}{x} &= \lim_{x\to 0}\frac{f''(x)-f''(0)}{x} - \lim_{x\to 0}6d\\
&= f'''(0) - 6d.
\end{align*}$$
For this to be equal to $0$, we need $6d = f'''(0)$, or $d = \frac{1}{6}f'''(0)$.
If we then go to a degree $4$ approximation and ask that the relative error, the relative error of the derivatives, the relative error of the second derivatives, and the relative error of the third derivatives all go to $0$, then we find that we need the function $y= f(0) +f'(0)x + \frac{1}{2}f''(0)x^2 + \frac{1}{6}f'''(0)x^3 + ex^4$, where
$f^{(4)}(0) = 24e$, leading to $e = \frac{1}{24}f^{(4)}(0)$.
It is not hard to then see, inductively, that the coefficient we get for $x^n$ if we proceed this way will always be $\frac{f^{(n)}(0)}{n!}$.
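If you want to see the pattern emerge concretely, here is a small sketch (assuming sympy is available, and using an arbitrary sample function of my choosing) that computes $f^{(n)}(0)/n!$ directly and compares it with the expansion sympy produces:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x) * sp.exp(x)          # an arbitrary smooth sample function

# coefficient of x^n produced by the matching procedure: f^(n)(0) / n!
coeffs = [sp.diff(f, x, n).subs(x, 0) / sp.factorial(n) for n in range(6)]
print(coeffs)                      # [0, 1, 1, 1/3, 0, -1/30]

# compare with the expansion sympy computes directly
print(sp.series(f, x, 0, 6))       # x + x**2 + x**3/3 - x**5/30 + O(x**6)
```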
You can repeat the same idea around any point $x_0$, but if you do it directly it turns out that you cannot easily use the coefficients you found for, say, the linear approximation, in the quadratic approximation; you end up with a system of equations instead of simply repeating the old coefficients. The simple way of "fixing" this is to shift everything to $0$, solve the problem at $0$, and then shift it back.
So if you want to solve the problem at $x_0$, we consider instead $g(x) = f(x+x_0)$, because then approximating $f$ near $x_0$ is the same as approximating $g$ at $0$. Moreover, $g'(x) = f'(x+x_0)(x+x_0)' = f'(x+x_0)$, $g''(x) = f''(x+x_0)$, etc. So, from the work above, we see that the degree $n$ approximation to $g$ near $0$ that has best possible relative error, best possible relative error between derivatives, best possible relative error between 2nd derivatives, etc. is
$$g(0) + g'(0)x + \frac{1}{2}g''(0)x^2 + \cdots + \frac{1}{n!}g^{(n)}(0)x^n.$$
But from the above, this is the same as
$$f(x_0) + f'(x_0)x + \frac{1}{2}f''(x_0)x^2 +\cdots + \frac{1}{n!}f^{(n)}(x_0)x^n.$$
This is the local approximation to $f(x+x_0)$ near $x=0$. Replacing $x$ with $x-x_0$ turns $f(x+x_0)$ into $f(x-x_0+x_0) = f(x)$; since the "old $x$" was near $0$, the new $x$ needs to be near $x_0$. So we get that for $x$ near $x_0$, we have
$$f(x)\approx f(x_0) + f'(x_0)(x-x_0) + \frac{1}{2}f''(x_0)(x-x_0)^2 + \cdots + \frac{1}{n!}f^{(n)}(x_0)(x-x_0)^n,$$
exactly the formula for the $n$th Taylor polynomial approximation to $f(x)$ near $x_0$.
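As a concrete illustration (a sketch of mine, not part of the original argument), here is the degree-$3$ Taylor polynomial of $\sin x$ about $x_0 = \pi/4$ built from exactly this formula, compared against $\sin x$ at a couple of nearby points:

```python
import math

x0 = math.pi / 4
# successive derivatives of sin evaluated at x0: sin, cos, -sin, -cos
derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]

def T3(x):
    """Degree-3 Taylor polynomial of sin about x0, straight from the formula above."""
    h = x - x0
    return sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs))

for x in (x0 + 0.1, x0 + 0.5):
    print(f"x = {x:.4f}: sin(x) = {math.sin(x):.6f}, T3(x) = {T3(x):.6f}")
```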
Sam L. raises the fair question of why we would want to assume that our approximations are polynomials. If you go via the route of "make the relative error go to $0$, make the relative error of the derivative go to $0$," etc., you will discover that you are naturally led to polynomials.
Here's another motivation: when we encounter a function which we want to integrate and for which we are unable to find an antiderivative, we end up with two possible approaches:
1. Attempt to approximate the value of the integral via Riemann sums; in essence, the integral is a limit, so we can approximate the limit by using terms of the sequence whose limit we are trying to compute. This leads to the left, right, midpoint, trapezoid, Simpson, and other methods of numerical approximation to the integral using the function $f$.

2. Find a function $\mathcal{F}$ which approximates $f$ but is easy to integrate (see the sketch after this list). In fact, this is in part what is behind the Simpson rule approximation, in that it approximates the function $f$ on each subinterval by a quadratic function. Since polynomials are generally easy to integrate, trying to find polynomials that are good approximations to $f$ seems like a good idea; that is, the class of polynomials is a good proving ground for "good approximations to $f$", precisely because polynomials are easy to integrate.
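Here is a small sketch of that second approach (my own illustrative example, not from the answer above): $e^{-x^2}$ has no elementary antiderivative, but integrating its degree-$6$ Taylor polynomial about $0$ term by term already approximates $\int_0^{1/2} e^{-x^2}\,dx$ quite well:

```python
import math

b = 0.5

# e^(-x^2) ~ 1 - x^2 + x^4/2! - x^6/3!   (degree-6 Taylor polynomial about 0);
# integrating it term by term from 0 to b gives:
approx = sum((-1)**k * b**(2*k + 1) / (math.factorial(k) * (2*k + 1)) for k in range(4))

# the "exact" value, via the error function
exact = math.sqrt(math.pi) / 2 * math.erf(b)
print(approx, exact)   # both are approximately 0.4613
```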
For one variable, you can write the Taylor expansion around $x=a$ as follows:
$$f(a+\epsilon)-f(a)=\epsilon \frac{df(x=a)}{dx}+\frac{\epsilon^2}{2}\frac{d^2f(x=a)}{dx^2}+O(\epsilon^3)$$
For the functional case, take a functional of the form
$$J(y)=\int_a^bf(x,y,y')dx$$
and for weak variations it follows that
$$\hat y=y+\epsilon t\Rightarrow J(\hat y)=J(y+\epsilon t)=\int_a^bf(x,y+\epsilon t,y'+\epsilon t')dx$$
If you want $y$ to be stationary, the expression below must vanish, since we don't want the value of the integral to vary around $y$:
$$J(y+\epsilon t)-J(y)=\int_a^b\bigg(f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')\bigg)dx$$
The integral vanishes, independently of the interval of integration, if the following condition is satisfied on $[a,b]$:
$$f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')=0$$
If you compare this with the Taylor expansion formula, you can see that they have the same form if you replace $\epsilon$ by $\epsilon t$, and it follows that
$$f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')=\epsilon t \frac{\partial f(x,y,y')}{\partial y}+\epsilon t' \frac{\partial f(x,y,y')}{\partial y'}+O(\epsilon^2)=0$$
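If you want to verify this first-order expansion symbolically, here is a minimal sketch (assuming sympy is available; the integrand $f = y'^2 - y^2$ is an arbitrary choice of mine, and $y$, $y'$, $t$, $t'$ are treated as independent symbols, as described in the edit below):

```python
import sympy as sp

eps, y, yp, t, tp = sp.symbols('eps y yp t tp')   # yp, tp play the roles of y' and t'
f = yp**2 - y**2                                  # an arbitrary sample integrand f(x, y, y')

lhs = f.subs({y: y + eps*t, yp: yp + eps*tp}) - f
first_order = eps*t*sp.diff(f, y) + eps*tp*sp.diff(f, yp)

# everything left over is of order eps**2
print(sp.expand(lhs - first_order))               # -eps**2*t**2 + eps**2*tp**2
```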
Edit: They treat the functions as independent inside $f$. Consider the optimization case with constraints: for the objective function you treat every variable as independent, but the constraint equations impose relations between those independent variables. It is the same here: the variables are treated as independent, but the relation between them is imposed by $\hat y=y+\epsilon t$ and $\hat y'=y'+\epsilon t'$.
Best Answer
Taylor's theorem says that (of course, this is not the most general version of the theorem) if $f: I \to \Bbb{R}$ is $(n+1)$ times differentiable on an interval $I$, with $f^{(n+1)}$ bounded, and $x_0 \in I$, then for all $x \in I$,
\begin{align}
f(x) = f(x_0) + f'(x_0)(x-x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(x-x_0)^n + \mathcal{O}\left(|x-x_0|^{n+1}\right).
\end{align}
The precise meaning of the $\mathcal{O}$ notation (I know this isn't what you asked, but bear with me) is that the remainder function $\rho_{n,x_0}: I \to \Bbb{R}$, defined by \begin{align} \rho_{n,x_0}(x):= f(x) - \left[f(x_0) + f'(x_0)(x-x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(x-x_0)^n\right] \end{align} satisfies the following condition (this condition gives a quantitative meaning to "the remainder is small"): there is a constant $B_n \geq 0$ such that for all $x \in I$,
\begin{align}
|\rho_{n,x_0}(x)| \leq B_n|x-x_0|^{n+1}.
\end{align}
Note that in all this business, things like $x$ and $x_0$ should be thought of as numbers. Honest to god numbers. So, $f(x)$ is a number! It is no longer a function anymore. $f'(x_0)$ is a number. Something like $f'''(\ddot{\smile})$ is also another number. The reason I keep saying "for all $x \in I$" is that I'm explicitly telling you that for any real number I pick, if that real number lies in the domain, $I$, of the function $f$, then the equations above are true. For example, suppose I take $x_0 = 0$, and suppose that the domain of $f$ is $I = \Bbb{R}$, the whole real line. Then,
\begin{align}
f(1) &= f(0) + f'(0)\cdot 1 + \dots + \dfrac{f^{(n)}(0)}{n!}\cdot 1^n + \rho_{n,0}(1),\\
f(-2.7) &= f(0) + f'(0)\cdot (-2.7) + \dots + \dfrac{f^{(n)}(0)}{n!}\cdot (-2.7)^n + \rho_{n,0}(-2.7).
\end{align}
And so on. Literally any real number $x$ you think of, as long as the number $x$ lies inside the domain of the function $f$, you can plug it into the above equations and they remain true.
It may seem silly to spend so much time on these simple cases, but that's exactly what we need to do to understand the fundamentals. Now, suppose I have two functions in the game, $f:I_f \to \Bbb{R}$ and $g:I_g \to I_f$, where $I_f, I_g \subset \Bbb{R}$ are intervals in the real line. Now, let's pick a number $x_0 \in I_f$, to "Taylor-expand the function $f$ about". Well, now let's pick ANY number $t \in I_g$. Then, $g(t)$ is a specific real number, which lies inside $I_f$ (the domain of $f$). Now, since $g(t)$ is a real number lying inside the domain of $f$, by Taylor's theorem, I can clearly say: \begin{align} \begin{cases} f(g(t)) &= f(x_0) + f'(x_0)(g(t) - x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(g(t) - x_0)^n + \rho_{n,x_0}(g(t)) \\ |\rho_{n,x_0}(g(t))| & \leq B_n|g(t) - x_0|^{n+1} \end{cases} \end{align}
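If it helps, here is a tiny numerical sketch of this "plug the number $g(t)$ into the expansion" idea (the choices $f = \exp$, $g = \sin$, $x_0 = 0$, $n = 4$ are mine, purely for illustration):

```python
import math

def T4(u):
    """Degree-4 Taylor polynomial of exp about x0 = 0, evaluated at the number u."""
    return sum(u**k / math.factorial(k) for k in range(5))

g = math.sin        # any g whose values land in the domain of exp (here, all of R)

t = 0.3
u = g(t)            # u = g(t) is just a real number ...
print(math.exp(u), T4(u))   # ... so it can be fed into the expansion; both print ~ 1.3438
```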
Here's something to take note of: I'm not saying anything like "f is a function of $x$ or $g$ is a function of $t$" or anything like that, because really such statements are meaningless in this context. All I care about is functions, their domains, and numbers. That's it.
Never EVER get hung up on what letters we use. Math does NOT care what your favourite letter is (forgive the caps... don't think of this as shouting... I really just want to emphasize an obvious fact, which sometimes people seem to forget; I know I sure forget this from time to time). So, don't pay much attention to the fact that I used the letter $t$ instead of $x$. If you want, I can say the following statement, and it says literally the same thing as what I said above: for any $x \in I_g$,
\begin{align}
f(g(x)) &= f(x_0) + f'(x_0)(g(x) - x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(g(x) - x_0)^n + \rho_{n,x_0}(g(x)).
\end{align}
Just to emphasize once again that symbols shouldn't change the intended meaning, note that the following statement is just as mathematically valid: for any $\ddot{\smile} \in I_g$,
\begin{align}
f(g(\ddot{\smile})) &= f(x_0) + f'(x_0)(g(\ddot{\smile}) - x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(g(\ddot{\smile}) - x_0)^n + \rho_{n,x_0}(g(\ddot{\smile})).
\end{align}
One more time just for the sake of fun: for any $\# \in I_g$,
\begin{align}
f(g(\#)) &= f(x_0) + f'(x_0)(g(\#) - x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(g(\#) - x_0)^n + \rho_{n,x_0}(g(\#)).
\end{align}
In each of these statements, $t, x, \ddot{\smile}, \#$ were all just names/symbols I gave to specific numbers in the domain $I_g$. Therefore, $g(t), g(x), g(\ddot{\smile}), g(\#)$ are all specific real numbers which lie in $I_f$, which happens to be the domain of $f$.
So, if you're ever in doubt about whether you can plug something into a function, just ask yourself one very simple question: is the thing I'm about to plug in part of the domain of validity of my function? If the answer is "yes", then of course you're allowed to plug it in; otherwise, you can't (simply by definition of "domain of a function").
By the way, I know I haven't directly addressed your question about the multipole expansion. The reason is that your problem seemed to be more of a conceptual one about what one means by substitution (lol I remember being confused by these matters too). Given what I've written so far, I invite you to read through the multipole argument again, and try to convince yourself that the manipulations are all valid. If you still have trouble, then let me know.
Edit: Responding to OP's comments.
The bounding condition on the $(n+1)$-th derivative really has nothing to do with plugging in a number like $g(t)$, because, as I mentioned in my first sentence, the theorem stated above is not the most general version. Here is the version of Taylor's theorem which I first learnt, and which has the weakest hypotheses: if $f: I \to \Bbb{R}$ is $n$ times differentiable at $x_0 \in I$, then for $x \in I$,
\begin{align}
f(x) = f(x_0) + f'(x_0)(x-x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(x-x_0)^n + o\left(|x-x_0|^n\right) \quad \text{as } x \to x_0.
\end{align}
The precise meaning of the little-$o$ notation here is as follows: we first define the "remainder function" $\rho_{n,x_0}: I \to \Bbb{R}$ as before: \begin{align} \rho_{n,x_0}(x):= f(x) - \left[f(x_0) + f'(x_0)(x-x_0) + \dots + \dfrac{f^{(n)}(x_0)}{n!}(x-x_0)^n\right] \end{align} Then, the claim is that \begin{align} \lim_{x \to x_0} \dfrac{\rho_{n,x_0}(x)}{(x-x_0)^n} &= 0. \end{align}
Now, for the sake of notation, let me introduce $T_{n,f,x_0}:I \to \Bbb{R}$ to mean the Taylor polynomial of $f$ of order $n$, based at the point $x_0$. So, we have by definition that $f = T_{n,f,x_0} + \rho_{n,f,x_0}$ (because $\rho_{n,f,x_0}$ is literally defined as $f- T_{n,f,x_0}$).
Notice the differences between this version of the theorem and the previous version: the first version assumes more about $f$ (enough differentiability to control the $(n+1)$-th derivative) and in return gives the stronger, quantitative bound $|\rho_{n,x_0}(x)| \leq B_n|x-x_0|^{n+1}$; this version only assumes $f$ is $n$ times differentiable at $x_0$, and in return only gives the limit statement about the remainder.
So, you're right, the $B_n$ is somehow related to the $(n+1)^{th}$ derivative. This form of the bound on the remainder is clearly very good, because if you have a specific function, you can try to estimate an upper bound for the derivative, and then you get a really explicit bound on the remainder: $|\rho_{n,x_0}(x)| \leq B_n |x-x_0|^{n+1}$. It tells you literally that the remainder is always smaller than a certain $(n+1)$-order polynomial. And for example, if you take $x= x_0 + 0.1$, then $|\rho_{n,x_0}(x_0 + 0.1)| \leq B_n |0.1|^{n+1}$. If you take a number $x$ which is even closer to $x_0$, then clearly you can make the RHS extremely small, extremely "quickly", because of the power $n+1$.
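Here is a small numerical sketch of that bound (my own illustration: I take $f = \sin$, $x_0 = 0$, $n = 3$, and use the standard choice $B_n = \frac{1}{(n+1)!}\sup|f^{(n+1)}| = \frac{1}{(n+1)!}$, which works for $\sin$ since all of its derivatives are bounded by $1$; the answer above only says $B_n$ is related to the $(n+1)$-th derivative):

```python
import math

def remainder(x, n):
    """rho_{n, x0=0}(x) for f = sin: sin(x) minus its degree-n Taylor polynomial about 0."""
    derivs_at_0 = [0.0, 1.0, 0.0, -1.0]   # sin, cos, -sin, -cos at 0, repeating with period 4
    T = sum(derivs_at_0[k % 4] * x**k / math.factorial(k) for k in range(n + 1))
    return math.sin(x) - T

n = 3
B = 1.0 / math.factorial(n + 1)           # valid since every derivative of sin is bounded by 1
for x in (0.5, 0.1, 0.01):
    print(abs(remainder(x, n)), "<=", B * abs(x)**(n + 1))
```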
Anyway, the reason I mentioned this form of Taylor's theorem is to say that regardless of any bound on the $(n+1)$-th derivative, you can always plug in another function's values, $g(t)$, as long as the composition $f \circ g$ makes sense. That's the only restriction you have. More explicitly (with notation very similar to the one above), for every $t$ such that $g(t)$ lies in the domain of $f$,
\begin{align}
f(g(t)) &= T_{n,f,x_0}(g(t)) + \rho_{n,f,x_0}(g(t)), \qquad \rho_{n,f,x_0} := f - T_{n,f,x_0}.
\end{align}
This is trivially true, and you don't even need Taylor's theorem for this. Why? Because each equality I wrote above, $:=$, is true by definition (that's why I put the "$:$" in front of "$=$"). Why is it true by definition? Because I first defined $T_{n,f,x_0}$ to be a certain function (namely the Taylor polynomial), and then I defined the remainder $\rho_{n,f,x_0}$ to be $f- T_{n,f,x_0}$, so of course it's trivially true that $f = T_{n,f,x_0} + \rho_{n,f,x_0}$. Said another way, all I did is add and subtract the same thing; it is as trivial as saying something like $1 = (\pi^e) + (1-\pi^e)$. The non-trivial part is in saying that \begin{align} \lim_{x \to x_0}\dfrac{\rho_{n,f,x_0}(x)}{(x-x_0)^n} &= 0. \end{align} Suppose we have that $g(0) = x_0$. Then, what you should NOT do is make any false inferences like \begin{align} \lim_{t \to 0} \dfrac{\rho_{n,f,x_0}(g(t))}{t^n} &= 0 \end{align}
Anyway, the major conclusion here is that: As long as the composition $f \circ g$ makes sense, I can always write things like $f(g(t))$. And of course, once you think about this for a while, it becomes one of the most obvious things in the world.
Note that what I've been talking about so far is "Taylor's theorem", which deals with "Taylor polynomials", and NOT "Taylor series". A polynomial has a finite sum of terms, while a series is defined as a limit of partial sums of finitely many terms. And this is probably more of what you're confused about in your comment.
One is very much tempted to write things like $T_{f,x_0} = \sum_{k=0}^{\infty}\dfrac{f^{(k)}(x_0)}{k!}(x-x_0)^k$, call it the Taylor series of $f$ around $x_0$, and then say something like $f(x) = T_{f,x_0}(x)$, so that the function $f$ is equal to its Taylor series. But of course, before you can do this, you have to clarify a few things first: for which $x$ does the limit $\lim_{n\to\infty}T_{n,f,x_0}(x)$ exist at all, and, when it does exist, is it actually equal to $f(x)$?
Then, we define $C_{f,x_0} := \{x \in I_f \mid \lim_{n \to \infty}T_{n,f,x_0}(x) \text{ exists}\}$, i.e. the set of points in the domain of $f$ for which the series converges ($C$ for convergence lol) to a (finite) number. Well, we know for sure that $x_0 \in C_{f,x_0}$, because we're simply taking the limit $\lim_{n \to \infty} T_{n,f,x_0}(x_0) = \lim_{n \to \infty}f(x_0) = f(x_0)$, i.e. this limit exists. In standard analysis texts, one proves that $C_{f,x_0}$ is actually an interval; i.e. if $x \in C_{f,x_0}$, then any number $\xi$ such that $|\xi- x_0| < |x-x_0|$ will also lie in $C_{f,x_0}$. This is why we call $C_{f,x_0}$ the interval of convergence.
So, as a summary, to write something like $f(x) = T_{f,x_0}(x) = \sum_{k=0}^{\infty}\dfrac{f^{(k)}(x_0)}{k!}(x-x_0)^k$, one has to check two things: first, that $x \in C_{f,x_0}$, i.e. that the limit $T_{f,x_0}(x) := \lim_{n\to\infty}T_{n,f,x_0}(x)$ exists; and second, that this limit is actually equal to $f(x)$ (equivalently, that $\rho_{n,f,x_0}(x) \to 0$ as $n \to \infty$).
It is only with these two conditions being satisfied that we can say that $f(x) = T_{f,x_0}(x)$.
An example:
Here's a very simple example. Let $I = \Bbb{R} \setminus\{1\}$, and define the function $f: I \to \Bbb{R}$ by \begin{align} f(x) &:= \dfrac{1}{1-x}. \end{align} Then, you can check that $f$ is infinitely differentiable at the origin, and that for every $k \geq 0$, $f^{(k)}(0) = k!$. So, the $n$-th Taylor polynomial for $f$ about the origin is \begin{align} T_{n,f, x_0 = 0}(x) &= \sum_{k=0}^{n} \dfrac{k!}{k!} x^k = \sum_{k=0}^n x^k = \dfrac{1-x^{n+1}}{1-x}. \end{align} Now, it is easy to see that the limit \begin{align} \lim_{n \to \infty} T_{n,f,x_0=0}(x) \end{align} exists if and only if $|x|< 1$ (if this isn't clear, refer to any standard calculus/analysis text, where it is explained in more detail). Also, it is clear that for $|x|<1$, the limit as $n \to \infty$ is $\dfrac{1}{1-x}$. Thus, we have seen that
\begin{align}
\text{for } |x| < 1: \qquad f(x) = \dfrac{1}{1-x} = \sum_{k=0}^{\infty}x^k = T_{f,x_0=0}(x);
\end{align}
i.e. it is only for $|x|<1$ that the Taylor series of $f$ converges, AND actually equals $f$.
For example, let's now define $g: \Bbb{R} \to \Bbb{R}$ by $g(t):= t^2$. Here are a couple of statements we can make which hopefully illustrate the key points:
When can we write down $f(g(t))$? Well, by definition, we can do this if and only if $g(t) \in I_f = \Bbb{R} \setminus \{1\}$. i.e if and only if $g(t) = t^2 \neq 1$. i.e if and only if $t \notin \{-1, 1\}$. Repeating, for every $t \in \Bbb{R} \setminus \{-1,1\}$, we have that $g(t) \in I_f$, so \begin{align} f(g(t)) &= \dfrac{1}{1-g(t)} = \dfrac{1}{1-t^2} \end{align} (this shouldn't be surprising because it is pretty much a review of the definition of composition of functions).
Writing $f(g(1))$ is nonsense, because $g(1) = 1$ is not in the domain of $f$, so it is literally nonsense.
For every $t \in \Bbb{R} \setminus \{-1,1\}$, and every $n \geq 0$, we have that \begin{align} f(g(t)) &= T_{n,f,x_0=0}(g(t)) + \rho_{n,f,x_0=0}(g(t))\\ f(t^2) &= T_{n,f,x_0=0}(t^2) + \rho_{n,f,x_0=0}(t^2) \\ &= \sum_{k=0}^n t^{2k} + \rho_{n,f,x_0=0}(t^2) \end{align} Again, this is simply true by definition of how the remainder $\rho_{n,f,x_0=0}$ is defined (think back to the trivially true equation $1 = (\pi^e) + (1-\pi^e)$). The non-trivial statement (which is exactly the statement made in Taylor's theorem) is that \begin{align} \lim_{x \to 0}\dfrac{\rho_{n,f,x_0}(x)}{x^n} = 0 \end{align}
Another true statement is the following: we have $|g(t)| < 1$ if and only if $|t| < 1$. So, for every real number $t$ such that $|t|<1$, we have \begin{align} \dfrac{1}{1-t^2} &= f(t^2)\\ &= T_{f,x_0=0}(t^2) \tag{since $|t|< 1 \implies |t^2| < 1$}\\ &= \sum_{k=0}^{\infty}(t^2)^k \\ &= \sum_{k=0}^{\infty}t^{2k}. \end{align} Again, at this point don't be confused by the symbols. Everything is a number. $t$ is a number such that $|t|<1$. So, $t^2$ is also a number such that $|t^2| < 1$. So, of course, I can plug it into the Taylor series (which I've shown converges and equals the function $f$ on the interval $(-1,1)$). Again, think of particular numbers. $|0.1|< 1$, so $0.1^2 = 0.01$ clearly satisfies $|0.01|<1$. So, \begin{align} \dfrac{1}{1-0.01} &= f(0.01)\\ &= T_{f,x_0=0}(0.01) \tag{since $|0.01|< 1$}\\ &= \sum_{k=0}^{\infty}(0.01)^k \end{align} When you think of everything as particular numbers (which is exactly how you should think of them anyway), it becomes extremely easy to convince yourself that these manipulations are true.
On a similar note, it is very important to remember that $f(x) = T_{f,x_0=0}(x)$ if and only if $|x| < 1$. This is in spite of the fact that the function $f$ is defined from $\Bbb{R} \setminus\{1\}$ to $\Bbb{R}$; the thing is that the series on the RHS only converges when $|x| < 1$ (and when this happens it also happens to equal the function $f$). For example, $f(2)$ clearly makes sense, because $2 \in \text{domain}(f) = \Bbb{R} \setminus\{1\}$; also $f(2) = \frac{1}{1-2} = -1$. However, writing something like $T_{f,x_0=0}(2)$ is complete nonsense, because the limit \begin{align} \lim_{n \to \infty}T_{n,f,x_0=0}(2) = \lim_{n \to \infty} \sum_{k=0}^n 2^k = \infty \end{align} is not a (finite) number, i.e. the limit doesn't exist in $\Bbb{R}$.
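To tie the example together numerically (a sketch of mine; the particular numbers $0.5$, $t=0.1$, and $2$ are arbitrary illustrative choices):

```python
def T(x, n):
    """T_{n, f, x0=0}(x) = sum_{k=0}^{n} x^k for f(x) = 1/(1-x)."""
    return sum(x**k for k in range(n + 1))

# |x| < 1: the partial sums approach f(x) = 1/(1-x)
print(T(0.5, 50), 1 / (1 - 0.5))       # both ~ 2.0

# substituting t^2 with |t| < 1 is just as fine, since then |t^2| < 1
t = 0.1
print(T(t**2, 50), 1 / (1 - t**2))     # both ~ 1.0101...

# but at x = 2 the partial sums blow up, even though f(2) = -1 makes perfect sense
print(T(2, 20), 1 / (1 - 2))           # 2097151 vs -1
```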
Hopefully these remarks show you what statements you can and can't make in regards to substituting things inside functions. As a summary:
When can I substitute one function's values inside another, like $f(g(t))$? Answer: whenever $t\in \text{domain}(g)$ and $g(t) \in \text{domain}(f)$. (this is literally definition of composition).
The equation $f(x) = T_{n,f,x_0}(x) + \rho_{n,f,x_0}(x)$ is true for every number $x \in \text{domain}(f)$, simply because I defined the terms on the RHS such that this equation is true. (think of this as the $1 = (\pi^e) + (1-\pi^e)$ business).
A completely different question is asking where the Taylor series of a function $f$ converges, and does it equal the function $f$? To answer this question, refer to my discussion above.