For a function of one variable you can write the Taylor series around $x=a$ as
$$f(a+\epsilon)-f(a)=\epsilon \frac{df}{dx}\bigg|_{x=a}+\frac{\epsilon^2}{2}\frac{d^2f}{dx^2}\bigg|_{x=a}+O(\epsilon^3)$$
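A quick numerical sanity check of this expansion (a minimal sketch: the function $f(x)=e^x$ and the point $a=0.5$ are just illustrative choices):

```python
# Check that the remainder of the two-term expansion shrinks like eps**3.
import numpy as np

a = 0.5
f = np.exp          # convenient choice: f = f' = f'' = exp

for eps in [1e-1, 1e-2, 1e-3]:
    exact = f(a + eps) - f(a)
    approx = eps * f(a) + 0.5 * eps**2 * f(a)   # first- and second-order terms
    print(f"eps={eps:.0e}  remainder={exact - approx:.3e}")
```

As $\epsilon$ shrinks by a factor of $10$, the printed remainder shrinks by roughly a factor of $1000$, consistent with the $O(\epsilon^3)$ term.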
For the functional case, consider a functional of the form
$$J(y)=\int_a^bf(x,y,y')dx$$
and for weak variations it follows that
$$\hat y=y+\epsilon t\Rightarrow J(\hat y)=J(y+\epsilon t)=\int_a^bf(x,y+\epsilon t,y'+\epsilon t')dx$$
If you want $y$ to be stationary, the difference below must vanish to first order in $\epsilon$, since we do not want the value of the integral to change under small variations around $y$:
$$J(y+\epsilon t)-J(y)=\int_a^b\bigg(f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')\bigg)dx$$
The integral is zero, independently of the interval of integration, if the following condition is satisfied on $[a,b]$:
$$f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')=0$$
If you compare this with the Taylor expansion formula above, you can see that it has the same form if you replace $\epsilon$ by $\epsilon t$ (and by $\epsilon t'$ in the $y'$ slot), and it follows that
$$f(x,y+\epsilon t,y'+\epsilon t')-f(x,y,y')=\epsilon t \frac{\partial f(x,y,y')}{\partial y}+\epsilon t' \frac{\partial f(x,y,y')}{\partial y'}+O(\epsilon^2)=0$$
Edit: They treat the functions as independent inside $f$. Consider the optimization case with constraints: for the objective function you treat every variable as independent, but the constraint equations impose relations between those variables. It is the same here: $y$ and $y'$ are treated as independent arguments of $f$, but the relation between them is imposed by $\hat y=y+\epsilon t$ and $\hat y'=y'+\epsilon t'$.
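To make the first-order variation formula above concrete, here is a small symbolic sketch with sympy. The integrand $f(x,y,y')=y'^2+y^2$, the curve $y=\sin x$, the test function $t=x(1-x)$, and the interval $[0,1]$ are all arbitrary example choices, not part of the derivation itself:

```python
# Check J(y + eps*t) - J(y) against the first-order term
#   eps * integral( t * df/dy + t' * df/dy' ) dx
import sympy as sp

x, eps = sp.symbols('x epsilon')
y = sp.sin(x)                  # example curve
t = x * (1 - x)                # example variation (vanishes at the endpoints)
a, b = 0, 1

def f(y_, yp_):                # example integrand f(x, y, y') = y'^2 + y^2
    return yp_**2 + y_**2

J = lambda y_: sp.integrate(f(y_, sp.diff(y_, x)), (x, a, b))

# Exact difference, expanded to first order in eps
diff = sp.series(J(y + eps * t) - J(y), eps, 0, 2).removeO()

# First-order term from the formula; for this f, df/dy = 2y and df/dy' = 2y'
first_order = eps * sp.integrate(t * 2 * y + sp.diff(t, x) * 2 * sp.diff(y, x), (x, a, b))

print(sp.simplify(diff - first_order))   # 0
```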
I think it's a combination of a lot of things. There is always simply the possibility that some terms cancel out, or that the higher order derivatives don't exist for the function you are trying to approximate.
Many distributions only involve second-order (quadratic) interactions between variables, and thus derivatives higher than the second end up evaluating to 0 anyway. Your example of the Fisher information as the covariance of the asymptotic distribution of the MLE is a good one. Suppose you were looking at the asymptotic distribution of the mean in a multivariate normal with known precision: $X_i \overset{iid}{\sim} N(\mu,\Omega)$.
If you calculate the Fisher information matrix, you end up getting rid of the exponential through the log, and you're left with a quadratic form in $\mu$ weighted by $\Omega$:
$\mathcal{I}(\mu)_{rs} = \frac{\partial^2}{\partial\mu_r\,\partial\mu_s} \frac{1}{2}\sum_{j,k}\mu_j\mu_k\Omega_{jk} = \Omega_{rs}$.
So you can see that if you consider higher-order derivatives, you end up with terms that vanish anyway. This isn't explicitly a Taylor series, but the idea that most models and distributions consider second-order interactions gives intuition about why second-order approximations come up so much. The fact that second-order interactions are so easy to work with also motivates stopping at a second-order approximation: it's usually sufficient, and the problem can rapidly get more difficult to solve if one expands to a higher order.
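A minimal sympy illustration of this point (the particular $2\times 2$ precision matrix $\Omega$ is made up):

```python
# For the quadratic form (1/2) mu^T Omega mu, the matrix of second derivatives
# is exactly Omega, and all third derivatives are identically zero.
import sympy as sp

mu1, mu2 = sp.symbols('mu1 mu2')
mu = sp.Matrix([mu1, mu2])
Omega = sp.Matrix([[2, 1], [1, 3]])           # made-up positive-definite precision

quad = sp.Rational(1, 2) * (mu.T * Omega * mu)[0, 0]

print(sp.hessian(quad, (mu1, mu2)) == Omega)  # True: second derivatives recover Omega
print(sp.diff(quad, mu1, mu1, mu1))           # 0: higher-order derivatives vanish
```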
There are also situations in which clever choices of the argument and of the point about which you expand your Taylor series make higher-order terms cancel out. You see this a lot with people expanding their function $f(x)$ about a local maximum or minimum $x^*$, and having the second term $(x - x^*) f'(x^*)$ drop out. It's very possible that higher-order derivatives are nonzero in some regions of the domain (unlike in the previous Fisher information example), but if you choose $x^*$ conveniently you don't have to worry about them. Another way this comes up is in limits; if you're interested in studying $f(\epsilon)$ for small $\epsilon$ with an expansion around $0$, then you end up with:
$f(\epsilon) = f(0) + \epsilon f'(0) + \sum_{n = 2}^\infty \frac{\epsilon^n f^{(n)}(0)}{n!}$
And you can see that if you're examining the limiting behavior, with $\epsilon$ very small, then $\epsilon^n$ rapidly becomes negligible, and those higher-order terms can mostly be dismissed.
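A quick numerical sketch of both points (the function $f(x)=\cos x + x^3$, which has a critical point at $x^*=0$, is an arbitrary example):

```python
# Expanding about a critical point removes the linear term, and for small eps
# the neglected terms shrink like eps**3.
import numpy as np

def f(x):
    return np.cos(x) + x**3      # f'(0) = 0, so x* = 0 is a critical point

f0, f2 = f(0.0), -1.0            # f(0) and f''(0) = -cos(0)

for eps in [1e-1, 1e-2, 1e-3]:
    quad = f0 + 0.5 * eps**2 * f2            # no linear term, since f'(0) = 0
    err = f(eps) - quad
    print(f"eps={eps:.0e}  error={err:.3e}  error/eps**3={err / eps**3:.3f}")
```

The ratio in the last column settles near 1, i.e. the neglected terms behave like $\epsilon^3$ here.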
As other people have also said, people definitely do consider the accuracy of the approximation as well, whether deterministically or probabilistically. You can see that on the Wikipedia page for the Delta Method: https://en.wikipedia.org/wiki/Delta_method. It briefly goes into the order of the approximation, and most advanced textbooks do this in general when using Taylor series.
In general, I think a lot of it also has to do with what people are interested in within the problem. There are obviously a lot of connections between Taylor series and moments, because a Taylor expansion turns what can be a very complicated function into a polynomial, for which moments are easier to calculate. You can see this used, for example, on these two pages:
https://en.wikipedia.org/wiki/First-order_second-moment_method
https://en.wikipedia.org/wiki/Taylor_expansions_for_the_moments_of_functions_of_random_variables
So, going back to the fact that most investigations of interest focus on second-order interactions, or on asymptotic normality (where you only need the first two moments), it makes sense that people stop at the quadratic term.
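As a rough sketch of the moment approximations described on those pages, here is a small Monte Carlo comparison; $g=\log$, $\mu=5$, and $\sigma=0.5$ are arbitrary example choices (with $\sigma$ small relative to $\mu$ so the expansion is reasonable):

```python
# Second-order approximation of E[g(X)] and first-order approximation of
# Var[g(X)], compared against simulation for X ~ N(mu, sigma^2), g = log.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 0.5
g = np.log

#   E[g(X)]   ~ g(mu) + 0.5 * g''(mu) * Var(X)
#   Var[g(X)] ~ g'(mu)**2 * Var(X)
mean_approx = g(mu) + 0.5 * (-1.0 / mu**2) * sigma**2
var_approx = (1.0 / mu)**2 * sigma**2

x = rng.normal(mu, sigma, size=1_000_000)
print(f"E[g(X)]:   approx={mean_approx:.5f}  simulated={g(x).mean():.5f}")
print(f"Var[g(X)]: approx={var_approx:.5f}  simulated={g(x).var():.5f}")
```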
Best Answer
The Taylor polynomial is, by definition, the only polynomial of degree $n$ that coincides with the function and its derivatives up to the $n$-th at $x=a$: $$P_n(a)=f(a)$$ $$P_n'(a)=f'(a)$$ $$P_n''(a)=f''(a)$$ $$\cdots$$ $$P_n^{(n)}(a)=f^{(n)}(a)$$ Write $$P_n(x)=c_0+c_1(x-a)+\cdots+c_n(x-a)^n,$$ and impose the $n+1$ conditions above to find the $c_k$.
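A small sympy sketch of this construction ($f(x)=e^x$ and $n=3$ are just example choices):

```python
# Impose the n+1 conditions P_n^{(k)}(a) = f^{(k)}(a) and solve for the c_k.
import sympy as sp

x, a = sp.symbols('x a')
f = sp.exp(x)                        # example function
n = 3

c = sp.symbols(f'c0:{n + 1}')        # coefficients c_0, ..., c_n
P = sum(c[k] * (x - a)**k for k in range(n + 1))

conditions = [sp.Eq(sp.diff(P, x, k).subs(x, a), sp.diff(f, x, k).subs(x, a))
              for k in range(n + 1)]

print(sp.solve(conditions, c))       # recovers c_k = f^{(k)}(a) / k!, here exp(a)/k!
```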