There's not been any significant response to my questions, but I think I've figured them out. For anyone who's interested:
Let $J$ map ordinary continuous functions $f$, definable pointwise, to real numbers $J[f]$, which is my understanding of a functional $J$. Then $J$'s functional derivative is some generalized function (a.k.a. distribution), usually denoted $\frac {δJ[f]}{δf}$ in the physics textbooks. Let $\langle\Phi,\eta\rangle$ denote the action of a generalized function $\Phi$ on a test function $\eta$. By definition, for each test function $\eta$ one must have
$$\langle \frac {δJ[f]}{δf},\eta\rangle = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon}.$$
That $J$'s derivative in general must be a generalized function rather than a pointwise-defined function is easily seen from the following simple example: For continuous functions $f$, let $J[f] = f(a)$, where $a$ is some fixed number. Then for each test function $\eta$,
$$\langle \frac {δJ[f]}{δf},\eta\rangle = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon} = \lim_{\epsilon \to 0} \frac {[f(a) + \epsilon \eta(a)] - f(a)}{\epsilon} = \eta (a) = \langle{\delta}_a,\eta\rangle,$$
so $\frac {δJ[f]}{δf} = {\delta}_a$ = "the Dirac delta function at $a$". As is well known, Dirac delta functions are generalized functions incapable of representation as pointwise-defined functions. So functional derivatives cannot be limited to ordinary functions and must instead be allowed to range over the generalized functions.
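This example is easy to check numerically. The sketch below (illustrative names, not from any particular library) evaluates the difference quotient for the evaluation functional $J[f] = f(a)$ against a smooth test function $\eta$ and confirms that its action is $\eta(a)$, i.e. the action of $\delta_a$:

```python
import numpy as np

a = 0.7

def J(f):
    """The evaluation functional J[f] = f(a)."""
    return f(a)

def derivative_action(J, f, eta, eps=1e-6):
    """Difference quotient <dJ/df, eta> ~ (J[f + eps*eta] - J[f]) / eps."""
    return (J(lambda x: f(x) + eps * eta(x)) - J(f)) / eps

f = np.cos                               # any continuous base point f
eta = lambda x: np.exp(-x**2)            # a smooth, rapidly decaying test function

lhs = derivative_action(J, f, eta)
print(lhs, eta(a))                       # the two values agree: <dJ/df, eta> = eta(a)
```

Because $J$ is linear here, the quotient is independent of $\epsilon$ and the agreement is exact up to round-off.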
If a functional derivative $\frac {δJ[f]}{δf}$ is representable as an ordinary function, then its value at $x$ is denoted $\frac {δJ[f]}{δf(x)}$. The action of the functional derivative on a test function may then be written as an integral
$$ \langle \frac {δJ[f]}{δf},\eta\rangle = \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} \eta (\xi) d\xi = \lim_{\epsilon \to 0} \frac {J[f + \epsilon \eta] - J[f]}{\epsilon}. $$
As the test functions $\eta$ may be chosen arbitrarily, this constitutes an implicit definition of $\frac {δJ[f]}{δf(x)}$.
Many physics texts substitute the Dirac delta function ${\delta}_x$ in for the test function $\eta$ and claim to thereby obtain an explicit equation
$$(*)\qquad \frac {δJ[f]}{δf(x)} = \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} {\delta}_x (\xi) d\xi = \lim_{\epsilon \to 0} \frac {J[f + {\epsilon \delta}_x] - J[f]}{\epsilon} $$
for $\frac {δJ[f]}{δf(x)}$. Such a procedure is not legitimate, for test functions must be $C^{\infty}$ ordinary functions, which ${\delta}_x$ is not. Moreover, it makes little sense to vary a continuous function $f$ at only a single point $x$. One can, however, vary $f$ in an arbitrarily small neighborhood of $x$, and doing so is the key to interpreting $(*)$. Instead of the substitution $\eta = {\delta}_x$, substitute a sequence of functions $\eta_{x,k}$ which converges (in the sense of generalized functions) to ${\delta}_x$, and then take the limit $k \to \infty$. (If one wants to be definite, take a sequence of Gaussians centered on $x$ with standard deviations shrinking to $0$ as $k \to \infty$.) One obtains
$$(**)\qquad \frac {δJ[f]}{δf(x)} = \lim_{k \to \infty} \int_{-\infty}^{+\infty} \frac {δJ[f]}{δf(\xi)} {\eta}_{x,k}(\xi)d\xi = \lim_{k \to \infty} \lim_{\epsilon \to 0} \frac {J[f + \epsilon {\eta}_{x,k}] - J[f]}{\epsilon}. $$

This expression suggests three rules which make $(*)$ computationally useful for deducing the functional derivative at $x$ of specific functionals. The first rule reflects the fact that in $(**)$, before either limiting process is carried out, the function ${\eta}_{x,k}$ is a perfectly well-behaved ordinary test function. In $(**)$, the process $\lim_{k \to \infty}$, which effectively transforms ${\eta}_{x,k}$ into ${\delta}_x$, is carried out after the $\lim_{\epsilon \to 0}$ process; this is the source of the second rule. Eq. $(**)$'s last operation is $\lim_{k \to \infty}$; since this effectively transforms ${\eta}_{x,k}$ into ${\delta}_x$, the third rule makes employing ${\delta}_x$'s special properties the last thing one does.
Rule 1. When computing the right hand side of $(*)$, first compute the difference quotient as if ${\delta}_x$ were a test function. Do not yet invoke properties peculiar to delta functions, such as the sifting property.
Rule 2. After having taken the calculation of the difference quotient as far as permitted under Rule 1, take the limit $\epsilon \to 0$. As one does so, continue to treat ${\delta}_x$ as if it were a test function. In particular, terms such as $\epsilon {\delta}_x$, $\epsilon {\delta}_x^2$, etc. vanish as $\epsilon \to 0$.
Rule 3. Lastly, after having taken calculations as far as permitted by Rules 1 and 2 when computing the right hand side of the physicists' equation, use the properties peculiar to ${\delta}_x$ as a generalized function, such as the sifting property or the integration by parts property $\langle{\Phi}',\eta\rangle = -\langle{\Phi},{\eta}'\rangle$.
These rules for making $(*)$ useful were inspired by some remarks by Walter Greiner in his Field Quantization (Springer-Verlag 1996), pp. 36-39. Those remarks were vague and without any obvious or explicit justification. I hit upon the above explanation in trying to make sense of them. So far $(*)$, when guided by the three rules, has worked quite well as a tool for computing the derivatives of specific functionals.
When computing the derivative of a nonlinear functional, use of $(*)$ without guidance from the three rules almost always turns into a disaster, as crazy things like $\epsilon \delta(0)$ and ${\delta}_x^2$ occur in one's calculations.
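The two-limit structure of $(**)$ can also be seen numerically. The sketch below (a Python illustration with made-up names, not part of the argument above) takes the nonlinear functional $J[f] = \int f(\xi)^2 d\xi$, whose functional derivative is $\frac{δJ[f]}{δf(x)} = 2f(x)$. For this $J$ the $\epsilon \to 0$ limit of the difference quotient with a smooth $\eta_{x,k}$ is exactly $\int 2 f(\xi)\,\eta_{x,k}(\xi)\,d\xi$ (the $\epsilon^2 \int \eta^2$ term drops out, per Rules 1 and 2); only afterwards do the Gaussians sharpen toward $\delta_x$:

```python
import numpy as np

xi = np.linspace(-10.0, 10.0, 20001)   # quadrature grid
f = np.cos(xi)                         # base point f(xi) = cos(xi)
x = 0.5                                # point at which dJ/df(x) is wanted

def quotient_eps_limit(eta):
    """lim_{eps->0} (J[f + eps*eta] - J[f])/eps for J[f] = int f^2,
    which works out exactly to int 2 f(xi) eta(xi) dxi."""
    return np.trapz(2.0 * f * eta, xi)

values = []
for sigma in [1.0, 0.1, 0.01]:         # Gaussians narrowing toward delta_x
    eta_xk = np.exp(-(xi - x)**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    values.append(quotient_eps_limit(eta_xk))

print(values[-1], 2.0 * np.cos(x))     # the sequence converges to 2 cos(0.5)
```

Taking the limits in the opposite order, i.e. plugging a sharp spike into the quadratic difference quotient before letting $\epsilon \to 0$, is exactly where terms like $\epsilon \delta(0)$ would appear.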
You are simply reading it backwards. When they say "impulse response of an inhomogeneous linear differential equation defined on a domain, with specified initial conditions or boundary conditions" they mean the function $G(x,s)$ such that $L(G(x,s)) = \delta(x-s)$.
I know it is a little counterintuitive at first, since one thinks of the differential operator $L$ as something that you feed a function and that gives back a function. But in this case the operator we are dealing with is in fact the solution operator of the differential equation. That is, $S:X\rightarrow Y$, where $X$, $Y$ are function spaces such that $S(f) = u$ when
\begin{equation}
\begin{cases}
Lu=f &\text{ in } \Omega\\
u = 0 &\text{ on } \partial\Omega
\end{cases}
\text{ (It can also be a different boundary condition depending on the problem)}
\end{equation}
To see why this is useful, think of the Green functions and the $\delta$ in the following way: the $\delta$'s are "kind of a basis of the function space," since you can "write" any function as
\begin{gather}
f(x)"=" \sum_s f(s)\delta(x-s)\\ \text{ (It really is an integral, not a sum; in fact it is a convolution integral)}
\end{gather}
And since the solution operator is linear, to solve the problem for a general $f$ it is enough to solve it for the deltas and then sum using superposition. The Green functions are just the solutions for the deltas, that is
\begin{equation}
G(x,s) = S(\delta(x-s))\\
\end{equation}
so
\begin{equation}
u(x) = S(f)(x) "=" \sum_s f(s)S(\delta(x-s)) = \sum_s f(s)G(x,s)
\end{equation}
Notice that $s$ is a parameter, not the variable of the delta function, so $f(s)$ is a constant for the solution operator. Notice also that this is not a really formal answer, but I hope it is useful to start understanding what the Green functions are about.
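The superposition picture can be made concrete in a discrete sketch (illustrative, with $L = -d^2/dx^2$ on $(0,1)$ and zero boundary values replaced by a finite-difference matrix). A discrete delta at node $s$ is the vector $e_s/h$; its solution is the discrete Green function $G(\cdot,s)$, and $u = \sum_s h\, f(s)\, G(\cdot,s)$. The factors of $h$ cancel, so the superposition sum is just $L^{-1}$ applied to $f$:

```python
import numpy as np

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# tridiagonal finite-difference matrix for -u'' with u = 0 at both ends
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

Linv = np.linalg.inv(L)            # column s of Linv/h = response to a delta at node s
f = np.sin(np.pi * x)

u_superposition = Linv @ f         # summing the delta responses weighted by f(s)
u_direct = np.linalg.solve(L, f)   # solving L u = f in one shot

# exact solution of -u'' = sin(pi x), u(0) = u(1) = 0, is sin(pi x)/pi^2
err = np.max(np.abs(u_direct - np.sin(np.pi * x) / np.pi**2))
print(np.max(np.abs(u_superposition - u_direct)), err)
```

The two ways of computing $u$ agree up to round-off, and both match the exact solution up to the discretization error.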
Remark
Given a differential operator $L$, we seek the solution $u$ to the inhomogeneous ODE
$$L[u(t)] = f(t)$$
The Green function of $L$ is $G(x,t)$, satisfying
$$L[G(x,t)] = \delta(x-t)$$
such that our solution $u(t)$ can be expressed as an integral with kernel $G$:
$$u(t) = \int_C{G(x,t)f(x)dx}$$
over a suitable domain $C$.
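As a hedged numerical check of this kernel formula, take the concrete choice $L = -d^2/dt^2$ on $[0,1]$ with $u(0) = u(1) = 0$, whose classical Green function is $G(x,t) = x(1-t)$ for $x \le t$ and $t(1-x)$ for $x \ge t$ (the example is mine, not from the referenced books):

```python
import numpy as np

def G(x, t):
    """Green function of -d^2/dt^2 on [0,1] with zero boundary values."""
    return np.where(x <= t, x * (1.0 - t), t * (1.0 - x))

def u_via_green(f, t, n=4001):
    """Evaluate u(t) = int_0^1 G(x,t) f(x) dx by the trapezoid rule."""
    x = np.linspace(0.0, 1.0, n)
    return np.trapz(G(x, t) * f(x), x)

# for f = 1, the exact solution of -u'' = 1 with these boundary values
# is u(t) = t(1-t)/2
t = 0.3
u = u_via_green(lambda x: np.ones_like(x), t)
print(u, t * (1.0 - t) / 2.0)   # both are 0.105
```

The quadrature reproduces the exact solution, as the formula $u(t) = \int_C G(x,t) f(x)\, dx$ promises.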
A fantastic book on such methods of solving ODEs is Brian Davies, Integral Transforms and Their Applications. This is a specialised book and only discusses this topic, but I'm sure you have the prerequisites to follow it. Riley Hobbs, Mathematical Methods for Physicists, is also one I always go back to. It covers a lot, so it is expensive and heavy!