Derivation of the adjoint method for solving neural ODEs

derivatives, integration, ordinary differential equations, partial derivative

Let $z\in\mathbb{R}^N$ be the state vector of the ODE $\frac{dz}{dt}=f(z,\theta,t)$, where $\theta$ denotes the parameters of the neural network and $L:\mathbb{R}^N\rightarrow\mathbb{R}$ is the loss function. More details on the problem setup can be found here: https://random-walks.org/content/misc/node/node.html.
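For concreteness, here is a minimal sketch of this setup in Python (the dynamics $f$ and the loss $L$ below are made-up placeholders, not taken from the linked post):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal sketch of the neural-ODE setup, with made-up choices:
# f(z, theta, t) = tanh(theta @ z) standing in for the network dynamics,
# and a quadratic loss on the terminal state z(T).
N = 3
rng = np.random.default_rng(0)
theta = rng.normal(size=(N, N))   # network parameters (placeholder)
z0 = rng.normal(size=N)           # initial state z(t0)

def f(t, z):
    return np.tanh(theta @ z)     # dz/dt = f(z, theta, t)

def L(zT):
    return 0.5 * float(zT @ zT)   # loss evaluated at the terminal state

sol = solve_ivp(f, (0.0, 1.0), z0, rtol=1e-8, atol=1e-10)
print("L(z(T)) =", L(sol.y[:, -1]))
```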

In the adjoint method, the adjoint is defined as $a(t)=\frac{\partial L}{\partial z(t)}$. We then have a very important result, which is the foundation of the adjoint method: $\frac{da}{dt} = -\frac{\partial f}{\partial z}a$. I do not know how this is derived.
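As a sanity check (my own, not from the post), the claimed equation does hold in the scalar linear case $f(z,t)=\lambda z$: the solution satisfies $z(T)=z(t)e^{\lambda(T-t)}$, so \begin{align*} a(t)=\frac{\partial L}{\partial z(t)}=L'(z(T))\,\frac{\partial z(T)}{\partial z(t)}=L'(z(T))\,e^{\lambda(T-t)}, \end{align*} and differentiating in $t$ (with $z(T)$ held fixed) gives $\frac{da}{dt}=-\lambda a=-\frac{\partial f}{\partial z}a$. But I am looking for the general derivation.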

The same post (https://random-walks.org/content/misc/node/node.html) provides a proof of why $\frac{da}{dt} = -\frac{\partial f}{\partial z}a$. One key step is to show that $\frac{\partial}{\partial z} \int_t^{t+\epsilon} f(z,s)\,ds = \epsilon \frac{\partial f}{\partial z} + O(\epsilon^2)$, but no detailed explanation is given. I need help deriving this identity.

Best Answer

You can derive it by making a $0$th order Taylor approximation of $f$ around $s = t$, namely \begin{align*} f(z,s) = f(z,t)+O(s-t), \end{align*} since then \begin{align*} \int_t^{t+\epsilon}f(z,s)\,ds = \int_t^{t+\epsilon}\left(f(z,t)+O(s-t)\right)ds = \epsilon f(z,t)+O(\epsilon^2). \end{align*} Differentiating both sides with respect to $z$ (the $O(\epsilon^2)$ remainder depends smoothly on $z$, so its $z$-derivative is still $O(\epsilon^2)$) then gives \begin{align*} \frac{\partial}{\partial z}\int_t^{t+\epsilon}f(z,s)\,ds = \epsilon\frac{\partial f}{\partial z}+O(\epsilon^2). \end{align*}
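If it helps to see the estimate concretely, here is a small numerical check (a sketch with a made-up $f$; the functions `f`, `df_dz` and the values of `z`, `t` are illustrative only). It uses differentiation under the integral sign, $\frac{\partial}{\partial z}\int f\,ds = \int \frac{\partial f}{\partial z}\,ds$, and compares that exact derivative against $\epsilon\frac{\partial f}{\partial z}(z,t)$; the gap should shrink like $\epsilon^2$:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical smooth integrand, chosen only for illustration.
def f(z, s):
    return np.sin(z * s) + z ** 2

def df_dz(z, s):
    return s * np.cos(z * s) + 2 * z

z, t = 0.7, 1.0
for eps in (1e-1, 1e-2, 1e-3):
    # Differentiating under the integral sign:
    # (d/dz) int_t^{t+eps} f(z,s) ds = int_t^{t+eps} df/dz(z,s) ds.
    lhs = quad(lambda s: df_dz(z, s), t, t + eps)[0]
    rhs = eps * df_dz(z, t)
    print(f"eps={eps:.0e}  |lhs - rhs| = {abs(lhs - rhs):.3e}")
# The gap shrinks by ~100x each time eps shrinks by 10x, i.e. O(eps^2).
```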
