Method of Adjoints, Neural ODEs

Tags: neural networks, numerical methods, ordinary differential equations

I've been trying to understand the gist of the Chen et al. paper on neural ODEs (https://arxiv.org/pdf/1806.07366.pdf).
The main trick seems to be the ability to take derivatives of functions of an ODE solver's output with respect to the neural network parameters.

This is done via the adjoint sensitivity method, in which we solve a second differential equation in order to obtain the gradients of the loss function. To understand the technique, I wanted to work through a simple example: the scalar ODE

$$
\frac{d z(t)}{dt} = f(z(t), t, \alpha) = \alpha z(t)
$$

with start time $t_0$, stop time $t_1$, and initial condition $z(t_0)$. Based on the parameter and the initial condition, the solved ODE gives $z(t) = e^{\alpha(t-t_0)}z(t_0)$.
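
As a sanity check, here is a minimal numerical sketch of that closed form (the values of $\alpha$, $t_0$, $t_1$, $z(t_0)$ are arbitrary illustrative choices, and `scipy` is assumed available):

```python
# Sanity check: numerically solve dz/dt = alpha * z and compare with
# the closed form z(t1) = exp(alpha * (t1 - t0)) * z(t0).
# The constants below are arbitrary illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp

alpha, t0, t1, z0 = 0.7, 0.0, 1.0, 2.0

sol = solve_ivp(lambda t, z: alpha * z, (t0, t1), [z0], rtol=1e-10, atol=1e-12)
z_numeric = sol.y[0, -1]
z_closed = np.exp(alpha * (t1 - t0)) * z0
print(z_numeric, z_closed)  # agree to solver tolerance
```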

Suppose I want to minimize the loss function $L = (z(t_1) - 1)^2/2$. That is, I only care about the value of the solved ODE at time $t_1$, and I want that value to be 1.

I can work everything out analytically here, so I wanted to also apply the adjoint method and confirm that the two approaches match.

According to the adjoint method described in the paper, we need to solve for the adjoint $a(t) = \partial L/\partial z(t)$. We do this by solving the differential equation that $a$ satisfies:
$$
\frac{da}{dt} = -a\,\frac{\partial f}{\partial z}
$$

Solving this, I obtain
$$
a(t) = e^{\alpha (t-t_1)} (z(t_1)-1),
$$
which matches the boundary condition $a(t_1) = z(t_1)-1$.

Now my goal was to find $d L / d \alpha$, which is given by equation (51) in the paper:
$$
\frac{d L}{d \alpha} = -\int_{t_1}^{t_0} a(t) \frac{\partial f}{\partial \alpha}\, dt = -\int_{t_1}^{t_0} e^{\alpha (t-t_1)} (z(t_1)-1)\, z(t)\, dt = -\int_{t_1}^{t_0} e^{\alpha (t-t_1)} (z(t_1)-1)\, e^{\alpha(t-t_0)} z(t_0)\, dt
$$

$$
= (z(t_1)-1)z(t_0) \sinh((t_1-t_0)\alpha)/\alpha
$$

However, we can determine $dL/d\alpha$ analytically here:
$$
\frac{dL}{d \alpha} = \frac{dL}{d z(t_1)}\frac{d z(t_1)}{d \alpha} = \big[(e^{\alpha(t_1-t_0)}z(t_0) -1)\big] \big[ (t_1-t_0)e^{\alpha(t_1-t_0)}z(t_0) \big]
$$
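
As a cross-check (using the same arbitrary illustrative constants as in the sketch above; nothing here comes from the paper), this analytic gradient agrees with a central finite difference of the loss:

```python
# Compare the analytic dL/dalpha with a central finite difference of L(alpha).
# Constants are the same arbitrary illustrative values as before.
import numpy as np

alpha, t0, t1, z0 = 0.7, 0.0, 1.0, 2.0

def loss(a):
    z1 = np.exp(a * (t1 - t0)) * z0        # solved ODE at t1
    return 0.5 * (z1 - 1.0) ** 2

z1 = np.exp(alpha * (t1 - t0)) * z0
grad_analytic = (z1 - 1.0) * (t1 - t0) * z1  # dL/dz(t1) * dz(t1)/dalpha

eps = 1e-6
grad_fd = (loss(alpha + eps) - loss(alpha - eps)) / (2 * eps)
print(grad_analytic, grad_fd)  # agree to O(eps**2)
```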

If I plug the form of $z(t_1)$ into the adjoint result, the two expressions should match, but I get the following:

Adjoint result: $(e^{\alpha(t_1-t_0)}z(t_0) -1)z(t_0) \sinh((t_1-t_0)\alpha)/\alpha$

Analytic result: $(e^{\alpha(t_1-t_0)}z(t_0) -1)z(t_0)(t_1-t_0)e^{\alpha(t_1-t_0)}$

But these do not match!
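
Plugging the same illustrative constants into both expressions makes the mismatch concrete (a sketch, not from the paper):

```python
# Evaluate both expressions at the same arbitrary illustrative constants.
import numpy as np

alpha, t0, t1, z0 = 0.7, 0.0, 1.0, 2.0
z1 = np.exp(alpha * (t1 - t0)) * z0

adjoint_result = (z1 - 1.0) * z0 * np.sinh((t1 - t0) * alpha) / alpha
analytic_result = (z1 - 1.0) * z0 * (t1 - t0) * np.exp(alpha * (t1 - t0))
print(adjoint_result, analytic_result)  # prints two different numbers
```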

If somebody could explain why this is, I would really appreciate it.

Neural ODEs seem interesting, but if I can't understand an incredibly simple toy model, I don't see how I can use them.

Thanks for your time

Best Answer

I think you have made a mistake, without which everything seems to work fine:

$$\frac{da}{dt}=-a\frac{\partial f}{\partial z}=-\alpha a\Rightarrow a(t)=a(t_1)e^{-\alpha(t-t_1)}$$

The minus sign is there because of the time reversal of the equation (that is the essence of reverse-mode backprop), so under the integral sign the two exponentials must have opposite signs; that is how I noticed the mistake. With the corrected adjoint $a(t) = (z(t_1)-1)e^{\alpha(t_1-t)}$, the integrand becomes the constant $(z(t_1)-1)z(t_0)e^{\alpha(t_1-t_0)}$, and the integral gives $(z(t_1)-1)z(t_0)(t_1-t_0)e^{\alpha(t_1-t_0)}$, which is exactly your analytic result.
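
To see this numerically, here is a minimal sketch (the constants and variable names are illustrative, and `scipy` is assumed available) that solves the ODE forward, then integrates the corrected augmented state $[z, a, g]$ backward from $t_1$ to $t_0$, where $g$ accumulates $-a\,\partial f/\partial\alpha$ so that $g(t_0) = dL/d\alpha$:

```python
# Adjoint method with the corrected sign, integrated backward in time.
# State s = [z, a, g]: dz/dt = alpha*z, da/dt = -a * df/dz = -alpha*a,
# dg/dt = -a * df/dalpha = -a*z, with g(t1) = 0 and dL/dalpha = g(t0).
# The constants are arbitrary illustrative choices.
import numpy as np
from scipy.integrate import solve_ivp

alpha, t0, t1, z0 = 0.7, 0.0, 1.0, 2.0

# Forward pass: get z(t1).
z1 = solve_ivp(lambda t, z: alpha * z, (t0, t1), [z0],
               rtol=1e-10, atol=1e-12).y[0, -1]

# Backward pass from t1 down to t0.
def aug(t, s):
    z, a, g = s
    return [alpha * z, -alpha * a, -a * z]

a1 = z1 - 1.0  # boundary condition a(t1) = dL/dz(t1)
end = solve_ivp(aug, (t1, t0), [z1, a1, 0.0],
                rtol=1e-10, atol=1e-12).y[:, -1]
grad_adjoint = end[2]

grad_analytic = (z1 - 1.0) * (t1 - t0) * z1  # analytic result from the question
print(grad_adjoint, grad_analytic)           # these now agree
```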
