On the usage of derivative in operator theory

derivatives, mathematical physics, operator-theory, quantum mechanics

In quantum mechanics, we work with linear operators on Hilbert spaces $\mathscr{H}$.
Suppose I have two bounded operators defined on the same space, $A, B: \mathscr{H}\to\mathscr{H}$.

It seems to me there is an ambiguity in how the derivative should be handled.

On the one hand, the operator $AB$ is usually interpreted as the composition $(A\circ B)f:=A(B(f))$ for every $f\in\mathscr{H}$.
If so, the derivative operator $D$ applied to $AB$ should follow the chain rule
$$
D[AB](f) = D[A\circ B](f) = D[A](B(f))\,D[B](f)
$$

On the other hand, the following two examples treat $AB$ as if it were literally a product of operators instead of a composition. In other words, the preferred way to compute the derivative is the Leibniz rule, with the order of the factors preserved:
$$
D[AB]= D[A]\,B+A\,D[B]
$$

$1^{\rm{st}}$ example, by E. Pisanty:

The exponential of an operator $\hat A(t)$ does not obey the differential equation
$$ \frac{d}{ dt}e^{\hat A(t)} \stackrel{?}{=} \frac{d \hat{A}}{ dt} e^{\hat A(t)} $$
that one might naively hope to satisfy.
To see why this does not work, consider the series expansion of the exponential
\begin{align*}
\frac{d}{dt}e^{\hat A(t)}
& = \frac{d}{dt}\sum_{n=0}^\infty \frac{1}{n!} \hat A^n(t) = \sum_{n=0}^\infty \frac{1}{n!} \frac{d}{dt} \hat A^n(t),
\end{align*}

When we apply the product rule, we get the individual derivatives of each of the operators in the product, at their place within the product
\begin{equation*}
\frac{d}{dt} \hat A^n(t)
=
\frac{d\hat A}{dt} \hat A^{n-1}(t)
+\hat A(t)\frac{d\hat A}{dt} \hat A^{n-2}(t)
+ \ \dots \
+\hat A^{n-2}(t)\frac{d\hat A}{dt} \hat A(t)
+\hat A^{n-1}(t)\frac{d\hat A}{dt}
\end{equation*}

This simplifies to just $n\frac{ d\hat A}{dt} \hat A^{n-1}(t)$, in which case $\frac{d}{dt}e^{\hat A(t)} = \frac{d \hat A}{dt} \sum_{n=1}^\infty \frac{\hat A^{n-1}(t)}{(n-1)!} = \frac{d\hat A}{dt}e^{\hat{A}(t)}$, but only under the condition that $\hat A(t)$ commutes with its derivative:
$$ \left[\frac{ d\hat A}{ dt} , \hat A(t)\right] \stackrel{?}{=} 0 $$

In this case, $A^n$ is seen as a product of $A$ with itself $n$ times, instead of $A \circ A \circ \dots \circ A$ $n$ times.
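
The quoted argument can be checked numerically. The following is a minimal sketch, not part of the quoted answer: the curve $\hat A(t)=X+tZ$ (Pauli matrices), the Taylor-series helper `expm_taylor`, and the step size `h` are all illustrative assumptions. It compares a finite-difference derivative of $e^{\hat A(t)}$ with the naive formula and with the ordered product-rule series.

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small matrices of modest norm)."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ M / n          # term now equals M^n / n!
        result = result + term
    return result

def d_exp_numeric(curve, t, h=1e-6):
    """Central finite-difference approximation of d/dt exp(curve(t))."""
    return (expm_taylor(curve(t + h)) - expm_taylor(curve(t - h))) / (2 * h)

def d_exp_series(A_t, dA_t, terms=30):
    """Sum over n of (1/n!) * sum over k of A^k A' A^(n-1-k): the product rule with ordering kept."""
    dim = A_t.shape[0]
    powers = [np.eye(dim, dtype=complex)]
    for _ in range(terms):
        powers.append(powers[-1] @ A_t)
    total = np.zeros((dim, dim), dtype=complex)
    factorial = 1.0
    for n in range(1, terms + 1):
        factorial *= n
        ordered = sum(powers[k] @ dA_t @ powers[n - 1 - k] for k in range(n))
        total += ordered / factorial
    return total

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical non-commuting example: A(t) = X + t Z, so A'(t) = Z and [A', A] != 0.
A = lambda t: X + t * Z
dA = lambda t: Z

# Hypothetical commuting example: C(t) = t^2 Z, so C'(t) = 2 t Z and [C', C] = 0.
C = lambda t: t**2 * Z
dC = lambda t: 2 * t * Z

t = 0.5
numeric = d_exp_numeric(A, t)
gap_naive = np.linalg.norm(numeric - dA(t) @ expm_taylor(A(t)))
gap_ordered = np.linalg.norm(numeric - d_exp_series(A(t), dA(t)))
gap_commuting = np.linalg.norm(d_exp_numeric(C, t) - dC(t) @ expm_taylor(C(t)))

print(f"naive formula, non-commuting case: {gap_naive:.3e}")
print(f"ordered series, non-commuting case: {gap_ordered:.3e}")
print(f"naive formula, commuting case:      {gap_commuting:.3e}")
```

In this sketch the naive formula only agrees with the finite-difference derivative in the commuting case, while the ordered series agrees in the non-commuting case as well.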

$2^{\rm{nd}}$ example, by Wikipedia:

The expectation value of an observable $A$, which is a Hermitian linear operator, for a given Schrödinger state $\vert\psi(t)\rangle$, is given by
${\displaystyle \langle A\rangle _{t}=\langle \psi (t)|A|\psi (t)\rangle .}$
In the Schrödinger picture, the state $\vert\psi(t)\rangle$ at time $t$ is related to the state $\vert\psi(0)\rangle$ at time $0$ by a unitary time-evolution operator $U(t)$: ${\displaystyle |\psi (t)\rangle =U(t)|\psi (0)\rangle .}$
In the Heisenberg picture, all state vectors are considered to remain constant at their initial values $\vert \psi(0)\rangle$, whereas operators evolve with time according to
${\displaystyle A(t):=U^{\dagger }(t)AU(t)\,.}$
The Schrödinger equation for the time-evolution operator is
$${\displaystyle {\frac {d}{dt}}U(t)=-{\frac {iH}{\hbar }}U(t)}$$
where $H$ is the Hamiltonian and $\hbar$ is the reduced Planck constant.
It now follows that
$${\displaystyle {\begin{aligned}{\frac {d}{dt}}A(t)&={\frac {i}{\hbar }}U^{\dagger }(t)HAU(t)+U^{\dagger }(t)\left({\frac {\partial A}{\partial t}}\right)U(t)+{\frac {i}{\hbar }}U^{\dagger }(t)A(-H)U(t)\\&={\frac {i}{\hbar }}U^{\dagger }(t)HU(t)U^{\dagger }(t)AU(t)+U^{\dagger }(t)\left({\frac {\partial A}{\partial t}}\right)U(t)-{\frac {i}{\hbar }}U^{\dagger }(t)AU(t)U^{\dagger }(t)HU(t)\\&={\frac {i}{\hbar }}\left(H(t)A(t)-A(t)H(t)\right)+U^{\dagger }(t)\left({\frac {\partial A}{\partial t}}\right)U(t),\end{aligned}}}$$
where differentiation was carried out according to the product rule.
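
As a sanity check of this product-rule computation, here is a small numerical sketch. It is an illustration under assumptions, not part of the quoted article: the $2\times 2$ Hamiltonian `H`, the observable `A0`, the choice $\hbar=1$, and the time-independence of the Schrödinger-picture observable (so the $\partial A/\partial t$ term vanishes) are all chosen for the example.

```python
import numpy as np

hbar = 1.0  # assumption: units with hbar = 1

# Hypothetical Hermitian Hamiltonian and time-independent observable
H = np.array([[1.0, 0.5], [0.5, -1.0]], dtype=complex)
A0 = np.array([[0, 1], [1, 0]], dtype=complex)

# Diagonalize H once so that U(t) = exp(-i H t / hbar) can be built exactly
energies, V = np.linalg.eigh(H)

def U(t):
    """Unitary time-evolution operator exp(-i H t / hbar)."""
    phases = np.exp(-1j * energies * t / hbar)
    return V @ np.diag(phases) @ V.conj().T

def A_heis(t):
    """Heisenberg-picture operator A(t) = U(t)^dagger A0 U(t)."""
    return U(t).conj().T @ A0 @ U(t)

t, h = 0.7, 1e-6

# Left-hand side: central finite-difference derivative of t -> A(t)
lhs = (A_heis(t + h) - A_heis(t - h)) / (2 * h)

# Right-hand side: (i / hbar) [H, A(t)]; here H(t) = U^dagger H U = H
rhs = (1j / hbar) * (H @ A_heis(t) - A_heis(t) @ H)

residual = np.linalg.norm(lhs - rhs)
print(f"|dA/dt - (i/hbar)[H, A(t)]| = {residual:.3e}")
```

The derivative taken here is with respect to the parameter $t$ of the operator-valued map $t\mapsto A(t)$, which is why the product rule is the relevant one.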

I really don't understand which of the two rules is the right one to use, and why.

Best Answer

On top of what @Raskolnikov pointed out in their comment, I believe there is another problem here: you seem to be mixing the derivative of a Hilbert space operator $A:\mathcal H\to\mathcal H$ and the derivative of objects $f:\mathbb R\to\mathcal B(\mathcal H)$ which map into the space of (bounded linear) Hilbert space operators.

  • The chain rule $D[A\circ B](\psi)=D[A](B(\psi))\circ D[B](\psi)$ which you mentioned first refers to the Fréchet derivative of operators. Here, given any function $A:\mathcal H\to\mathcal H$ one looks for the best linear approximation at a given point $\psi\in\mathcal H$. More precisely a bounded linear operator $D_\psi[A]$ (or $D[A](\psi)$ as you wrote) is called the (Fréchet) derivative of $A$ at $\psi$ if $$ \lim_{\|h\|\to 0}\frac{\|A(\psi+h)-A(\psi)-D_\psi[A]h\|}{\|h\|}=0\,. $$ The reason this notion does not really pop up in quantum mechanics -- and thus not in the two examples you cited -- is that the best linear approximation of an operator that is already linear is the operator itself: given $B\in\mathcal B(\mathcal H)$ and any $\psi\in\mathcal H$ one finds $D_{\psi}[B]=B$ because $$ \frac{\|B(\psi+h)-B(\psi)-D_\psi[B]h\|}{\|h\|}=\frac{\|B\psi+Bh-B\psi-Bh\|}{\|h\|}=0 $$ for all non-zero $h$ already. With this it's also easy to verify the chain rule you mentioned: given $A,B\in\mathcal B(\mathcal H)$, $\psi\in\mathcal H$ one finds $$ A\circ B=D_\psi[A\circ B]=D_{B(\psi)}[A]\circ D_\psi[B]=A\circ B\,. $$
  • When considering dynamical systems (e.g., in quantum mechanics) a time parameter usually enters the picture. This parameter can then be taken as the input of a function, for example of a solution to a differential equation which describes the dynamics of some quantum system. As you may know, if a system in an initial state $\rho_0$ is described by a Hamiltonian $H\in\mathcal B(\mathcal H)$, the solution $\rho(t)$ to the Liouville-von Neumann equation $$ \frac{d}{dt}\rho(t)=-\frac{i}\hbar \underbrace{[H,\rho(t)]}_{:=H\rho(t)-\rho(t)H}\qquad\text{with}\qquad \rho(0)=\rho_0 $$ is given by $\rho(t)=e^{-\frac{i}\hbar tH}\rho_0 e^{\frac{i}\hbar tH}$ ${}^\text{footnote 1}$. In particular this is a function from the real numbers into (a certain subset of) operators acting on $\mathcal H$. Going one level higher, the time-evolution operator $U(t)=e^{-\frac{i}\hbar tH}$, which contains all the information of how $\rho_0$ evolves in time, is itself the solution to the differential equation $$ \frac{d}{dt}U(t)=-\frac{i}\hbar HU(t) $$ with $U(0)=\operatorname{id}$ being the identity operator on $\mathcal H$. Either way it "makes sense" to talk about the derivative of the maps $\rho(t),U(t)$, etc. because -- while $\rho(t),U(t)$ for any $t$ are linear operators -- the map $t\mapsto \rho(t)$ is in general not linear. To connect this to the examples you gave in your question: both times one considers maps $\mathbb R\to\mathcal B(\mathcal H)$ (e.g., $t\mapsto e^{\hat A(t)}$ and $t\mapsto U(t)$) and asks about their time-derivative. This is also why the Leibniz rule occurs here: given differentiable curves $A,B:\mathbb R\to\mathcal B(\mathcal H)$, when talking about the time-derivative of the product $AB$ -- or, equivalently, the composition $A\circ B$ -- what one really asks about is the derivative of the map $AB:\mathbb R\to\mathcal B(\mathcal H)$ defined via $(AB)(t):=A(t)\circ B(t)$. 
This derivative is given by the Leibniz rule $$ D_t[A\circ B]=D_t[A]\circ B+A\circ D_t[B] $$ and not the chain rule, because the input of $A$ relevant for the derivative is the time-parameter $t$ -- and not some output of the operator $B(t)$.
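
For completeness, here is a short sketch of why the Leibniz rule holds in this setting. It assumes only that $A,B:\mathbb R\to\mathcal B(\mathcal H)$ are differentiable in operator norm; the shorthand $A'(t):=D_t[A]$ is introduced just for this computation. Adding and subtracting $A(t)B(t+h)$ in the difference quotient gives
$$
\frac{A(t+h)B(t+h)-A(t)B(t)}{h}
=\frac{A(t+h)-A(t)}{h}\,B(t+h)+A(t)\,\frac{B(t+h)-B(t)}{h}\,.
$$
As $h\to 0$ the first quotient converges to $A'(t)$, the factor $B(t+h)$ converges to $B(t)$ (differentiable curves are continuous), and the second quotient converges to $B'(t)$. Because the operator norm is submultiplicative, $\|ST\|\le\|S\|\,\|T\|$, the products converge as well, so the limit is $A'(t)B(t)+A(t)B'(t)$. The order of the factors is never exchanged at any step, which is why no commutativity is needed.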

${}^\text{footnote 1}$: If $\rho_0$ is a pure state $|\psi_0\rangle\langle\psi_0|$, then this reduces to the Schrödinger equation with solution $\psi(t)=e^{-\frac{i}\hbar tH}\psi_0$.
