Derivative of the exponential of a matrix function via a strange integral, for an object that does not commute with its derivative

derivatives, exponential function, matrices, matrix exponential, matrix-calculus

At the bottom of this section of an article, Wikipedia claims:

$$\frac{d}{dt}\exp(X(t))=\int_0^1\exp(\alpha X(t))\frac{dX(t)}{dt}\exp((1-\alpha)X(t))\,d\alpha$$

for any general $t$-dependent object $X$. I can't get access to the paper they reference. My attempts to derive this formula are as follows:
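Before attempting a proof, the claimed identity can at least be sanity-checked numerically. The sketch below (my own addition, not part of the question; the matrices `A`, `B` are arbitrary choices) compares a finite-difference derivative of $\exp(X(t))$ with a midpoint-rule quadrature of the right-hand side, for an $X(t)$ that does not commute with $X'(t)$:

```python
import numpy as np
from scipy.linalg import expm

# Two fixed non-commuting matrices (arbitrary choices for illustration)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

def X(t):
    # A simple t-dependent matrix; X(t) does not commute with X'(t) = B
    return A + t * B

def dX(t):
    return B

t = 0.7

# Left-hand side: central finite-difference derivative of exp(X(t))
h = 1e-6
lhs = (expm(X(t + h)) - expm(X(t - h))) / (2 * h)

# Right-hand side: integral over alpha, approximated by the midpoint rule
N = 2000
alphas = (np.arange(N) + 0.5) / N
rhs = sum(expm(a * X(t)) @ dX(t) @ expm((1 - a) * X(t)) for a in alphas) / N

print(np.max(np.abs(lhs - rhs)))  # small: limited only by quadrature/difference error
```

The two sides agree to within the discretization error, which is at least consistent with the formula being correct in the non-commuting case.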

When $X(t)$ commutes with $\frac{d}{dt}X(t)$, the result of that integral is trivially what the chain rule would give. When $X$ does not commute with its derivative, however, there is a problem! I let $\Delta$ denote the object, be it a matrix (the case I'm most interested in) or something else, representing the directional derivative of $X$.

$$\begin{align}\frac{d}{dt}\exp(X(t))&=\sum_{n=0}^\infty\frac{d}{dt}\frac{(X(t)^n)}{n!}=\sum_{n=0}^\infty\frac{1}{n!}\frac{d}{dt}(X(t)^n)\\&=\sum_{n=0}^\infty\frac{1}{n!}\lim_{h\to0}\frac{(X+h\Delta)^n-X^n}{h}\cdot\frac{dX(t)}{dt}
\\\lim_{h\to0}\frac{(X+h\Delta)^n-X^n}{h}&=\lim_{h\to0}\frac{h(\Delta X^{n-1}+X\Delta X^{n-2}+X^2\Delta X^{n-3}+\cdots)+O(h^2)}{h}
\\&=\Delta X^{n-1}+X\Delta X^{n-2}+\cdots
\\&=\sum_{\alpha=0}^{n-1}X^{\alpha}\Delta X^{n-(1+\alpha)}
\\\therefore\frac{d}{dt}\exp(X(t))&=\sum_{n=0}^\infty\frac{1}{n!}\cdot\sum_{\alpha=0}^{n-1}X^{\alpha}\Delta X^{n-(1+\alpha)}\cdot\frac{dX(t)}{dt}\end{align}$$

And I have no idea whether any of this is right, nor how to make my "$\Delta$" meaningful (I would assume it is $\frac{dX(t)}{dt}$, but I don't know), nor even whether it belongs in the above calculations at all. More importantly, I have no idea how to arrive at the integral formula at the top of this question, or how to turn my workings into an integral; I've just been showing my failed attempts thus far.
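The non-commutative product-rule expansion of $(X^n)'$ above (without the trailing $\frac{dX}{dt}$ factor, and with $\Delta = X'$) can itself be checked numerically. A small sketch, with the same arbitrary non-commuting matrices as assumptions:

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

def X(t):
    return A + t * B   # non-commuting example: [X(t), X'(t)] != 0

dX = B                 # X'(t) is constant here
n, t, h = 5, 0.3, 1e-6

def matpow(M, k):
    return np.linalg.matrix_power(M, k)

# Central finite-difference derivative of X(t)^n
fd = (matpow(X(t + h), n) - matpow(X(t - h), n)) / (2 * h)

# Sum formula: d/dt X^n = sum_{a=0}^{n-1} X^a X' X^{n-1-a}
formula = sum(matpow(X(t), a) @ dX @ matpow(X(t), n - 1 - a) for a in range(n))

print(np.max(np.abs(fd - formula)))  # near zero
```

This suggests the expansion $\frac{d}{dt}X^n=\sum_{\alpha=0}^{n-1}X^{\alpha}X'X^{n-1-\alpha}$ is the correct one, with no extra chain-rule factor.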

I would greatly appreciate any references, hints or answers, as this seems like a very important (or at the very least interesting) formula, but not one that I will be able to derive with my own knowledge.

Many thanks.

Best Answer

I will write simply $X$ instead of $X(t)$, and let $Y=X'$ be the derivative. Then, by the linearity of the product map $(X_1,X_2,\dots,X_n)\to X_1X_2\dots X_n$ in each component, we have explicitly:
$$
\begin{aligned}
I'=(X^0)'&=0\ ,\\
(X^1)'&=Y\ ,\\
(X^2)'&=YX+XY\ ,\\
(X^3)'&=YXX+XYX+XXY\ ,\\
(X^4)'&=YXXX+XYXX+XXYX+XXXY\ ,\\
\\[2mm] &\qquad\text{ and in general...} \\[2mm]
(\ X^{n+1}\ )' &=YX^n+XYX^{n-1}+\dots+X^nY =\sum_{\substack{j,k\ge 0\\j+k=n}}X^jYX^k\ .
\end{aligned}
$$
Then we have, using $a$ instead of $\alpha$, and writing $b:=1-a$ for ease of notation:
$$
\begin{aligned}
\int_0^1 e^{aX}\; Y\; e^{bX}\; da\
&\ = \int_0^1 \left(\sum_{j\ge 0}\frac 1{j!}a^jX^j\right) \; Y\; \left(\sum_{k\ge 0}\frac 1{k!}b^kX^k\right)\; da\\
&\ = \int_0^1 \sum_{n\ge 0} \sum_{\substack{j,k\ge 0\\j+k=n}} \frac 1{j!k!}a^jb^k \; X^j\; Y\; X^k \; da\\
&\ = \sum_{n\ge 0} \int_0^1 \sum_{\substack{j,k\ge 0\\j+k=n}} \frac 1{j!k!}a^jb^k \; X^j\; Y\; X^k \; da\\
&\ = \sum_{n\ge 0} \sum_{\substack{j,k\ge 0\\j+k=n}} \frac 1{j!k!} \; X^j\; Y\; X^k \underbrace{\left(\int_0^1 a^jb^k\; da\right)}_{=B(j+1,k+1)}\\
&\ = \sum_{n\ge 0} \sum_{\substack{j,k\ge 0\\j+k=n}} \frac 1{(n+1)!} \; X^j\; Y\; X^k \\
&\ = \sum_{n\ge 0} \frac 1{(n+1)!} \; \Big(\ X^{n+1}\ \Big)' \\
&\ = \left( \sum_{n\ge 0} \frac 1{(n+1)!} \; X^{n+1}\right)' \\
&\ =(e^X-I)' \\
&\ =(e^X)' \ .
\end{aligned}
$$
Above, $B$ is the Beta function, and for $j,k$ with $j+k=n$:
$$
B(j+1,k+1)=\frac{\Gamma(j+1)\Gamma(k+1)}{\Gamma((j+1)+(k+1))}
=\frac{j!\; k!}{(j+k+1)!}=\frac{j!\; k!}{(n+1)!}\ .
$$
At many places we have exchanged two "limit-like" operations (such as integration, differentiation, or building a series). At these places we have domination in norm by the corresponding expressions obtained by replacing $X,Y$ with the real numbers $\|X\|,\|Y\|\ge 0$, so exchanging the operations reduces to exchanging them in the world of functions of a real variable $x=\|X\|\ge 0$, which is more or less standard. Versions of the Dominated Convergence Theorem (Lebesgue) apply directly in more general settings, such as Banach spaces.
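The Beta-function step is the only non-formal integral in the computation, and it is easy to confirm directly. A quick numerical sketch (my own check, not part of the answer):

```python
from math import factorial
from scipy.integrate import quad

# Check  int_0^1 a^j (1-a)^k da = j! k! / (j+k+1)!  for small j, k
for j in range(5):
    for k in range(5):
        integral, _ = quad(lambda a: a**j * (1 - a)**k, 0, 1)
        exact = factorial(j) * factorial(k) / factorial(j + k + 1)
        assert abs(integral - exact) < 1e-12
print("Beta identity verified for j, k < 5")
```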


Later EDIT: The solution above focuses on the "algebraic" aspect of the computation. Here is an explanation of the "analytic" aspect, starting with why rigour is needed at all on this point, since this was a main concern in the OP. The analytic aspect is best seen in the following step, where we (write a few extra lines to) exchange two "limiting" operators:
$$ \begin{aligned} \left(\sum_{n\ge 0}\frac 1{n!}X^n\right)'(t) &= \lim_{h\to 0}\frac 1h\left[ \left(\sum_{n\ge 0}\frac 1{n!}X^n(t+h)\right) - \left(\sum_{n\ge 0}\frac 1{n!}X^n(t)\right) \right] \\ &= \color{blue}{\lim_{h\to 0}} \color{darkgreen}{\sum_{n\ge 0}} \frac 1{n!}\frac 1h \left(X^n(t+h) - X^n(t)\right) \\ &\overset{\color{red}{(?)}}= \color{darkgreen}{\sum_{n\ge 0}} \color{blue}{\lim_{h\to 0}} \frac 1{n!}\frac 1h \left(X^n(t+h) - X^n(t)\right) \\ &=\sum_{n\ge 0}\frac 1{n!}(X^n)'(t)\ . \end{aligned} $$
There is a point that has to be addressed here, namely where we exchange the order of the two "limiting" processes $\color{blue}{\lim_{h\to 0}}$ and $\color{darkgreen}{\sum_{n\ge 0}}$. (A simple counterexample: consider a double sequence $(a_{kn})_{k,n\ge 0}$ displayed as an infinite matrix. It has a "diagonal"; now put only zeros below the diagonal and only ones above it (or even random numbers, so that no limit exists below). Then taking limits first along rows and then along columns delivers one answer, while taking the limits in the other order delivers another.)
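The counterexample above can be made concrete with a finite truncation (a small sketch of my own, where a large `N` stands in for "infinity"):

```python
# Double "sequence" a[k][n]: zeros on and below the diagonal, ones above it.
def a(k, n):
    return 1 if n > k else 0

N = 1000  # stand-in for "infinity"

# lim over n first (along each row), then over k: every row is eventually 1
row_limits = [a(k, N) for k in range(N // 2)]
first_order = row_limits[-1]   # -> 1

# lim over k first (along each column), then over n: every column is eventually 0
col_limits = [a(N, n) for n in range(N // 2)]
second_order = col_limits[-1]  # -> 0

print(first_order, second_order)  # 1 0 : the two orders disagree
```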

How do we obtain this exchange using the Dominated Convergence Theorem, in the form $\lim_h\int_S f_h=\int_S f$, where $f_h\to f$ pointwise and $\|f_h\|$ is dominated by one and the same function $g$?

Take the ("discrete") measure space $S$ to be the set of natural numbers, where each point $m$ gets measure $1$. Now fix some $t$ and let $f_h$ be the function on $S$ given by $f_h(m)=\frac 1{m!}\cdot \frac 1h\left( X^m(t+h)-X^m(t)\right)$. Seen as a function of $h$, this is continuous and the limit for $h\to 0$ exists; in the above notation it is $\frac 1{m!}(X^{m-1}Y+\dots +YX^{m-1})(t)$, so in norm it is estimated as follows:
$$ \begin{aligned} \left\|\frac 1{m!}(X^{m-1}Y+\dots +YX^{m-1})(t)\right\| &\le \frac 1{m!}( \|X^{m-1}Y+\dots +YX^{m-1}\|)(t) \\ &\le \frac 1{m!}( \|X^{m-1}Y\|+\dots + \|YX^{m-1}\|)(t) \\ &\le \frac 1{m!}( \|X^{m-1}\|\;\|Y\|+\dots + \|Y\|\;\|X^{m-1}\|)(t) \\ &\le \frac 1{m!}\cdot m\;\|X\|^{m-1}(t)\;\|Y\|(t)\ . \end{aligned} $$
Then, under conditions such as $X$ being of class $\mathcal C^1$ (tacitly assumed in such cases), the above is bounded by $g:S\to\Bbb R_{\ge 0}$, where $g(m)= \frac 1{m!}\cdot m\;\|X\|^{m-1}\;\|Y\|$. For $h$ in a small compact neighbourhood of $0$, the whole family $f_h$ is then (by uniform continuity) bounded in norm by $2g$. So the domination is realized by $2g$, which is in $L^1(S)$ because $\int_S 2g= 2\sum_{m\ge 1}\frac 1{m!}\cdot m\;\|X\|^{m-1}\;\|Y\| =2\|Y\|\;\exp\|X\|<\infty$.
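The closed form of the dominating sum, $\sum_{m\ge 1}\frac{m}{m!}\|X\|^{m-1}\|Y\| = \|Y\|\,e^{\|X\|}$, can also be confirmed numerically (my own sketch; `x` and `y` below are arbitrary stand-ins for $\|X\|(t)$ and $\|Y\|(t)$):

```python
from math import exp, factorial

x, y = 1.7, 0.9   # arbitrary stand-ins for ||X||(t) and ||Y||(t)

# g(m) = (1/m!) * m * x^(m-1) * y ;  the sum over m >= 1 should equal y * e^x,
# since sum_{m>=1} m x^(m-1)/m! = sum_{m>=1} x^(m-1)/(m-1)! = e^x
total = sum(m * x**(m - 1) * y / factorial(m) for m in range(1, 60))
print(total, y * exp(x))  # the two agree
```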

The other exchange of limiting processes, $ \color{blue}{\int_0^1} \color{darkgreen}{\sum_{n\ge 0}} = \color{darkgreen}{\sum_{n\ge 0}} \color{blue}{\int_0^1} $ is covered in a similar way.