[Math] Gradient of matrix exponential function

exponential function, matrices, multivariable-calculus, optimization

I would be grateful if somebody could help me with the following problem. I am trying to find the gradient of the expression

$$f(a_1, a_2, a_3, a_4)=\Vert R*y-x \Vert $$

where $y$ and $x$ are known $4\times 1$ column vectors and $R$ is a $4\times 4$ orthogonal rotation matrix given as an exponential

$$R = \exp(a_1*b_1+a_2*b_2+a_3*b_3+a_4*b_4), $$

where $a_1, a_2, a_3$, and $a_4$ are scalars and $b_1, b_2, b_3$, and $b_4$ are $4\times 4$ matrices.

I want to calculate the gradient of $f(\cdot)$ with respect to $a_1, a_2, a_3$, and $a_4$. Any hint is appreciated.

Cristian

Best Answer

@cristian, the essential task is to calculate the derivative of the function $\phi:t\mapsto e^{tA+B}$. If $AB=BA$, then it is straightforward: $\phi'(t)=Ae^{tA+B}$. Otherwise, it is much more difficult. If $X$ is a square matrix, let $ad(X):H\in M_n\mapsto XH-HX$ and $f:X\mapsto e^X$. Then

$Df_X:H\in M_n\rightarrow e^X\sum_{k=0}^{\infty}\dfrac{(-ad(X))^k}{(k+1)!}H$.

Finally $\phi'(t)=e^{tA+B}\sum_{k=0}^{\infty}\dfrac{(-ad(tA+B))^k}{(k+1)!}A$.
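As a numerical illustration of the series above, here is a minimal NumPy sketch. The helper names (`expm`, `ad`, `dexp_series`, `dphi`), the truncation orders and the random test matrices are assumptions made for this example, not part of the answer; the small `expm` helper is a plain scaling-and-squaring Taylor routine standing in for whatever library routine you prefer.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential: scaling and squaring around a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**s                      # scaled matrix has 1-norm <= 1
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k             # A^k / k!
        E = E + term
    for _ in range(s):                  # undo the scaling: e^M = (e^{M/2^s})^{2^s}
        E = E @ E
    return E

def ad(X, H):
    """ad(X) applied to H, i.e. the commutator XH - HX."""
    return X @ H - H @ X

def dexp_series(X, H, terms=40):
    """Df_X(H) = e^X * sum_k (-ad X)^k H / (k+1)!, truncated after `terms` terms."""
    total = np.zeros_like(H, dtype=float)
    term = np.array(H, dtype=float)     # k = 0 term: H / 1!
    for k in range(terms):
        total += term
        term = -ad(X, term) / (k + 2)   # next term: (-ad X)^{k+1} H / (k+2)!
    return expm(X) @ total

def dphi(t, A, B):
    """phi'(t) for phi(t) = exp(tA + B), using the series in direction A."""
    return dexp_series(t * A + B, A)

# Compare with a central finite difference on random (non-commuting) matrices.
rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))
B = 0.5 * rng.standard_normal((4, 4))
t, h = 0.7, 1e-6
finite_diff = (expm((t + h) * A + B) - expm((t - h) * A + B)) / (2 * h)
print(np.max(np.abs(dphi(t, A, B) - finite_diff)))
```

The truncated series converges quickly only when $\|X\|$ is moderate, so this is meant as a check of the formula rather than a robust implementation.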

EDIT: This form of the derivative of $\exp$ was chosen for its simplicity; yet there are other forms:

$Df_X(H)=\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\dfrac{X^mHX^n}{(m+n+1)!}=\int_0^1e^{sX}He^{(1-s)X}ds$ and then

$\phi'(t)=\sum_{n=0}^{\infty}\sum_{m=0}^{\infty}\dfrac{(tA+B)^mA(tA+B)^n}{(m+n+1)!}=\int_0^1e^{s(tA+B)}Ae^{(1-s)(tA+B)}ds$.
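The double series and the integral can be compared numerically. The sketch below is only an illustration under stated assumptions: the function names, the truncation order and the use of Gauss-Legendre quadrature for the integral are choices made here, not something the answer prescribes.

```python
import math
import numpy as np

def expm(M, terms=30):
    """Matrix exponential: scaling and squaring around a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def dexp_double_series(X, H, order=25):
    """sum_{m,n} X^m H X^n / (m+n+1)!, with m and n truncated at `order`."""
    powers = [np.eye(X.shape[0])]
    for _ in range(order):
        powers.append(powers[-1] @ X)
    total = np.zeros_like(H, dtype=float)
    for m in range(order + 1):
        for n in range(order + 1):
            coef = 1.0 / math.factorial(m + n + 1)
            total += coef * (powers[m] @ H @ powers[n])
    return total

def dexp_integral(X, H, nodes=20):
    """int_0^1 e^{sX} H e^{(1-s)X} ds by Gauss-Legendre quadrature."""
    t, w = np.polynomial.legendre.leggauss(nodes)   # nodes and weights on [-1, 1]
    s = 0.5 * (t + 1.0)                              # map the nodes to [0, 1]
    w = 0.5 * w
    total = np.zeros_like(H, dtype=float)
    for si, wi in zip(s, w):
        total += wi * (expm(si * X) @ H @ expm((1.0 - si) * X))
    return total

rng = np.random.default_rng(1)
X = 0.5 * rng.standard_normal((4, 4))
H = 0.5 * rng.standard_normal((4, 4))
D_series = dexp_double_series(X, H)
D_integral = dexp_integral(X, H)
print(np.max(np.abs(D_series - D_integral)))
```

For $\phi'(t)$ one would call either function with `X = t*A + B` and `H = A`.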

I think that you cannot obtain a simpler form for $\phi'(t)$. Now, let $g:a_1\mapsto \|Ry-x\|$; then, wherever $Ry\neq x$, $g'(a_1)=\dfrac{1}{\|Ry-x\|}(Ry-x)^T\dfrac{\partial R}{\partial a_1}y$ (the constant vector $x$ drops out when $Ry-x$ is differentiated), where $\dfrac{\partial R}{\partial a_1}$ can be easily deduced from $\phi'(t)$ by taking $t=a_1$, $A=b_1$ and $B=a_2b_2+a_3b_3+a_4b_4$. You say that $R$ is orthogonal; are the $(b_i)$ then skew-symmetric matrices? If so, and if the $(b_i)$ are known numeric matrices, then you can explicitly calculate the $4$ eigenvalues ($\pm i\alpha,\pm i\beta$) of the skew-symmetric matrix $\sum_ia_ib_i$. If you have Maple, you can calculate $\dfrac{\partial R}{\partial a_1}(a_1,a_2,a_3,a_4)$ for numeric values of the $(a_i)$ (computation time with $20$ significant digits: 13 seconds).
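Putting the pieces together, the full gradient follows from the chain rule $\dfrac{\partial f}{\partial a_i}=\dfrac{(Ry-x)^T\,\frac{\partial R}{\partial a_i}\,y}{\|Ry-x\|}$, where $\dfrac{\partial R}{\partial a_i}=Df_X(b_i)$ with $X=\sum_i a_ib_i$. The sketch below is a hedged illustration: the skew-symmetric basis `b`, the vectors `x`, `y` and the coefficients `a` are hypothetical values invented for the example (the question does not specify them), and the derivative of $\exp$ is evaluated with the truncated $ad$ series.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential: scaling and squaring around a truncated Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / 2.0**s
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def dexp(X, H, terms=40):
    """Df_X(H) = e^X * sum_k (-ad X)^k H / (k+1)!, truncated."""
    total = np.zeros_like(H, dtype=float)
    term = np.array(H, dtype=float)
    for k in range(terms):
        total += term
        term = -(X @ term - term @ X) / (k + 2)
    return expm(X) @ total

def skew_generator(i, j, n=4):
    """Hypothetical basis element E_ij - E_ji (skew-symmetric by construction)."""
    Bm = np.zeros((n, n))
    Bm[i, j], Bm[j, i] = 1.0, -1.0
    return Bm

def f_and_grad(a, b, x, y):
    """Return f(a) = ||R y - x|| and its gradient; assumes R y != x."""
    X = sum(ai * bi for ai, bi in zip(a, b))
    r = expm(X) @ y - x                       # residual R y - x
    f = np.linalg.norm(r)
    grad = np.array([r @ (dexp(X, bi) @ y) / f for bi in b])
    return f, grad

# Assumed example data (not taken from the question).
b = [skew_generator(0, 1), skew_generator(1, 2),
     skew_generator(2, 3), skew_generator(0, 3)]
a = np.array([0.3, -0.2, 0.5, 0.1])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 3.0, 2.0, 1.0])

f_val, grad = f_and_grad(a, b, x, y)
print(f_val, grad)
```

The returned pair can be handed to a gradient-based optimizer; comparing `grad` with central finite differences of `f_and_grad(...)[0]` is a cheap sanity check before doing so.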

About $ad$, the Lie derivative: $ad(X)=X\otimes I-I\otimes X^T$, $(ad(X))^2=X^2\otimes I+I\otimes (X^2)^T-2X\otimes X^T,\cdots$ (if the vectorization of a matrix is formed by stacking its ROWS into a single column vector; cf. http://en.wikipedia.org/wiki/Kronecker_product).
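These Kronecker identities are easy to check numerically. In the short sketch below, the names `ad_matrix` and `vec_rows` are made up for the illustration; NumPy's default `reshape` is row-major, which matches the row-stacking convention stated above.

```python
import numpy as np

def ad_matrix(X):
    """Matrix of ad(X) acting on row-stacked vectorizations: X (x) I - I (x) X^T."""
    I = np.eye(X.shape[0])
    return np.kron(X, I) - np.kron(I, X.T)

def vec_rows(M):
    """Stack the rows of M into a single vector (row-major order)."""
    return M.reshape(-1)

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
I = np.eye(4)

# ad(X) H = XH - HX, written in vectorized form.
lhs = ad_matrix(X) @ vec_rows(H)
rhs = vec_rows(X @ H - H @ X)

# (ad X)^2 = X^2 (x) I + I (x) (X^2)^T - 2 X (x) X^T
ad_squared = np.kron(X @ X, I) + np.kron(I, (X @ X).T) - 2 * np.kron(X, X.T)

print(np.max(np.abs(lhs - rhs)))
print(np.max(np.abs(ad_matrix(X) @ ad_matrix(X) - ad_squared)))
```

With column stacking instead (`order="F"`), the roles of the two Kronecker factors would be swapped.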
