[Math] Derivative of function including matrix logarithm

lie-algebras, lie-groups, matrices, optimization

Is the following equation a valid first-order approximation for general matrix Lie groups, or is it incorrect? And what are the higher-order terms?

$$\frac{\partial}{\partial\mathbf x} (\log(\mathtt A\cdot\exp(\widehat{\mathbf x})\cdot \mathtt B))^\vee \approx \mathtt{Ad_A}$$

with $\mathtt{A},\mathtt{B}$ being elements of a matrix Lie group $G(n)$ (i.e. $n\times n$ matrices),
$\mathbf x$ being an $n$-vector, $\exp(\cdot)$ being the matrix exponential, $\log(\cdot)$ being the matrix logarithm, $\widehat{\cdot}$ being an operator which maps an $n$-vector to a Lie algebra element (an $n\times n$ matrix), $(\cdot)^\vee$ being the corresponding inverse, and $\mathtt{Ad_A}$ being the adjoint of $\mathtt A$ (in matrix form).

Before I start with my approach let me first talk about the definitions and some underlying lemmas in more detail. You might want to skip this and jump directly to my attempt below the line (=======).

The operator $\widehat{\cdot}$ maps a vector to a Lie algebra element, $\widehat{\cdot}:\mathbb R^n\rightarrow g$, $\widehat{\mathbf x} = \sum_i x_i \mathtt G_i$, with $\mathtt G_i$ being the generators of the Lie algebra $g$. The vee operator $(\cdot)^\vee: g\rightarrow \mathbb R^n$ is the corresponding inverse.

Example for SO3:

$\widehat{\mathbf x} = \begin{bmatrix} 0&-x_3& x_2\\ x_3&0&-x_1\\-x_2&x_1&0\end{bmatrix}$

$(\mathtt R)^\vee= \begin{bmatrix}R_{3,2}\\R_{1,3}\\R_{2,1}\end{bmatrix}=\frac{1}{2} \begin{bmatrix}R_{3,2}-R_{2,3}\\R_{1,3}-R_{3,1}\\R_{2,1}-R_{1,2}\end{bmatrix} = -\begin{bmatrix}R_{2,3}\\R_{3,1}\\R_{1,2}\end{bmatrix} $
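
For concreteness, here is a minimal numerical sketch of the hat and vee operators for SO3 (NumPy-based; the helper names `hat` and `vee` are my own choice):

```python
import numpy as np

def hat(x):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0,   -x[2],  x[1]],
                     [x[2],   0.0,  -x[0]],
                     [-x[1],  x[0],  0.0]])

def vee(X):
    """Inverse of hat: recover the 3-vector from a skew-symmetric matrix."""
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

x = np.array([0.1, -0.2, 0.3])
assert np.allclose(vee(hat(x)), x)
```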

Now let us look at the definition of the adjoint:

$Adj_\mathtt A(\widehat{\mathbf x}) := \mathtt A \widehat{\mathbf x} \mathtt A^{-1}$ with $\mathbf x$ being an $n$-vector and $\mathtt A$ a matrix Lie group element.

$Adj_\mathtt A$ can be seen as a linear operator. Thus, there exists an $n\times n$ matrix $\mathtt{Ad_A}$ such that $\mathtt{Ad_A}\cdot \mathbf x = (Adj_\mathtt A(\widehat{\mathbf x}))^\vee.$

Example for SO3:
$\mathtt{Ad_R}\mathbf x = \mathtt R \widehat{\mathbf x} \mathtt R^\top$ $\Rightarrow$ $\mathtt{Ad_R=R}$
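
As a quick numerical sanity check of $\mathtt{Ad_R}=\mathtt R$ (a sketch continuing the snippet above; `expm` from SciPy generates a rotation from a tangent vector):

```python
from scipy.linalg import expm

R = expm(hat(np.array([0.4, -0.1, 0.7])))   # some rotation matrix
x = np.array([0.2, 0.5, -0.3])

lhs = vee(R @ hat(x) @ R.T)   # (Adj_R(x^))^vee
rhs = R @ x                   # Ad_R x, with Ad_R = R for SO3
assert np.allclose(lhs, rhs)
```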

Since $\mathtt A\exp(\mathtt B)\mathtt A^{-1}=\exp(\mathtt{ABA}^{-1})$ it is true that

$\exp(\widehat{\mathtt{Ad_A} \mathbf x}) = \mathtt{A} \exp(\widehat{\mathbf x}) \mathtt{A}^{-1}.$
Thus:
$$\exp(\widehat{\mathtt{Ad_A} \mathbf x}) \mathtt A= \mathtt{A} \exp(\widehat{\mathbf x}) \quad (1)$$
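
A numerical check of identity (1), again for SO3 and continuing the snippets above (for SO3, $\mathtt{Ad_A}=\mathtt A$):

```python
A = expm(hat(np.array([0.3, 0.2, -0.5])))
x = np.array([-0.1, 0.4, 0.2])
Ad_A = A   # adjoint of A in matrix form, for SO3

lhs = expm(hat(Ad_A @ x)) @ A   # exp((Ad_A x)^) A
rhs = A @ expm(hat(x))          # A exp(x^)
assert np.allclose(lhs, rhs)
```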

Let us look at the formula $\text{BCH}(\mathtt{A},\mathtt{B}) := \log(\exp(\mathtt{A})\exp(\mathtt{B}))$. (2)

We know that if $\mathtt A$ and $\mathtt B$ commute, then $\text{BCH}(\mathtt{A},\mathtt{B}) = \mathtt{A+B}$.

Otherwise, it can be expanded using the Baker-Campbell-Hausdorff (BCH) series. The first two terms are:

$t_1 = \mathtt{A+B}$

$t_2 = \frac{1}{2}(\mathtt{AB-BA})$
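
For small arguments these two terms already approximate the series well; a rough numerical check for SO3 (a sketch continuing the snippets above; `logm` is SciPy's matrix logarithm):

```python
from scipy.linalg import logm

a = hat(np.array([0.01, -0.02, 0.015]))
b = hat(np.array([-0.005, 0.01, 0.02]))

bch_exact = logm(expm(a) @ expm(b)).real
bch_approx = a + b + 0.5 * (a @ b - b @ a)     # t1 + t2
print(np.max(np.abs(bch_exact - bch_approx)))  # residual is third-order small
```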

===============================================

My approach:

Let $\widehat{\mathbf c} := \log(\mathtt{AB})$, i.e. $\mathbf c := \log(\mathtt{AB})^\vee$.

$\frac{\partial}{\partial\mathbf x} (\log(\mathtt A\cdot\exp(\widehat{\mathbf x})\cdot \mathtt B))^\vee = \frac{\partial}{\partial\mathbf x} (\log(\exp(\widehat{\mathtt {Ad_A}\mathbf x})\mathtt A\mathtt B))^\vee$ (using (1))

$= \frac{\partial}{\partial\mathbf x} (\text{BCH}(\widehat{\mathtt {Ad_A}\mathbf x},\widehat{\mathbf c}))^\vee$ (using (2))


Now, if $\mathtt{A},\mathtt{B}$ are elements of a commutative group (so that $\widehat{\mathtt{Ad_A}\mathbf x}$ and $\widehat{\mathbf c}$ are elements of a commutative algebra), we simply get:

$= \frac{\partial}{\partial\mathbf x} (\mathtt{Ad_A}\mathbf x) + \frac{\partial}{\partial\mathbf x} \mathbf c$
$= \mathtt{Ad_A} + \mathtt O = \mathtt{Ad_A}$

(Edit: Actually, in this case $\mathtt{Ad_A}=\mathtt I$ always.)
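
As a tiny illustration of the commutative case, consider SO2 (planar rotations), where the algebra is one-dimensional and $\mathtt{Ad_A}=\mathtt I$; the derivative indeed comes out as $1$ (a sketch reusing the imports above; `hat2`/`vee2` are ad-hoc helpers):

```python
def hat2(theta):
    """Map a scalar to the corresponding element of so(2)."""
    return np.array([[0.0, -theta], [theta, 0.0]])

def vee2(X):
    """Inverse of hat2."""
    return X[1, 0]

A2 = expm(hat2(0.7))
B2 = expm(hat2(-0.3))
eps = 1e-6
deriv = (vee2(logm(A2 @ expm(hat2(eps)) @ B2).real)
         - vee2(logm(A2 @ expm(hat2(-eps)) @ B2).real)) / (2 * eps)
print(deriv)   # ≈ 1.0
```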


However, if $\mathtt {A,B}$ are elements of a general (non-commutative) matrix group, we have to use the BCH formula…

Best Answer

Yes, the first-order approximation using the adjoint is correct (with the derivative evaluated at $\mathbf x=\mathbf 0$).

It is easier to see if one interprets members of the Lie algebra as minimal vectors $\mathbf{x}$ instead of square matrices $\widehat{\mathbf{x}}$. Thus, we define the Lie bracket as $[\mathbf{a},\mathbf{b}]:=(\widehat{\mathbf{a}}\,\widehat{\mathbf{b}}-\widehat{\mathbf{b}}\,\widehat{\mathbf{a}})^\vee$. In the case of SO3, it is simply the cross product: $[\mathbf{a},\mathbf{b}]=\mathbf{a}\times\mathbf{b}$. However, a Lie bracket in such a vector form exists for all other matrix Lie groups too. Accordingly, the BCH formula is now defined as $\text{bch}(\mathbf{a},\mathbf{b}):=\log(\exp(\widehat{\mathbf{a}})\exp(\widehat{\mathbf{b}}))^\vee$.
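
In code, for SO3 the vector-form bracket reduces to the cross product, and bch is just a log/exp round trip (a sketch reusing the `hat`/`vee` helpers from the question above; `bracket` and `bch` are my own helper names):

```python
def bracket(a, b):
    """Vector-form Lie bracket (a^ b^ - b^ a^)^vee; for so(3) this equals np.cross(a, b)."""
    return vee(hat(a) @ hat(b) - hat(b) @ hat(a))

def bch(a, b):
    """bch(a, b) := log(exp(a^) exp(b^))^vee."""
    return vee(logm(expm(hat(a)) @ expm(hat(b))).real)

a = np.array([0.1, 0.2, -0.1])
b = np.array([-0.3, 0.05, 0.2])
assert np.allclose(bracket(a, b), np.cross(a, b))
```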

For instance, with $\mathbf c:=\log(\mathtt{AB})^\vee$ as above, a third-order approximation gives:

$$\left. \frac{\partial}{\partial \mathbf{x}} \log(\mathtt{A}\exp(\widehat{\mathbf{x}})\mathtt{B})^\vee\right|_{\mathbf{x}=\mathbf{0}}$$
$$=\left.\frac{\partial}{\partial \mathbf{x}} \log\left(\exp(\widehat{\mathtt{Ad}_\mathtt{A}\mathbf{x}})\,\mathtt{A}\mathtt{B}\right)^\vee\right|_{\mathbf{x}=\mathbf{0}}$$
$$=\left.\frac{\partial}{\partial \mathbf{x}} \log\left(\exp(\widehat{\mathtt{Ad}_\mathtt{A}\mathbf{x}})\exp(\widehat{\mathbf{c}})\right)^\vee\right|_{\mathbf{x}=\mathbf{0}}$$
$$= \left.\frac{\partial}{\partial \mathbf{x}} \text{bch}(\mathtt{Ad}_\mathtt{A}\mathbf{x},\mathbf{c})\right|_{\mathbf{x}=\mathbf{0}}$$
$$\approx \left.\frac{\partial}{\partial \mathbf{x}}\left( \mathtt{Ad}_\mathtt{A}\mathbf{x}+\mathbf{c} + \tfrac{1}{2}[\mathtt{Ad}_\mathtt{A}\mathbf{x},\mathbf{c}]+\tfrac{1}{12}\bigl([\mathtt{Ad}_\mathtt{A}\mathbf{x},[\mathtt{Ad}_\mathtt{A}\mathbf{x},\mathbf{c}]]+ [\mathbf{c},[\mathbf{c},\mathtt{Ad}_\mathtt{A}\mathbf{x}]]\bigr)\right)\right|_{\mathbf{x}=\mathbf{0}}$$

Applying the chain rule with $\mathbf{y}:=\mathtt{Ad}_\mathtt{A}\mathbf{x}$ (so $\mathbf{y}=\mathbf{0}$ at $\mathbf{x}=\mathbf{0}$):

$$=\left.\left(\frac{\partial \mathbf{y}}{\partial\mathbf{y}} + \frac{1}{2}\,\frac{\partial [\mathbf{y},\mathbf{c}]}{\partial \mathbf{y}} +\frac{1}{12}\left(\frac{\partial[\mathbf{y},[\mathbf{y},\mathbf{c}]]}{\partial\mathbf{y}}+\frac{\partial[\mathbf{c},[\mathbf{c},\mathbf{y}]]}{\partial \mathbf{y}}\right)\right)\right|_{\mathbf{y}=\mathbf{0}} \left.\frac{\partial (\mathtt{Ad}_\mathtt{A}\mathbf{x})}{\partial \mathbf{x}}\right|_{\mathbf{x}=\mathbf{0}}$$

The term $[\mathbf{y},[\mathbf{y},\mathbf{c}]]$ is quadratic in $\mathbf{y}$, so its derivative vanishes at $\mathbf{y}=\mathbf{0}$, while $[\mathbf{c},[\mathbf{c},\mathbf{y}]]=[[\mathbf{y},\mathbf{c}],\mathbf{c}]$ is linear in $\mathbf{y}$. Since $[\mathbf{y},\mathbf{c}]$ is linear in $\mathbf{y}$, its Jacobian $\frac{\partial [\mathbf{y},\mathbf{c}]}{\partial \mathbf{y}}$ is a constant matrix, and we obtain

$$= \left(\mathtt{I} + \frac{1}{2}\,\frac{\partial [\mathbf{y},\mathbf{c}]}{\partial \mathbf{y}} + \frac{1}{12}\left(\frac{\partial [\mathbf{y},\mathbf{c}]}{\partial \mathbf{y}}\right)^2 \right)\mathtt{Ad}_\mathtt{A}.$$
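
This can be checked numerically for SO3 by comparing a central-difference Jacobian of $\log(\mathtt A\exp(\widehat{\mathbf x})\mathtt B)^\vee$ at $\mathbf x=\mathbf 0$ with $\bigl(\mathtt I+\tfrac12\mathtt C+\tfrac1{12}\mathtt C^2\bigr)\mathtt{Ad_A}$, where $\mathtt C:=\frac{\partial[\mathbf y,\mathbf c]}{\partial\mathbf y}=-\widehat{\mathbf c}$ for SO3 (a sketch reusing the helpers from the snippets above):

```python
def f(x, A, B):
    """log(A exp(x^) B)^vee as a function of the minimal vector x."""
    return vee(logm(A @ expm(hat(x)) @ B).real)

A = expm(hat(np.array([0.3, -0.2, 0.1])))
B = expm(hat(np.array([-0.1, 0.25, 0.2])))
c = vee(logm(A @ B).real)

# central-difference Jacobian at x = 0
eps = 1e-6
J_num = np.column_stack([(f(eps * e, A, B) - f(-eps * e, A, B)) / (2 * eps)
                         for e in np.eye(3)])

# third-order approximation: (I + 1/2 C + 1/12 C^2) Ad_A, with C = -hat(c) for SO3
C = -hat(c)
J_approx = (np.eye(3) + 0.5 * C + (C @ C) / 12.0) @ A   # Ad_A = A for SO3

print(np.max(np.abs(J_num - J_approx)))   # small, but nonzero (higher-order BCH terms)
```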