Lie Groups – Minimization on the Lie Group SO(3)

Tags: lie-groups · rigid-transformation · rotations

Referring to a previously asked question, Jacobian matrix of the Rodrigues' formula (exponential map): it was suggested in one of the answers to only calculate the simpler Jacobian $\left.\frac{\partial}{\partial \mathbf p}\exp(\hat{\mathbf p})\right|_{\mathbf p=\mathbf 0}$, with $\exp(\hat{\mathbf p})$ being the identity at $\mathbf p=\mathbf 0$.
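For intuition, the derivative at $\mathbf p = \mathbf 0$ is particularly simple: along the $i$-th coordinate it is just the $i$-th generator $\hat{\mathbf e}_i$ of $\mathfrak{so}(3)$. A minimal numerical check (my own sketch with numpy/scipy; `hat` and the step size `eps` are illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """Map a 3-vector to its skew-symmetric so(3) matrix."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

eps = 1e-8
for i in range(3):
    p = np.zeros(3)
    p[i] = eps
    numeric = (expm(hat(p)) - np.eye(3)) / eps   # d exp(hat(p)) / d p_i at p = 0
    generator = hat(np.eye(3)[i])                # i-th so(3) generator
    assert np.allclose(numeric, generator, atol=1e-6)
```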

A similar idea is suggested in the technical report Minimization on the Lie Group SO(3) and Related Manifolds.

Using a Gauss-Newton-based strategy, it was suggested to solve for $\delta$:

$$ \mathtt J^\top\mathtt J \delta = -\mathtt J^\top f(\mathtt R^{(m)})$$

The final update rule is:
$$ \mathtt R^{(m+1)}= \exp(\hat\delta)\cdot \mathtt R^{(m)}.$$
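For concreteness, here is a minimal sketch of one such iteration (my own illustration, not code from the report; the point-alignment residual `f` and the finite-difference Jacobian are placeholder choices):

```python
import numpy as np
from scipy.linalg import expm

def hat(p):
    """Map a 3-vector to its skew-symmetric so(3) matrix."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def gauss_newton_step(f, R, eps=1e-6):
    """One step: linearize f(exp(hat(p)) @ R) at p = 0, solve, update."""
    r = f(R)
    J = np.empty((r.size, 3))
    for i in range(3):
        p = np.zeros(3)
        p[i] = eps
        J[:, i] = (f(expm(hat(p)) @ R) - r) / eps  # numeric d f / d p_i at p = 0
    delta = np.linalg.solve(J.T @ J, -J.T @ r)     # J^T J delta = -J^T f(R)
    return expm(hat(delta)) @ R                    # left-multiplicative update

# toy usage: recover R_true from point correspondences Y = R_true @ X
X = np.random.randn(3, 20)
R_true = expm(hat(np.array([0.4, -0.2, 0.7])))
f = lambda R: (R @ X - R_true @ X).ravel()
R = np.eye(3)
for _ in range(10):
    R = gauss_newton_step(f, R)
assert np.allclose(R, R_true, atol=1e-5)
```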

Gauss-Newton stems from a Taylor series approximation. To optimize an objective function $\mathcal O$, we expand around the current estimate $\mathtt R^{(m)}$:
$$\mathcal O(\mathtt R^{(m+1)}) \approx \mathcal O(\mathtt R^{(m)}) + \mathbb g^\top\delta + \tfrac{1}{2}\,\delta^\top \mathtt H\, \delta$$

The gradient $\mathbb g$ and Hessian $\mathtt H$ are calculated at the point corresponding to $\mathtt R^{(m)}$, which initially corresponds to $\mathbf p = \mathbf 0$.
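To connect this expansion with the normal equations above (the standard least-squares identification; taking $\mathcal O = \tfrac{1}{2}\|f\|^2$ with vector-valued $f$ is my assumption for concreteness):
$$\mathbb g = \mathtt J^\top f(\mathtt R^{(m)}), \qquad \mathtt H \approx \mathtt J^\top \mathtt J,$$
so setting the gradient of the quadratic model to zero, $\mathbb g + \mathtt H\delta = \mathbf 0$, recovers exactly $\mathtt J^\top\mathtt J\,\delta = -\mathtt J^\top f(\mathtt R^{(m)})$.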

My question: As soon as we update $\mathtt R^{(m)}$, it no longer corresponds to $\mathbf p = \mathbf 0$. So we should calculate the gradient and Hessian at a different $\mathbf p$, i.e. $\mathbf p^{(m+1)} = \delta + \mathbf p^{(m)}$. Can someone explain the idea behind only using the gradient and Hessian at $\mathbf p = \mathbf 0$? (This was suggested in the technical report above and in the question referred to.)

Best Answer

I will try to resolve the confusion, not with a rigorous mathematical argument, but by illustrating the similarity between standard Gauss-Newton over a Euclidean space and Gauss-Newton with respect to a matrix Lie group.

Let us first look at the derivative of a scalar function $f$: $$\frac{\partial f(x)}{\partial x} := \lim_{y \to x} \frac{f(y)-f(x)}{y-x}$$ Now with $$\lim_{y \to x} \frac{f(y)-f(x)}{y-x} \overset{y=p+x}{=} \lim_{p \to 0} \frac{f(p+x)-f(x)}{p} =: \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}$$

Thus, we get: $$\frac{\partial f(x)}{\partial x} = \left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}~.$$

In other words, any derivative $\frac{\partial f(x)}{\partial x}$ can be written as a derivative around zero $\left.\frac{\partial f(p+x)}{\partial p}\right|_{p=0}$. (Source: Derivative as derivative around zero?)
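A two-line numerical illustration of this identity (f = sin, the evaluation point, and the step size are arbitrary choices):

```python
import numpy as np

# f'(x) and d/dp f(p + x) at p = 0 are the same quantity, e.g. for f = sin:
f, df, x, h = np.sin, np.cos, 1.3, 1e-6
d_in_x = (f(x + h) - f(x)) / h               # forward difference in x
d_in_p = (f((0 + h) + x) - f(0 + x)) / h     # forward difference in p at p = 0
assert np.isclose(d_in_x, d_in_p)            # identical by the substitution y = p + x
assert np.isclose(d_in_x, df(x), atol=1e-5)  # both approximate cos(1.3)
```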

Using a similar argument, we can show that this holds for the partial derivatives of a function defined over a Euclidean vector space: $$\frac{\partial f(\mathbf x)}{\partial x_i} = \left.\frac{\partial f(\mathbf p+ \mathbf x)}{\partial p_i}\right|_{\mathbf p=\mathbf 0}~.$$

Hence, the standard Gauss-Newton scheme can be rewritten as:

$$ \mathbf J^\top\mathbf J \delta = -\mathbf J^\top f(\mathbf x) \quad\text{with}\quad \mathbf J :=\left.\frac{\partial f(\mathbf p+\mathbf x_m)}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$

With the update rule: $$ \mathbf x_{m+1}= \delta + \mathbf x_m.$$

Thus, we can conclude that even the standard Gauss-Newton approach can be seen as an optimisation around the identity ($\mathbf p =\mathbf 0$). Obviously, the Jacobian needs to be recalculated after each update!
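A sketch of this "around zero" reading of standard Gauss-Newton (the exponential-fit example and the numeric Jacobian are my own choices):

```python
import numpy as np

# Standard Gauss-Newton written "around zero": J is d f(p + x_m) / d p at p = 0,
# i.e. the ordinary Jacobian at x_m, recomputed after every update.
def gauss_newton(f, x, iters=10, eps=1e-7):
    for _ in range(iters):
        r = f(x)
        J = np.empty((r.size, x.size))
        for i in range(x.size):
            p = np.zeros_like(x)
            p[i] = eps
            J[:, i] = (f(p + x) - r) / eps             # derivative in p at p = 0
        x = x + np.linalg.solve(J.T @ J, -J.T @ r)     # x_{m+1} = delta + x_m
    return x

# toy usage: fit y = a * exp(b * t)
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(-1.5 * t)
x_hat = gauss_newton(lambda x: x[0] * np.exp(x[1] * t) - y, np.array([1.0, 0.0]))
assert np.allclose(x_hat, [2.0, -1.5], atol=1e-6)
```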


Maybe this answers the question already, but let me continue:

A matrix Lie group can be seen as a generalization of the Euclidean vector space; the Euclidean vector space itself is a trivial example of a matrix Lie group. We represent a vector $\mathbf a \in \mathbb{R}^n$ as an $(n+1)\times (n+1)$ matrix
$$\mathbf{A} := \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix},$$
so that vector addition becomes matrix multiplication
$$\begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}\cdot \begin{bmatrix}\mathbf{I}_{n\times n}&\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix} = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a+\mathbf b\\\mathbf{O}_{1\times n}&1 \end{bmatrix},$$
and the matrix exponential is trivial (since $\hat{\mathbf a}^2=\mathbf O$, the series terminates after the linear term):
$$ \exp(\hat{\mathbf a}) = \begin{bmatrix}\mathbf{I}_{n\times n} &\mathbf a\\\mathbf{O}_{1\times n}&1 \end{bmatrix}~.$$
Using these conventions, we can rewrite the Gauss-Newton scheme ($\mathbf A$ instead of $\mathbf a$, matrix multiplication instead of vector addition, $\exp(\hat{\mathbf p})$ instead of $\mathbf p$).
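Both claims are easy to verify numerically (a sketch; `embed` and `hat` are hypothetical helper names of my own):

```python
import numpy as np
from scipy.linalg import expm

def embed(a):
    """Represent a vector as the (n+1)x(n+1) homogeneous matrix A."""
    n = a.size
    A = np.eye(n + 1)
    A[:n, n] = a
    return A

def hat(a):
    """Lie-algebra element: a in the top-right block, zeros elsewhere."""
    n = a.size
    H = np.zeros((n + 1, n + 1))
    H[:n, n] = a
    return H

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.allclose(embed(a) @ embed(b), embed(a + b))  # multiplication = addition
assert np.allclose(expm(hat(a)), embed(a))             # exp is trivial: hat(a)^2 = 0
```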

Finally, we get the following Jacobian and update rule for the generalized Gauss-Newton scheme:

$$\mathbf J :=\left.\frac{\partial f(\exp(\hat{\mathbf p})\cdot\mathbf A^{(m)})}{\partial \mathbf p} \right|_{\mathbf p =\mathbf 0}$$

and

$$ \mathbf A^{(m+1)}= \exp(\hat\delta)\cdot \mathbf A^{(m)}.$$