[Math] How to linearize Quaternions

complex-geometry, coordinate-systems, geometry, quaternions, rotations

Based on an answer to one of my questions and the comments exchanged here earlier, I noticed that I cannot uniformly sample quaternion vectors for rotation even if I know the bounds of each element of the vector. More specifically, I mean this comment:

If you're going to do your max/min approach to find bounds, I'd
recommend that you: 1) throw out quaternions with small L2 norms (say,
less than 1/2) as well as large ones, as otherwise your bounds won't be
useful, and 2) generate a really huge number of quaternions to find
your bounds (leave your computer on overnight), as otherwise you might
miss some low-probability part of the region

To me, this means that quaternion rotation vectors are not linear and that they have some sort of weird curvature around the corners (or maybe in the middle too?).

So I wonder, is it possible to develop a method to linearize quaternions?

The reasons that I want to linearize quaternion rotation vectors are as follows:

  • I want to add or subtract ($\pm$) some value to each element of the quaternion vector to avoid some degeneracy issues that the optimization would face. If I simply add some values to a non-linearized quaternion vector, then I might move drastically in the space, which is not what I want (this is the main reason).
  • I'm trying to sample quaternion rotation vectors uniformly and without doing rejection sampling, which is computationally intensive at large scale.
  • I am using an optimization procedure to find a quaternion rotation vector that fits my data. Sampling from Euler angles and then converting to quaternions makes the optimization procedure much harder, because Euler angles are non-linear and hard to interpret.

Ideally, I would like a deterministic, one-to-one mapping function for each element of the quaternion vector $(x, y, z, w)$, but I am also open to using some computational approach to empirically learn/compute this linear mapping.

Best Answer

I think it's better to provide this as a separate answer, because the content is largely independent of the previous answer.

As I understand it, what you want to do is a kind of gradient descent inside the Lie group $\mathrm{SO}(3)$ (the set of 3-dimensional rotation matrices; recall that a $3\times3$ matrix $R$ is a rotation matrix if and only if $RR^{T}=I$ and $\det(R)=1$) or $\mathrm{SU}(2)$ (the set of unit quaternions). I've done some sort of Gauss-Newton optimization inside a Lie group before, so I guess I can help you.

I am not sure how much you know about differential geometry, but I guess you are not very familiar with concepts such as Lie groups or tangent spaces. Thus, I have tried to avoid sophisticated mathematical constructions as much as possible.

First of all, I want to emphasize that as long as what you are interested in is the rotation of vectors in $\mathbb{R}^{3}$, the exact way of representing rotations should not matter. That is, there should be no difference (aside from numerical issues like floating-point errors and computational performance) no matter how you represent rotations: there are ways to convert between the representations, so an algorithm in one representation has a translation in terms of any other.

Given this, it is helpful to think of what you are looking for as a rotation matrix $R\in\mathrm{SO}(3)$ (because that is "the most natural" representation of a rotation), even though you might use quaternions or Euler angles for the actual computation.

Now, let $f:\mathrm{SO}(3)\rightarrow[0,\infty)$ be the cost function we wish to minimize using a gradient descent-like method. Let $R_{0}\in\mathrm{SO}(3)$ be the current (or initial) estimate of the optimal point. Of course, you can write $f$ as a function of the $9$ entries of the rotation matrix $R$ and compute the gradient in terms of those components, but this is certainly a mistake, because then the updated matrix might not be a rotation matrix anymore. This is a problem of exactly the same kind as yours, right? So, what we need here is a way to represent $\mathrm{SO}(3)$ using independent parameters. (I guess this is what you wanted to do with a "linearization.") Mathematicians call such a parametrization a coordinate map, a coordinate chart, or simply a chart, depending on the literature.

There is a canonical way of doing this, indeed. For a given vector $\omega\in\mathbb{R}^{3}$, think of the anti-symmetric matrix $$[\omega]_{\times}:=\begin{bmatrix}0&-\omega_{z}&\omega_{y}\\ \omega_{z}&0&-\omega_{x}\\ -\omega_{y}&\omega_{x}&0\end{bmatrix}.$$ One can show that the matrix exponential $$\exp([\omega]_{\times})=I+[\omega]_{\times}+\frac{[\omega]_{\times}^{2}}{2}+\cdots\ \stackrel{[1]}{=}\ I+\frac{\sin\|\omega\|}{\|\omega\|}[\omega]_{\times}+\frac{1-\cos\|\omega\|}{\|\omega\|^{2}}[\omega]_{\times}^{2}$$ (to see why [1] holds, see https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representation, the section explaining the exponential map)

is the rotation matrix about the axis $\frac{\omega}{\|\omega\|}$ with the angle $\|\omega\|$. (The $\omega v$ in the previous answer is exactly the $\omega$ here.) Conversely, any rotation can be written in this way. The three parameters $(\omega_{x},\omega_{y},\omega_{z})$ are completely independent, so they can be set freely; there is no constraint to satisfy. Thus, taking the gradient of $f$ with respect to these variables and then making an update accordingly is a perfectly valid algorithm.
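To make this concrete, here is a minimal sketch in Python with NumPy/SciPy (my own choice of tooling; the helper names `hat` and `exp_so3` are mine as well) that builds $[\omega]_{\times}$ and evaluates the closed form above, cross-checking it against a generic matrix exponential:

```python
import numpy as np
from scipy.linalg import expm   # generic matrix exponential, used only as a cross-check

def hat(w):
    """Map w in R^3 to the anti-symmetric matrix [w]_x."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exp([w]_x) = I + (sin|w|/|w|) [w]_x + ((1-cos|w|)/|w|^2) [w]_x^2."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-12:                       # near zero, keep only the leading terms of the series
        return np.eye(3) + K
    return np.eye(3) + (np.sin(theta) / theta) * K \
                     + ((1.0 - np.cos(theta)) / theta**2) * (K @ K)

w = np.array([0.3, -0.2, 0.5])
R = exp_so3(w)
print(np.allclose(R, expm(hat(w))))                                        # closed form matches the series
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # R is indeed a rotation
```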

However, this is often not a good method. I can't explain the reason very clearly, but I think one can say that it's because the coordinate map $(\omega_{x},\omega_{y},\omega_{z})\mapsto\exp([\omega]_{\times})$ is "good" only near the identity. I think you can verify yourself that things usually become very complicated if you attempt to write the cost function $f$ in terms of $(\omega_{x},\omega_{y},\omega_{z})$. (Do you see those horrible factors $\frac{\sin\|\omega\|}{\|\omega\|}$ and $\frac{1-\cos\|\omega\|}{\|\omega\|^{2}}$?) The resulting gradient descent method usually performs really poorly; even a naive renormalization method in the quaternion space often performs better. (By a renormalization method I mean: (1) compute the derivative of $f$ with respect to the four components of the quaternion, (2) update the quaternion accordingly, and then (3) normalize the resulting quaternion. Repeating (1)-(3), although heuristic at best, is often a valid method for computing the optimal point of $f$. I think this renormalization method is close to what you were thinking about.)
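Here is a rough sketch of that renormalization heuristic, again in Python/NumPy and with illustrative choices of my own (a single-point alignment cost and a finite-difference gradient, just to keep it short):

```python
import numpy as np

def rotate(q, p):
    """Rotate p by the unit quaternion q = (x, y, z, w): p + 2w (v x p) + 2 v x (v x p)."""
    v, w = q[:3], q[3]
    return p + 2.0 * w * np.cross(v, p) + 2.0 * np.cross(v, np.cross(v, p))

def cost(q, p, target):
    return 0.5 * np.sum((target - rotate(q, p)) ** 2)

def numeric_grad(q, p, target, eps=1e-6):
    """Finite-difference derivative of the cost with respect to the four quaternion components."""
    g = np.zeros(4)
    for i in range(4):
        dq = np.zeros(4); dq[i] = eps
        g[i] = (cost(q + dq, p, target) - cost(q - dq, p, target)) / (2.0 * eps)
    return g

p, target = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
q = np.array([0.0, 0.0, 0.0, 1.0])                # start at the identity rotation
for _ in range(200):
    q = q - 0.1 * numeric_grad(q, p, target)      # (1)-(2): unconstrained gradient step in R^4
    q = q / np.linalg.norm(q)                     # (3): renormalize back to a unit quaternion
print(rotate(q, p))                               # roughly (0, 1, 0): p has been rotated onto the target
```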

So, what should we do then? I said the coordinate map we have talked about is "good" only near the identity, and indeed it is good enough near the identity. Therefore, all you need is to "translate" the coordinate map to the point $R_{0}$ by multiplying by $R_{0}$; that is, instead of $(\omega_{x},\omega_{y},\omega_{z})\mapsto\exp([\omega]_{\times})$, you use $(\omega_{x},\omega_{y},\omega_{z})\mapsto\exp([\omega]_{\times})R_{0}$. (Depending on the situation, you might need to use $R_{0}\exp([\omega]_{\times})$ instead of $\exp([\omega]_{\times})R_{0}$.)

Let me give you an example that can help in understanding what's going on. Let's say $f$ is given as $$f(R):=\frac{1}{2}\|q-Rp\|^{2}.$$ That is, the cost is the squared distance between two prescribed points $p$ and $q$ after rotating the point $p$ by the given rotation matrix $R$. Thus, the optimal rotation should align $q$ and $Rp$ in the same direction. Such a rotation can of course be computed directly, but let us just use this simple example to illustrate how a gradient descent method can be adapted to $\mathrm{SO}(3)$.

Note that the problem of minimizing $f(R)$ is equivalent to that of minimizing $\tilde{f}(\omega):=f(\exp([\omega]_{\times})R_{0})$. The latter is a function on $\mathbb{R}^{3}$, so we can attempt a gradient descent update on it. Note that with respect to $\tilde{f}$, our current estimate $R_{0}$ corresponds to the origin $\omega=0$.

Now, let us compute the gradient of $\tilde{f}$ at the origin. Actually, it is easier (and more natural) to compute the directional derivative of $\tilde{f}$ rather than the gradient. Recall from vector calculus that $$\partial_{\omega}\tilde{f}=\langle\nabla\tilde{f},\omega\rangle,$$ where by $\partial_{\omega}\tilde{f}$ we denote the directional derivative of $\tilde{f}$ along the direction $\omega$. Thus, we can get $\nabla\tilde{f}$ by finding the directional derivative $\partial_{\omega}\tilde{f}$ of $\tilde{f}$ along an arbitrary direction $\omega$. In particular, at the origin, $$\partial_{\omega}\tilde{f}(0)=\lim_{\epsilon\rightarrow0} \frac{\tilde{f}(\epsilon\omega)-\tilde{f}(0)}{\epsilon} =\lim_{\epsilon\rightarrow0}\frac{f(\exp(\epsilon[\omega]_{\times})R_{0})-f(R_{0})}{\epsilon}.$$

Note that \begin{align*} f(R)&=\frac{1}{2}\langle q-Rp,q-Rp\rangle\\ &=\frac{1}{2}\left(\|q\|^{2}-\langle Rp,q\rangle-\langle q,Rp\rangle+\|Rp\|^{2}\right)\\ &=\frac{\|p\|^{2}+\|q\|^{2}}{2}-\langle q,Rp\rangle. \end{align*} (Note that $\|Rp\|=\|p\|$; the length of a vector is invariant under rotation.)

Therefore, for a small $\epsilon>0$, we get $$\frac{f(\exp(\epsilon[\omega]_{\times})R_{0})-f(R_{0})}{\epsilon} =-\left\langle q,\frac{\exp(\epsilon[\omega]_{\times})-I}{\epsilon}R_{0}p\right\rangle.$$ We can see from the Taylor expansion of $\exp(\epsilon[\omega]_{\times})$ that the right-hand side goes to $$-\langle q,[\omega]_{\times}R_{0}p\rangle$$ as $\epsilon\rightarrow0$. It can be easily seen that for any $v\in\mathbb{R}^{3}$, $[\omega]_{\times}v=\omega\times v$, the cross product of $\omega$ and $v$. Hence, $$\partial_{\omega}\tilde{f}(0)=-\langle q,\omega\times R_{0}p\rangle.$$ Using the identity $\langle a,b\times c\rangle=\langle b,c\times a\rangle$, we can rewrite this as $$-\langle \omega,R_{0}p\times q\rangle.$$

Comparing this with $$\partial_{\omega}\tilde{f}=\langle\nabla\tilde{f},\omega\rangle,$$ we get the conclusion: $$\nabla\tilde{f}(0)=-R_{0}p\times q.$$
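If you want to double-check this formula numerically, a quick finite-difference test along the lines below (my own sketch, using SciPy's `expm` for the matrix exponential) should agree with $-R_{0}p\times q$:

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def f(R, p, q):
    return 0.5 * np.sum((q - R @ p) ** 2)

rng = np.random.default_rng(0)
p, q = rng.normal(size=3), rng.normal(size=3)
R0 = expm(hat(rng.normal(size=3)))                # some current rotation estimate

f_tilde = lambda w: f(expm(hat(w)) @ R0, p, q)    # the cost in the local coordinates omega

eps, g_num = 1e-6, np.zeros(3)
for i in range(3):
    e = np.zeros(3); e[i] = eps
    g_num[i] = (f_tilde(e) - f_tilde(-e)) / (2.0 * eps)     # central difference at the origin

print(np.allclose(g_num, -np.cross(R0 @ p, q), atol=1e-5))  # agrees with -R0 p x q
```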

Let's say the step size (the rate of movement) is $\alpha>0$. Then, the next estimate of the optimal point of $\tilde{f}$ is $0-\alpha\nabla\tilde{f}(0)=\alpha\,R_{0}p\times q$. In other words, the next estimate of the optimal rotation is $$R_{1}=\exp(\alpha[R_{0}p\times q]_{\times})R_{0}.$$ This is the gradient descent update formula for $f$. We can then iterate this process to get $R_{2}=\exp(\alpha[R_{1}p\times q]_{\times})R_{1}$, $R_{3}=\exp(\alpha[R_{2}p\times q]_{\times})R_{2}$, and so on. Note that at each iteration we effectively use a "different cost function," because at the $k$th iteration what we manipulate is the function $\tilde{f}(\omega)=f(\exp([\omega]_{\times})R_{k-1})$.
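A minimal sketch of this iteration, with a step size and example points of my own choosing, might look like the following; note that every iterate stays exactly inside $\mathrm{SO}(3)$:

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 2.0])          # the optimal R should align Rp with the direction of q
R = np.eye(3)                          # R_0: start at the identity
alpha = 0.5                            # step size

for _ in range(100):
    R = expm(alpha * hat(np.cross(R @ p, q))) @ R   # R_{k+1} = exp(alpha [R_k p x q]_x) R_k

print(R @ p)                           # roughly (0, 0, 1): aligned with q
print(np.allclose(R @ R.T, np.eye(3))) # every iterate stays a genuine rotation matrix
```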

If we use quaternions to represent rotations, the formula $$R_{1}=\exp(\alpha[R_{0}p\times q]_{\times})R_{0}$$ becomes even simpler to evaluate, because the unit quaternion corresponding to the rotation matrix $\exp([\omega]_{\times})$ is just $$\left(\frac{\omega}{\|\omega\|}\sin\frac{\|\omega\|}{2},\ \cos\frac{\|\omega\|}{2}\right)$$ in the $(x,y,z,w)$ convention. (Please refer to the previous answer.)
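As a sanity check of this correspondence, the following sketch (the conversion helper and the rotation-by-quaternion formula are my own, under the $(x,y,z,w)$ convention) compares rotating a point with the quaternion against rotating it with $\exp([\omega]_{\times})$:

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def exp_quat(w):
    """Unit quaternion (x, y, z, w) corresponding to the rotation matrix exp([w]_x)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.array([0.0, 0.0, 0.0, 1.0])    # identity rotation
    return np.append((w / theta) * np.sin(theta / 2.0), np.cos(theta / 2.0))

def rotate(quat, p):
    """Rotate p by a unit quaternion quat = (x, y, z, w)."""
    v, s = quat[:3], quat[3]
    return p + 2.0 * s * np.cross(v, p) + 2.0 * np.cross(v, np.cross(v, p))

w = np.array([0.4, -0.1, 0.7])
p = np.array([1.0, 2.0, 3.0])
print(np.allclose(rotate(exp_quat(w), p), expm(hat(w)) @ p))  # True: same rotation either way
```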

It is quite common that the cost function $f$ is of the form $f(R)=\sum_{i=1}^{N}f_{i}(Rp_{i})$, where $p_{i}\in\mathbb{R}^{3}$ and $f_{i}:\mathbb{R}^{3}\rightarrow\mathbb{R}$. After a computation like the one above, one can show using the chain rule that the directional derivative of $\tilde{f}(\omega):=f(\exp([\omega]_{\times})R_{0})$ along $\omega$, evaluated at the origin, is $$\sum_{i=1}^{N}\langle\nabla f_{i}(R_{0}p_{i}),\omega\times R_{0}p_{i}\rangle =\sum_{i=1}^{N}\langle\omega,R_{0}p_{i}\times\nabla f_{i}(R_{0}p_{i})\rangle.$$ Hence, $$\nabla\tilde{f}(0)=\sum_{i=1}^{N}\big(R_{0}p_{i}\times\nabla f_{i}(R_{0}p_{i})\big),$$ and the update formula is $$R_{1}=\exp\left(-\alpha\left[\sum_{i=1}^{N}\big(R_{0}p_{i}\times\nabla f_{i}(R_{0}p_{i})\big)\right]_{\times}\right)R_{0}.$$
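As an illustration, here is a sketch of that general update for the particular choice $f_{i}(x)=\frac{1}{2}\|q_{i}-x\|^{2}$, on a synthetic, noise-free alignment problem of my own construction:

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

rng = np.random.default_rng(1)
R_true = expm(hat(rng.normal(size=3)))      # hidden rotation we try to recover
P = rng.normal(size=(5, 3))                 # the points p_i
Q = P @ R_true.T                            # q_i = R_true p_i  (noise-free, for simplicity)

def grad_fi(x, qi):
    """f_i(x) = 0.5 ||q_i - x||^2, so grad f_i(x) = x - q_i."""
    return x - qi

R, alpha = np.eye(3), 0.05
for _ in range(2000):
    g = sum(np.cross(R @ pi, grad_fi(R @ pi, qi)) for pi, qi in zip(P, Q))
    R = expm(-alpha * hat(g)) @ R           # R_{k+1} = exp(-alpha [sum_i R_k p_i x grad f_i]_x) R_k

print(np.linalg.norm(R - R_true))           # close to zero: the iteration recovers the hidden rotation
```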


Thank you for reading this loooong answer. I hope this answer helped you, but I think it was not clear enough to be easily grasped. Or, it could be possible that what I explained has nothing to do with your actual problem. So, please feel free to ask further.