Solve for 2D translation, rotation and scale given two touch point movements

geometrylinear algebralinear-transformations

For a phone app I need to let users translate, rotate and scale a rectangle on the 2D plane using "pinch" gesture.

Suppose the rectangle already has TRS = $t, \theta, s $; and the user has their fingers on two touch points $p_1, p_2$. The user then moves the touch points to $q_1, q_2$. What is the new TRS that will map $p_1 \rightarrow q_1, p_2 \rightarrow q_2$. In other words, the user should feel that their fingers are "stuck" to the rectangle. The scale is always the same for $x$ and $y$.

The scale is easy. Let $p=p_1 – p_2$ and $q=q_1 – q_2$, then $s' = s \cdot \dfrac{||q||}{||p||} $

I think rotation is can use the arcSine of the cross product. Let $q_\bot = (-q_y, q_x)$ and
$\Delta\theta = arcSine \dfrac{q_\bot\cdot p}{||q||\cdot||p||}$, then $\theta' = \theta + \Delta\theta$

I think the translation should preserve the midpoint. So: $t' = t + \frac{1}{2}((q_1 – p_1)+(q_2 – p_2))$.

But when I implement this, it doesn't work very well. The rotation is not around the correct point so the rectangle seems to slide away strangely. Any advice please?

Best Answer

First, some background. Let's review the underlying math and notation.

Transformation $\mathbf{T}$ has four free parameters: scale $\lambda$, rotation $\theta$, and translation by $(t_x, t_y)$. Let's define the transformation of point $\vec{p} = (p_x , p_y)$ to point $\vec{q} = (q_x , q_y)$ as $$\vec{q} = \mathbf{T} \vec{p} \quad \iff \quad \left[ \begin{matrix} q_x \\ q_y \\ 1 \end{matrix} \right] = \left[ \begin{matrix} \lambda \cos \theta & -\lambda \sin \theta & t_x \\ \lambda \sin \theta & \lambda \cos \theta & t_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} p_x \\ p_y \\ 1 \end{matrix} \right ] \tag{1a}\label{G1a}$$ Defining the transform as a single matrix $\mathbf{T}$ as $$\mathbf{T} = \left[ \begin{matrix} \lambda \cos \theta & -\lambda \sin \theta & t_x \\ \lambda \sin \theta & \lambda \cos \theta & t_y \\ 0 & 0 & 1 \end{matrix} \right ] \tag{1b}\label{G1b}$$ is useful, because we can calculate the inverse ($\vec{p} = \mathbf{T}^{-1} \vec{q}$), $$\mathbf{T}^{-1} = \left[ \begin{matrix} \frac{\cos\theta}{\lambda} & \frac{\sin\theta}{\lambda} & - \frac{t_x \cos\theta + t_y \sin\theta}{\lambda} \\ - \frac{\sin\theta}{\lambda} & \frac{\cos\theta}{\lambda} & \frac{t_x \sin\theta - t_y \cos\theta}{\lambda} \\ 0 & 0 & 1 \end{matrix} \right] \tag{1c}\label{G1c}$$ and combine transformations via matrix multiplication: $$\mathbf{T}_\text{combined} = \mathbf{T}_\text{after} \mathbf{T}_\text{before} \tag{1d}\label{G1d}$$ where the rightmost transformation is applied first, and leftmost transformation last. Note that the identity transform, "no transform" or "as-is", is $$\mathbf{1} = \left[ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix} \right] \tag{1e}\label{G1e}$$ This way we can also "ignore" the current transform ($\mathbf{T}_\text{before}$) for the purposes here, and just calculate the transform $\mathbf{T}_\text{after}$ caused by the gesture here; and get the combined transform using $\eqref{G1d}$.

(Note that such $3 \times 3$ matrices (with only $2 \times 3$ elements actually stored) are commonly used in 2D computer graphics, including SVG transform matrix() attribute, with the third component ($1$) of vectors not stored, and just baked in to the functions doing matrix-vector multiplication. Similarly for $4 \times 4$ matrices (with only $3 \times 4$ elements actually stored) for 3D computer graphics, including OpenGL.)

The problem is to find the transformation $\mathbf{T}$ that transforms $\vec{p}_1$ and $\vec{p}_2$ to $\vec{q}_1$ and $\vec{q}_2$, $$\left\lbrace \, \begin{aligned} \vec{q}_1 &= \mathbf{T} \vec{p}_1 \\ \vec{q}_2 &= \mathbf{T} \vec{p}_2 \\ \end{aligned} \right. \tag{2}\label{G2}$$ which is actually a set of four equations (two Cartesian components per vector), and four unknowns (that define $\mathbf{T}$), and therefore should have a single solution (except for degenerate cases, where $\vec{p}_1 = \vec{p}_2$ or something similar).

Note that because of finite precision of floating-point numbers, we do not want to apply the transformation incrementally. That is, we should remember the very initial touch points $\vec{p}_1$ and $\vec{p}_2$, keeping them unchanged for the duration of the gesture, also remembering the original transformation $\mathbf{T}_\text{before}$, compute a temporary gesture transformation $\mathbf{T}_\text{gesture}$ based on the initial touch points $\vec{p}_1$ and $\vec{p}_2$ and current touch points $\vec{q}_1$ and $\vec{q}_2$, and a temporary currently applied transformation $\mathbf{T}_\text{temp} = \mathbf{T}_\text{gesture}$. Only when the gesture is finished, will the final $\mathbf{T}_\text{temp}$ become the currently applied and stored transformation.

If you apply the transformation incrementally, i.e. $\vec{p}_1$, $\vec{p}_2$ corresponding to the previous touch locations, and $\vec{q}_1$, $\vec{q}_2$ corresponding to current touch locations, and continously apply $\mathbf{T}_\text{current} = \mathbf{T}_\text{gesture} \mathbf{T}_\text{previous}$, not only do your rounding errors compound (making the gesture control erratic), but it will also be jittery, because small movement of one finger when close to the other will cause large rotations; the faster your touch pad update rate, the more likely such small movements are. The incremental moment-to-moment transformation approach just does not work in practice.


Let $\vec{p}_1$ and $\vec{p}_2$ be the initial touch points, and $\vec{q}_1$ and $\vec{q}_2$ the corresponding final positions: $$\vec{p}_1 = \left[ \begin{matrix} x_1 \\ y_1 \\ 1 \end{matrix} \right], \; \vec{p}_2 = \left[ \begin{matrix} x_2 \\ y_2 \\ 1 \end{matrix} \right], \; \vec{q}_1 = \left[ \begin{matrix} \chi_1 \\ \gamma_1 \\ 1 \end{matrix} \right], \; \vec{q}_2 = \left[ \begin{matrix} \chi_2 \\ \gamma_2 \\ 1 \end{matrix} \right]$$ Let $C = \lambda \cos \theta$, $S = \lambda \sin \theta$, and translation is by $(X, Y)$, so that the transformation matrix is $$\mathbf{T} = \left[ \begin{matrix} C & -S & X \\ S & C & Y \\ 0 & 0 & 1 \\ \end{matrix} \right] \tag{3a}\label{G3a}$$ Note that as long $C S \ne 0$, the upper left $2 \times 2$ matrix is a proper rotation and scaling matrix, for all $C \in \mathbb{R}$, $S \in \mathbb{R}$. The vector $(C, S)$ is the new $x$ axis basis vector, and $(-S, C)$ the new $y$ axis basis vector, and the two are always perpendicular (with positive $y$ axis 90° counterclockwise from positive $x$ acis) and have the same length.

Substituting these into $\eqref{G2}$ we have four equations and four unknowns ($C$, $S$, $X$, and $Y$), which has exactly one solution: $$\left\lbrace \, \begin{aligned} L^2 &= (x_2 - x_1)^2 + (y_2 - y_1)^2 \\ C &= \frac{ (\chi_2 - \chi_1)(x_2 - x_1) + (\gamma_2 - \gamma_1)(y_2 - y_1) }{L^2} \\ S &= \frac{ (\gamma_2 - \gamma_1)(x_2 - x_1) - (\chi_2 - \chi_1)(y_2 - y_1) }{L^2} \\ X &= \frac{ (x_2 - x_1)(\chi_1 x_2 - \chi_2 x_1) + (y_2 - y_1)(\chi_1 y_2 - \chi_2 y_1) + (\gamma_2 - \gamma_1)(x_2 y_1 - x_1 y_2) }{L^2} \\ Y &= \frac{ (\chi_2 - \chi_1)(x_1 y_2 - x_2 y_1) + (y_2 - y_1)(\gamma_1 y_2 - \gamma_2 y_1) + (x_2 - x_1)(\gamma_1 x_2 - \gamma_2 x_1) }{L^2} \\ \end{aligned} \right. \tag{3b}\label{G3b}$$

If we do need the angle $\theta$ and the scale factor $\lambda$, they are obviously $$\left\lbrace ~ \begin{aligned} \lambda &= \sqrt{C^2 + S^2} \\ \theta &= \operatorname{atan2}\left(S, C\right) \\ \cos\theta &= \displaystyle \frac{C}{\sqrt{C^2 + S^2}} \\ \sin\theta &= \displaystyle \frac{S}{\sqrt{C^2 + S^2}} \\ \end{aligned} \right. \tag{3c}\label{G3c}$$ where $\operatorname{atan2}$ is the two-parameter form of arcus tangent, provided by almost all programming languages. ($\operatorname{atan2}(y, x) = \arctan(y/x)$ for $x \gt 0$, but is valid for all $x, y \in \mathbb{R}$ except $x=0, y=0$, as it takes the signs of both $x$ and $y$ into account in other quadrants. Typically, it returns the angle in radians between $-\pi$ (-180°) and $+\pi$ (+180°), but it may vary between programming languages.)


We can verify the above using $$\vec{o} = \left[\begin{matrix} o_x \\ o_y \\ 1 \end{matrix}\right], ~ \vec{t} = \left[\begin{matrix} t_x \\ t_y \\ 1 \end{matrix}\right], ~ \vec{b} = \left[\begin{matrix} b_x \\ b_y \\ 1 \end{matrix}\right], ~ \vec{a} = \left[\begin{matrix} a_x \\ a_y \\ 1 \end{matrix}\right]$$ and $$\vec{p}_1 = \vec{o} + \vec{b}, ~ ~ \vec{p}_2 = \vec{o} - \vec{b}, ~ ~ \vec{q}_1 = \vec{o} + \vec{t} + \vec{a}, ~ ~ \vec{q}_2 = \vec{o} + \vec{t} - \vec{a}$$ so that $\vec{o}$ represents the middle point between the two initial touch points, $\vec{b}$ represents the vector from the middle point to the first initial touch point and $-\vec{b}$ the vector from the middle point to the second initial touch point, $\vec{t}$ represents the translation, and $\vec{a}$ represents the vector from the middle point between the final touch points ($\vec{o}+\vec{t}$) to the first final touch point, and $-\vec{a}$ the vector from the middle point between the final touch points to the second final touch point. The transformation matrix is then $$\mathbf{T} = \left[ \begin{matrix} \displaystyle \frac{a_x b_x + a_y b_y}{b_x^2 + b_y^2} & \displaystyle \frac{a_x b_y - a_y b_x}{b_x^2 + b_y^2} & \displaystyle t_x + o_x - \frac{o_x (a_x b_x + a_y b_y ) + o_y (a_x b_y - a_y b_x )}{b_x^2 + b_y^2} \\ \displaystyle \frac{a_y b_x - a_x b_y}{b_x^2 + b_y^2} & \displaystyle \frac{a_x b_x + a_y b_y}{b_x^2 + b_y^2} & \displaystyle t_y + o_y - \frac{ o_x ( a_y b_x - a_x b_y ) + o_y (a_x b_x + a_y b_y ) }{b_x^2 + b_y^2} \\ 0 & 0 & 1 \end{matrix} \right]$$ and if we do the math, we indeed find that $\mathbf{T} \vec{p}_1 = \vec{q}_1$ and $\mathbf{T} \vec{p}_2 = \vec{q}_2$.


Note that we can extract rotation angle, scale factor, and translation from such a $3 \times 3$ matrix (assuming it is orthogonal, not skewed) with entries $$\mathbf{M} = \left[ \begin{matrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ 0 & 0 & 1 \end{matrix} \right] \tag{4}\label{G4a}$$ via $$\left\lbrace ~ \begin{aligned} \theta &= \operatorname{atan2}\left( m_{21} - m_{22} ,\, m_{11} + m_{22} \right) \\ \lambda &= \frac{1}{2}\left(\sqrt{m_{11}^2 + m_{21}^2} + \sqrt{m_{12}^2 + m_{22}^2}\right) \\ t_x &= X = m_{13} \\ t_y &= Y = m_{23} \\ \end{aligned} \right. \tag{4b}\label{G4b}$$ In particular, if you have an existing transfrom (by $\theta_1$, $\lambda_1$, $(X_1, Y_1)$), and you wish to further transform it (by $\theta_2$, $\lambda_2$, $(X_2, Y_2)$), first construct the two matrices using $$\mathbf{M} = \left[ \begin{matrix} \lambda \cos\theta & -\lambda \sin\theta & X \\ \lambda \sin\theta & \lambda \cos\theta & Y \\ 0 & 0 & 1 \end{matrix} \right ] \tag{4c}\label{G4c}$$ then calculate their product via matrix multiplication, original orientation rightmost, latest applied transform leftmost, $$\mathbf{M}_\text{final} = \mathbf{M}_2 \mathbf{M}_1$$ and then extract the rotation angle $\theta$, scale factor $\lambda$, and translation $(X, Y)$ as specified in $\eqref{G4b}$.

Related Question