[Math] Reconstruct 3D Transformation matrix from 2D projection matrices

matrices, matrix decomposition

I'm working on reconstructing the movement of 3D objects based on 2D orthogonal projections of those objects.

So far I'm working with two orthogonal projections of the object, one onto the z-x plane and one onto the z-y plane.

(I want to neglect the projection onto the x-y plane, since that information is not important to me. Should that be a problem, I can easily compute the 2×3 matrix for x-y too and add that information.)

Using two consecutive frames of those projections, I can compute 2×3 transformation matrices with OpenCV's estimateRigidTransform() function. The 2×3 transformation matrices include rotation, scaling and translation:

$$T_{z,x} = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & tz \\ \sin(\theta)s & \cos(\theta)s & tx\end{array} \right]$$

$$T_{z,y} = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & tz \\ \sin(\theta)s & \cos(\theta)s & ty\end{array} \right]$$

(Here you can find more detailed information about what exactly estimateRigidTransform() does.)
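To make the structure of these 2×3 matrices concrete, here is a minimal sketch of how scale, angle and translation can be read back out of one of them. It assumes the matrix really has the rotation+uniform-scale form written above (no shear); the helper names `compose_similarity` and `decompose_similarity` are made up for illustration and are not part of OpenCV.

```python
import math

def compose_similarity(scale, angle, t):
    """Build [[s*cos, -s*sin, t0], [s*sin, s*cos, t1]] as nested lists."""
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return [[c, -s, t[0]], [s, c, t[1]]]

def decompose_similarity(T):
    """Recover (scale, angle in radians, translation) from a 2x3 matrix.

    Assumes T has the shear-free form produced by compose_similarity.
    """
    (a, _b, t0), (c, _d, t1) = T
    scale = math.hypot(a, c)   # length of the first column is s
    angle = math.atan2(c, a)   # the first column is s*(cos, sin)
    return scale, angle, (t0, t1)

# Round trip with arbitrary illustrative values.
T_zx = compose_similarity(1.2, 0.3, (5.0, -2.0))
scale, angle, translation = decompose_similarity(T_zx)
```

Only the first column is used for scale and angle, so if the estimated matrix contains noise or a little shear, the second column will not agree exactly with these values.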

Here is a sample visualization for clarification:


So, after computing these matrices from the two 2D projections, I want to reconstruct a 3D transformation matrix from this information to manipulate the 3D object. My aim is to undo the transformation that happened to the 3D object between Frame 1 and Frame 2. Therefore I want to apply the 3D transformation matrix to the object so that it is in the same spatial position it was in when Frame 1 was taken.

Sample 3D visualization:


I understand that a 3D transformation matrix is built up from the following pieces:

$$ S = \begin{bmatrix}
S_{x}& 0& 0& 0\\
0& S_{y}& 0& 0\\
0& 0& S_{z}& 0\\
0& 0& 0& 1
\end{bmatrix} $$

$$ T = \begin{bmatrix}
1& 0& 0& t_{x}\\
0& 1& 0& t_{y}\\
0& 0& 1& t_{z}\\
0& 0& 0& 1
\end{bmatrix} $$

$$R_{x}(\theta) = \begin{bmatrix}
1& 0& 0& 0\\
0& \cos\theta & -\sin\theta& 0\\
0& \sin\theta & \cos\theta& 0\\
0& 0& 0& 1\\
\end{bmatrix}$$

$$R_{y}(\theta) = \begin{bmatrix}
\cos\theta& 0& \sin\theta& 0\\
0& 1& 0& 0\\
-\sin\theta& 0& \cos\theta& 0\\
0& 0& 0& 1\\
\end{bmatrix}$$

$$R_{z}(\theta) =\begin{bmatrix}
\cos\theta & -\sin\theta & 0& 0\\
\sin\theta & \cos\theta & 0& 0\\
0& 0& 1& 0\\
0& 0& 0& 1
\end{bmatrix}$$

(I do not want to include shearing.)

My question is: how can I use the information held by the two 2×3 transformation matrices to construct a 3D transformation matrix that transforms the 3D object in the same way as the 2D matrices do when applied to their corresponding planes?

EDIT:

As suggested by amd, I tried to find more constraints to reduce the number of free parameters.

I analysed the movement of the object and I found out that the 3 dominant parameters are:

  • Translation in z-direction
  • Rotation around the x-axis (Pitch)
  • Rotation around the y-axis (Yaw)

What can be neglected is:

  • Translation in negative z-direction
  • Rotation around the z-axis (Roll)
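With only these three dominant parameters, the 3D motion and its inverse can be written down directly. The following is a minimal sketch, not a solution to the estimation problem: it assumes column vectors (translation in the last column, as in the 2×3 matrices), and it assumes the motion is composed as pitch first, then yaw, then translation along z. That order, and the helper names, are illustrative choices; how to obtain `tz`, `pitch` and `yaw` from the two projections is still the open question.

```python
import math

def mat_mul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def trans_z(tz):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, tz], [0, 0, 0, 1]]

def motion(tz, pitch, yaw):
    """Assumed order: M = T_z * R_y(yaw) * R_x(pitch)."""
    return mat_mul(trans_z(tz), mat_mul(rot_y(yaw), rot_x(pitch)))

def undo_motion(tz, pitch, yaw):
    """Inverse of motion(): R_x(-pitch) * R_y(-yaw) * T_z(-tz)."""
    return mat_mul(rot_x(-pitch), mat_mul(rot_y(-yaw), trans_z(-tz)))

def apply(M, p):
    """Apply a 4x4 matrix to a 3D point given as (x, y, z)."""
    out = mat_mul(M, [[p[0]], [p[1]], [p[2]], [1.0]])
    return (out[0][0], out[1][0], out[2][0])

# Move a point from Frame 1 to Frame 2, then bring it back.
p_frame1 = (1.0, 2.0, 3.0)
p_frame2 = apply(motion(0.5, 0.1, -0.2), p_frame1)
p_restored = apply(undo_motion(0.5, 0.1, -0.2), p_frame2)
```

The inverse reverses the order of the factors and negates each parameter, which is why no general matrix inversion is needed here.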

I hope that helps.

Best Answer

[Not an answer, but too long for comments.]

I wonder if you might be making the problem harder by “simplifying” it. Here’s what a few simple rigid motions look like when projected onto the $y$-$z$ plane:

  • Rotation about the $z$-axis: $[x\sin\gamma+y\cos\gamma,z]$, so equivalent to the planar transformation $$\begin{bmatrix}y&z&1\end{bmatrix}\begin{bmatrix}\cos\gamma&0\\0&1\\x\sin\gamma&0\end{bmatrix}$$ which looks like scaling+translation.
  • Rotation about the $y$-axis: $[y,z\cos\beta-x\sin\beta]$, equivalent to $$\begin{bmatrix}y&z&1\end{bmatrix}\begin{bmatrix}1&0\\0&\cos\beta\\0&-x\sin\beta\end{bmatrix}$$ which also looks like scaling+translation.
  • Rotation about $y$, then $z$: $[y\cos\gamma+z\sin\beta\sin\gamma+x\cos\beta\sin\gamma,z\cos\beta-x\sin\beta]$, or $$\begin{bmatrix}y&z&1\end{bmatrix}\begin{bmatrix} \cos\gamma & 0 \\ \sin\beta\sin\gamma & \cos\beta \\ x\cos\beta\sin\gamma & -x\sin\beta \end{bmatrix}$$ which is more like a shear+translation. Trying to approximate this by a shearless affine transformation might not work so well.
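The third case can be checked numerically. The sketch below rotates a 3D point about $y$ and then about $z$, projects it onto the $y$-$z$ plane, and compares the result with the row vector $[y, z, 1]$ multiplied by the 3×2 matrix above. The sign conventions are the standard right-handed ones, which reproduce the formulas in the list; the function names are illustrative.

```python
import math

def rotate_y(p, beta):
    x, y, z = p
    return (x * math.cos(beta) + z * math.sin(beta), y,
            -x * math.sin(beta) + z * math.cos(beta))

def rotate_z(p, gamma):
    x, y, z = p
    return (x * math.cos(gamma) - y * math.sin(gamma),
            x * math.sin(gamma) + y * math.cos(gamma), z)

def project_yz(p):
    """Orthogonal projection onto the y-z plane: drop x."""
    return (p[1], p[2])

def planar_matrix(x, beta, gamma):
    """3x2 matrix for 'rotation about y, then z'; note it depends on x."""
    return [[math.cos(gamma), 0.0],
            [math.sin(beta) * math.sin(gamma), math.cos(beta)],
            [x * math.cos(beta) * math.sin(gamma), -x * math.sin(beta)]]

def apply_planar(p, M):
    """Row vector [y, z, 1] times the 3x2 matrix M."""
    row = (p[1], p[2], 1.0)
    return tuple(sum(row[k] * M[k][j] for k in range(3)) for j in range(2))

beta, gamma = 0.2, 0.4
p = (1.5, -0.7, 2.0)
direct = project_yz(rotate_z(rotate_y(p, beta), gamma))
planar = apply_planar(p, planar_matrix(p[0], beta, gamma))

# Same (y, z) but a different x: the projected point lands elsewhere.
q = (3.0, -0.7, 2.0)
direct_q = project_yz(rotate_z(rotate_y(q, beta), gamma))
```

Because `planar_matrix` needs the point's own $x$ coordinate, two points that coincide in the first projected frame can separate in the second, which no single 2D matrix applied to the image can reproduce.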

Of concern, too, is the “free” factor of $x$ in the translation part of all of these transformations. This doesn’t really correspond to anything in the image plane and makes the resulting transformation non-affine. It’s reminiscent of the way that $z$ appears in the derivation of the perspective projection, so I wonder if you might get better results by computing a planar perspective transform for each view and trying to reconstruct the 3-D transformation from those. On the other hand, if the frame-to-frame difference is small, affine approximations shouldn’t be too bad.