I'm working on reconstructing the movement of 3D objects from 2D orthogonal projections of the object.
So far I'm working with two orthogonal projections of the object, one onto the z-x plane and one onto the z-y plane.
(I want to neglect the projection onto the x-y plane; that information is not important to me. Should that be a problem, I can easily compute the 2×3 matrix for x-y as well and add that information.)
By using two consecutive frames of those projections I can compute a 2×3 transformation matrix for each plane using OpenCV's estimateRigidTransform() function. The 2×3 transformation matrices include rotation, scaling and translation:
$$T_{z,x} = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & t_z \\ \sin(\theta)s & \cos(\theta)s & t_x\end{array} \right]$$
$$T_{z,y} = \left[ \begin{array}{ccc} \cos(\theta)s & -\sin(\theta)s & t_z \\ \sin(\theta)s & \cos(\theta)s & t_y\end{array} \right]$$
(Here you can find more detailed information about what exactly estimateRigidTransform() does.)
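To make the structure of these matrices concrete, here is a minimal sketch that splits a 2×3 matrix of the form above back into scale, angle and translation. The function name `decompose_similarity` and the sample values are made up for illustration. As far as I know, estimateRigidTransform() has been removed from recent OpenCV releases in favour of `estimateAffinePartial2D`, so the sketch only assumes a 2×3 NumPy array as input and does not call OpenCV itself.

```python
import numpy as np

def decompose_similarity(m):
    """Split a 2x3 matrix [[s*cos, -s*sin, t0], [s*sin, s*cos, t1]]
    into (scale, angle in radians, translation vector)."""
    m = np.asarray(m, dtype=float)
    a, b = m[0, 0], m[1, 0]        # s*cos(theta) and s*sin(theta)
    scale = np.hypot(a, b)         # s = sqrt(a^2 + b^2)
    theta = np.arctan2(b, a)       # rotation angle in the projection plane
    translation = m[:, 2].copy()   # (t_z, t_x) for the z-x view
    return scale, theta, translation

# Hypothetical example values, not measured data.
s, theta = 1.1, np.deg2rad(5.0)
T_zx = np.array([[s * np.cos(theta), -s * np.sin(theta),  2.0],
                 [s * np.sin(theta),  s * np.cos(theta), -0.5]])

scale, angle, t = decompose_similarity(T_zx)
print(scale, np.rad2deg(angle), t)
```

The same function applies unchanged to the z-y matrix; only the meaning of the second translation component changes.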
Here is a sample visualization for clarification:
So after computing these from the two 2D projections, I want to reconstruct a 3D transformation matrix from this information in order to manipulate the 3D object. My aim is to undo the transformation that happened to the 3D object between Frame 1 and Frame 2. Therefore I want to apply the 3D transformation matrix to the object so that it is in the same spatial position it was in when Frame 1 was taken.
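To illustrate what "undoing" means once a 3D matrix is available, here is a small sketch. It assumes the motion between the two frames is already known as a 4×4 homogeneous matrix `M` acting on column vectors; the helper `undo_transform` and all numbers are hypothetical.

```python
import numpy as np

def undo_transform(points, M):
    """Apply the inverse of the 4x4 homogeneous matrix M to an (N, 3) array of points."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    restored = (np.linalg.inv(M) @ pts_h.T).T
    return restored[:, :3]

# Hypothetical motion between Frame 1 and Frame 2: a small yaw plus a shift along z.
yaw = np.deg2rad(10.0)
M = np.array([[ np.cos(yaw), 0.0, np.sin(yaw), 0.0],
              [ 0.0,         1.0, 0.0,         0.0],
              [-np.sin(yaw), 0.0, np.cos(yaw), 3.0],
              [ 0.0,         0.0, 0.0,         1.0]])

frame1 = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 0.5, 4.0]])
frame2 = (M @ np.hstack([frame1, np.ones((3, 1))]).T).T[:, :3]

recovered = undo_transform(frame2, M)  # should coincide with frame1
```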
Sample 3D visualization:
I understand that a 3D transformation matrix is built up from the following components:
$$ S = \begin{bmatrix}
S_{x}& 0& 0& 0\\
0& S_{y}& 0& 0\\
0& 0& S_{z}& 0\\
0& 0& 0& 1
\end{bmatrix} $$
$$ T = \begin{bmatrix}
1& 0& 0& t_{x}\\
0& 1& 0& t_{y}\\
0& 0& 1& t_{z}\\
0& 0& 0& 1
\end{bmatrix} $$
$$R_{x}(\theta) = \begin{bmatrix}
1& 0& 0& 0\\
0& \cos\theta & -\sin\theta& 0\\
0& \sin\theta & \cos\theta& 0\\
0& 0& 0& 1\\
\end{bmatrix}$$
$$R_{y}(\theta) = \begin{bmatrix}
\cos\theta& 0& \sin\theta& 0\\
0& 1& 0& 0\\
-\sin\theta& 0& \cos\theta& 0\\
0& 0& 0& 1\\
\end{bmatrix}$$
$$R_{z}(\theta) =\begin{bmatrix}
\cos\theta & -\sin\theta & 0& 0\\
\sin\theta & \cos\theta & 0& 0\\
0& 0& 1& 0\\
0& 0& 0& 1
\end{bmatrix}$$
(I do not want to include shearing.)
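As a sketch of how these building blocks combine into one matrix, the following NumPy code composes them. It assumes the column-vector convention that the rotation matrices above follow, in which the translation occupies the last column of the 4×4 matrix (in the row-vector convention it would occupy the last row). The function names and the chosen multiplication order are my own choices for illustration; other orders are equally valid but give different results.

```python
import numpy as np

def scaling(sx, sy, sz):
    """4x4 scaling matrix."""
    return np.diag([sx, sy, sz, 1.0])

def translation(tx, ty, tz):
    """4x4 translation matrix for column vectors (translation in the last column)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]], dtype=float)

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]], dtype=float)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

def compose(s, pitch, yaw, roll, t):
    """Scale first, then rotate (roll, pitch, yaw), then translate."""
    return translation(*t) @ rot_y(yaw) @ rot_x(pitch) @ rot_z(roll) @ scaling(s, s, s)

# Hypothetical parameters: uniform scale 1, 5 deg pitch, 10 deg yaw, shift of 3 along z.
M = compose(1.0, np.deg2rad(5.0), np.deg2rad(10.0), 0.0, (0.0, 0.0, 3.0))
origin = M @ np.array([0.0, 0.0, 0.0, 1.0])  # the origin is only affected by the translation
```

Because matrix multiplication is not commutative, the order in `compose` is part of the model and has to match whatever order is assumed when the parameters are estimated.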
My question is: how can I use the information held by the two 2×3 2D transformation matrices to construct a 3D transformation matrix that, in total, transforms the 3D object in the same way the 2D matrices would if I applied them to their corresponding planes?
EDIT:
As suggested by amd, I tried to find more constraints to reduce the number of free parameters.
I analysed the movement of the object and found that the three dominant parameters are:
- Translation in z-direction
- Rotation around the x-axis (Pitch)
- Rotation around the y-axis (Yaw)
What can be neglected is:
- Translation in negative z-direction
- Rotation around the z-axis (Roll)
I hope that helps.
Best Answer
[Not an answer, but too long for comments.]
I wonder if you might be making the problem harder by “simplifying” it. Here’s what a few simple rigid motions look like when projected onto the $y$-$z$ plane:
Of concern, too, is the “free” factor of $x$ in the translation part of all of these transformations. This doesn’t really correspond to anything in the image plane and makes the resulting transformation non-affine. It’s reminiscent of the way that $z$ appears in the derivation of the perspective projection, so I wonder if you might get better results by computing a planar perspective transform for each view and trying to reconstruct the 3-D transformation from those. On the other hand, if the frame-to-frame difference is small, affine approximations shouldn’t be too bad.
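For the small-motion case, a naive sketch of such an affine approximation could look as follows. It is not a derived solution: it assumes a rigid motion (scale is ignored), small angles, negligible roll, and that each 2×3 matrix acts on coordinates in the order $(z, x)$ or $(z, y)$ respectively. The function `combine_views` and all names are hypothetical. Under those assumptions, the in-plane rotation of the $z$-$x$ view is read as yaw and that of the $z$-$y$ view as pitch with the sign flipped, because the $(z, y)$ ordering reverses the orientation relative to $R_x$.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def combine_views(T_zx, T_zy):
    """Naive small-motion estimate of a 4x4 rigid transform from two 2x3 view transforms.

    T_zx acts on (z, x) coordinates, T_zy on (z, y) coordinates.
    Scale and roll are ignored; both are assumptions of this sketch.
    """
    T_zx = np.asarray(T_zx, dtype=float)
    T_zy = np.asarray(T_zy, dtype=float)
    yaw = np.arctan2(T_zx[1, 0], T_zx[0, 0])      # rotation about y, seen in the z-x view
    pitch = -np.arctan2(T_zy[1, 0], T_zy[0, 0])   # rotation about x, sign flipped for (z, y) order
    M = np.eye(4)
    M[:3, :3] = rot_y(yaw) @ rot_x(pitch)         # the order only matters at second order in the angles
    M[0, 3] = T_zx[1, 2]                          # t_x from the z-x view
    M[1, 3] = T_zy[1, 2]                          # t_y from the z-y view
    M[2, 3] = 0.5 * (T_zx[0, 2] + T_zy[0, 2])     # t_z is seen by both views; average the estimates
    return M

# Hypothetical inputs: 0.1 rad rotation in the z-x view, none in the z-y view, t_z = 2 in both.
phi = 0.1
T_zx = np.array([[np.cos(phi), -np.sin(phi), 2.0],
                 [np.sin(phi),  np.cos(phi), 0.0]])
T_zy = np.array([[1.0, 0.0, 2.0],
                 [0.0, 1.0, 0.0]])
M = combine_views(T_zx, T_zy)
```

The inverse of `M` would then be the candidate for moving the object back to its Frame 1 pose. For larger motions the view-dependent terms discussed above are no longer negligible and this approximation degrades.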