[Math] Projecting from one $2D$ plane onto another $2D$ plane

linear-algebra, projective-geometry, transformational-geometry

I would like to project from one $2D$ plane onto another. Imagine that I have a picture taken with a camera that was looking at a plane. Given the camera's extrinsic and intrinsic parameters, I want to know how the points in the picture map to the points on the pictured plane.

What I know so far is that this is normally achieved using a homography matrix. However, I want to confirm the particular formula for the described projection.

Homography

Let's assume our intrinsic camera matrix is the following:
$$
I = \begin{pmatrix}f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{pmatrix}
$$

The extrinsic matrix describes the position $(x_t, y_t, z_t)$ and rotation of the camera w.r.t. the world coordinates. For demonstration purposes, let's assume it's only rotated around the $x$ axis:
$$
E = \begin{pmatrix}1 & 0 & 0 & x_t\\ 0 & \cos\theta & -\sin\theta & y_t \\ 0 & \sin\theta & \cos\theta & z_t \\ 0 & 0 & 0 & 1\end{pmatrix}
$$

So if we want to find the projection of a $3D$ point in the world $(x_w, y_w, z_w)$ onto our camera plane, we can now use the final camera matrix to perform the projection transformation:
$$
\begin{pmatrix}x_c \\ y_c \\ w \end{pmatrix} = I E \begin{pmatrix}x_w \\ y_w \\ z_w \\ 1\end{pmatrix}
$$

The position on the camera plane (the image), assuming for now that the image center is at $(0, 0)$, is then given by $x = x_c/w$ and $y = y_c/w$.
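To make the $3D$ to $2D$ step concrete, here is a minimal NumPy sketch of it. The function names `camera_matrix` and `project`, and the values of $f$, $\theta$ and the translation, are assumptions picked purely for illustration; they are not part of the original setup.

```python
import numpy as np

def camera_matrix(f, theta, t):
    """Return the 3x4 product I @ E for a camera rotated by theta about x."""
    c, s = np.cos(theta), np.sin(theta)
    intrinsic = np.array([[f, 0.0, 0.0, 0.0],
                          [0.0, f, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 0.0]])
    extrinsic = np.array([[1.0, 0.0, 0.0, t[0]],
                          [0.0, c, -s, t[1]],
                          [0.0, s, c, t[2]],
                          [0.0, 0.0, 0.0, 1.0]])
    return intrinsic @ extrinsic

def project(P, point_world):
    """Map a 3D world point to inhomogeneous image coordinates (x, y)."""
    x_c, y_c, w = P @ np.append(point_world, 1.0)
    return np.array([x_c / w, y_c / w])

# Illustrative values only: f = 2, no rotation, translation (0, 0, 4).
P = camera_matrix(f=2.0, theta=0.0, t=(0.0, 0.0, 4.0))
image_point = project(P, np.array([1.0, 2.0, 0.0]))
print(image_point)
```

With these values the homogeneous result is $(2, 4, 4)$, so the image point is $(0.5, 1)$.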

Now to get to the original problem and my question ($2D$ to $2D$ plane projection, rather than $3D$ to $2D$ projection) I would do something like the following. First, I only have the location on the image $(x_c, y_c)$ and I want to derive coordinates $(x_w, y_w)$ on a plane in the world. I can rewrite my equation like this:
$$
H = IE
$$
$$
H = \begin{pmatrix}h_{11} & h_{12} & h_{13} & h_{14}\\h_{21} & h_{22} & h_{23} & h_{24}\\h_{31} & h_{32} & h_{33} & h_{34}\end{pmatrix}
$$
$$
\begin{pmatrix}x_c \\ y_c \\ w \end{pmatrix} = \begin{pmatrix}h_{11} & h_{12} & h_{13} & h_{14}\\h_{21} & h_{22} & h_{23} & h_{24}\\h_{31} & h_{32} & h_{33} & h_{34}\end{pmatrix} \begin{pmatrix}x_w \\ y_w \\ 0 \\ 1\end{pmatrix}
$$
$$
\begin{pmatrix}x_c \\ y_c \\ w \end{pmatrix} = \begin{pmatrix}h_{11} & h_{12} & h_{14}\\h_{21} & h_{22} & h_{24}\\h_{31} & h_{32} & h_{34}\end{pmatrix} \begin{pmatrix}x_w \\ y_w \\ 1\end{pmatrix}
$$
$$
H' = \begin{pmatrix}h_{11} & h_{12} & h_{14}\\h_{21} & h_{22} & h_{24}\\h_{31} & h_{32} & h_{34}\end{pmatrix}
$$
$$
H'^{-1} = H'^T
$$
$$
\begin{pmatrix}x_w \\ y_w \\ w\end{pmatrix} = H'^T \begin{pmatrix}x_c \\ y_c \\ 1 \end{pmatrix}
$$

I would then use $x=x_w/w$ and $y=y_w/w$ as coordinates relative to the $2D$ plane in the world. Is that correct or at least going in the right direction?

Side note: this has been briefly touched upon, but without any good mathematical grounding in someone else's practical question https://stackoverflow.com/questions/20445147/transform-image-using-roll-pitch-yaw-angles-image-rectification and I'm interested in the mathematical foundation of a similar problem.

Best Answer

You were doing well up until the line $H'^{-1} = H'^T$. There’s no particular reason to believe that $H'$ is orthogonal; quite the opposite, in fact. (I’m going to switch here to the more conventional name $\mathtt P$ for the projection matrix and also make use of the block decomposition $\mathtt P = \left[\mathtt M \mid \mathbf p_4\right]$.) The first two columns of $\mathtt P$, $\mathbf p_1$ and $\mathbf p_2$, are the vanishing points of the world $x$- and $y$-axes. Unless the camera happens to be positioned just right relative to the $x$-$y$ plane, these vectors will not be orthogonal in the image. The upshot is that you have to compute the inverse of $\begin{bmatrix}\mathbf p_1 & \mathbf p_2 & \mathbf p_4\end{bmatrix}$, not its transpose.

This leads to the next potential problem: this matrix might not be invertible. The submatrix $\mathtt M$ is effectively the composition of a rotation and nonsingular affine transformation, so we know the first three columns are linearly independent, but it’s entirely possible that $\mathbf p_4 = \lambda \mathbf p_1 + \mu \mathbf p_2$, that is, that the image of the world origin lies on the vanishing line of the $x$-$y$ plane. This occurs when the camera center lies on this plane. In the specific construction in your question, this isn’t really an issue since you’re reprojecting onto the $x$-$y$ plane, but keep this in mind when generalizing to other planes.
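The degenerate case can be checked numerically. The sketch below uses the question's rotation-about-$x$ camera; the helper name `plane_to_image` and all numeric values are assumptions for illustration. The second translation is deliberately chosen so that the camera center $-R^T\mathbf t$ has world $z = 0$, which should make $\begin{bmatrix}\mathbf p_1 & \mathbf p_2 & \mathbf p_4\end{bmatrix}$ singular.

```python
import numpy as np

def plane_to_image(f, theta, t):
    """Return [p1 p2 p4] and the world-space camera centre -R^T t."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, -s],
                  [0.0, s, c]])
    t = np.asarray(t, dtype=float)
    K = np.diag([f, f, 1.0])
    P = K @ np.column_stack([R, t])  # same 3x4 matrix as I @ E in the question
    return P[:, [0, 1, 3]], -R.T @ t

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)

# Generic camera: centre is off the plane z = 0, matrix is invertible.
H_generic, centre_generic = plane_to_image(2.0, theta, (0.5, -1.0, 4.0))
det_generic = np.linalg.det(H_generic)

# Degenerate camera: t is picked so that the centre lies in the plane z = 0.
H_degenerate, centre_degenerate = plane_to_image(2.0, theta, (0.5, 2.0 * c, 2.0 * s))
det_degenerate = np.linalg.det(H_degenerate)

print(centre_generic[2], det_generic)
print(centre_degenerate[2], det_degenerate)
```

In the degenerate case the third column equals $0.5\,\mathbf p_1 + 2\,\mathbf p_2$, so the determinant vanishes up to floating-point rounding.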

Thus, the reprojection onto the $x$-$y$ plane is given by the homography matrix $$\mathtt H = \begin{bmatrix}\mathbf p_1 & \mathbf p_2 & \mathbf p_4\end{bmatrix}^{-1}.$$ This can be generalized to any plane that doesn’t contain the camera center by inserting an appropriate coordinate transformation $\mathtt B$ into the cascade, i.e., start with $$w\begin{bmatrix}x_c \\ y_c \\ 1 \end{bmatrix} = \mathtt{PB}\begin{bmatrix}x_w\\y_w\\0\\1\end{bmatrix}.$$
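As a quick sanity check of the inverse-versus-transpose point, the following NumPy sketch maps a point on the $x$-$y$ plane into the image and back. The camera parameters and the helper `apply_homography` are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative camera: f = 2, 30 degree rotation about x, arbitrary translation.
theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
intrinsic = np.array([[2.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 1.0, 0]])
extrinsic = np.array([[1, 0, 0, 0.5], [0, c, -s, -1.0], [0, s, c, 4.0], [0, 0, 0, 1]])
P = intrinsic @ extrinsic

# Dropping the third column (world z = 0) gives the plane-to-image map [p1 p2 p4].
H_plane_to_image = P[:, [0, 1, 3]]
H = np.linalg.inv(H_plane_to_image)  # image-to-plane homography

def apply_homography(H, pt):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

world = np.array([1.0, 2.0])
image = apply_homography(H_plane_to_image, world)
recovered = apply_homography(H, image)                      # uses the inverse
via_transpose = apply_homography(H_plane_to_image.T, image)  # uses the transpose

print(recovered, via_transpose)
```

Only the inverse recovers the original plane point; for this camera the transpose lands somewhere else entirely.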

For comparison, here’s a construction that makes direct use of back-mapping the image points. Given a plane with homogeneous coordinate vector $\mathbf\Pi$, its intersection with the line through a fixed point $\mathbf C$ not on the plane and an arbitrary point $\mathbf X$ is $\left(\mathbf\Pi^T\mathbf X\right) \mathbf C - \left(\mathbf C^T\mathbf\Pi\right) \mathbf X$. The world coordinates of the camera center can be recovered from $\mathtt P$: since $\mathtt P\mathbf C = \mathbf 0$, its inhomogeneous Cartesian coordinates are $\tilde{\mathbf C}=-\mathtt M^{-1}\mathbf p_4$. Finally, an image point $\mathbf x$ back-projects to a ray that intersects the plane at infinity at $\left((\mathtt M^{-1}\mathbf x)^T,0\right)^T$. Putting this all together, and adding a final matrix $\mathtt B$ that imposes a coordinate system on the plane, we get $$\mathtt H = \mathtt B \left(\mathbf C \mathbf \Pi^T-\left(\mathbf C^T \mathbf \Pi\right) \mathtt I_4 \right) \begin{bmatrix}\mathtt M^{-1}\\\mathbf 0^T\end{bmatrix}.$$ With $\mathbf\Pi=(0,0,1,0)^T$ (the $x$-$y$ plane) and $$\mathtt B = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\end{bmatrix}$$ this produces the same homography matrix as above, up to an overall scale factor.
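The agreement between the two constructions can be checked numerically. In this sketch the camera values are arbitrary assumptions, and the homogeneous camera center is obtained as the right null vector of $\mathtt P$ (the point with $\mathtt P\mathbf C = \mathbf 0$) from an SVD, which sidesteps any scale or sign convention. Both matrices are normalized before comparison because a homography is only defined up to scale.

```python
import numpy as np

# Illustrative camera: f = 2, 30 degree rotation about x, arbitrary translation.
theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
intrinsic = np.array([[2.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 1.0, 0]])
extrinsic = np.array([[1, 0, 0, 0.5], [0, c, -s, -1.0], [0, s, c, 4.0], [0, 0, 0, 1]])
P = intrinsic @ extrinsic
M = P[:, :3]

# Homogeneous camera centre: last right singular vector of P, so P @ C = 0.
C = np.linalg.svd(P)[2][-1]

Pi = np.array([0.0, 0.0, 1.0, 0.0])  # the plane z = 0
B = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])

# Maps an image point x to the point at infinity ((M^-1 x)^T, 0)^T.
back_projection = np.vstack([np.linalg.inv(M), np.zeros((1, 3))])

# Back-mapping construction versus direct inversion of [p1 p2 p4].
H_back = B @ (np.outer(C, Pi) - (C @ Pi) * np.eye(4)) @ back_projection
H_direct = np.linalg.inv(P[:, [0, 1, 3]])

# Remove the arbitrary scale before comparing.
H_back_n = H_back / H_back[2, 2]
H_direct_n = H_direct / H_direct[2, 2]
print(np.allclose(H_back_n, H_direct_n))
```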

The details of the anatomy of the projection matrix that I used above can be found in Hartley and Zisserman’s Multiple View Geometry In Computer Vision and other standard references on the subject.