If All the Points Lie On a Plane, Then Why Does the Linear Mapping Reduce to …


I previously asked a question with regard to what the matrix $\mathrm{H}_{3 \times 3}$ is/represents in the following textbook excerpt:

In applying projective geometry to the imaging process, it is customary to model the world as a $3$D projective space $\mathbb{P}^3$, equal to $\mathbb{R}^3$ along with points at infinity. Similarly the model for the image is the $2$D projective plane $\mathbb{P}^2$. Central projection is simply a map from $\mathbb{P}^3$ to $\mathbb{P}^2$. If we consider points in $\mathbb{P}^3$ written in terms of homogeneous coordinates $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$ and let the centre of projection be the origin $(0, 0, 0, 1)^T$, then we see that the set of all points $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$ for fixed $\mathrm{X}$, $\mathrm{Y}$, and $\mathrm{Z}$, but varying $\mathrm{T}$ form a single ray passing through the centre of projection, and hence all map to the same point. Thus, the final coordinate of $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})$ is irrelevant to where the point is imaged. In fact, the image point is the point in $\mathbb{P}^2$ with homogeneous coordinates $(\mathrm{X}, \mathrm{Y}, \mathrm{Z})^T$. Thus, the mapping may be represented by a linear mapping of $3$D homogeneous coordinates, given by a $3 \times 4$ matrix $\mathrm{P}$ with the block structure $\mathrm{P} = [I_{3 \times 3} | \mathbf{0}_3]$, where $I_{3 \times 3}$ is the $3 \times 3$ identity matrix and $\mathbf{0}_3$ a zero 3-vector. Making allowance for a different centre of projection, and a different projective coordinate frame in the image, it turns out that the most general imaging projection is represented by an arbitrary $3 \times 4$ matrix of rank $3$, acting on the homogeneous coordinates of the point in $\mathbb{P}^3$ and mapping it to the imaged point in $\mathbb{P}^2$. This matrix $\mathrm{P}$ is known as the camera matrix.

In summary, the action of a projective camera on a point in space may be expressed in terms of a linear mapping of homogeneous coordinates as

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{P}_{3 \times 4}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{Z} \\
\mathrm{T} \\ \end{bmatrix}$$
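A minimal numerical sketch of the excerpt's claim, in plain Python with matrices stored as lists of rows (the helper `project` and the sample point are illustrative choices, not from the book): with the canonical camera $\mathrm{P} = [I_{3 \times 3} | \mathbf{0}_3]$, every point $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$ with fixed $\mathrm{X}, \mathrm{Y}, \mathrm{Z}$ maps to the same image point $(\mathrm{X}, \mathrm{Y}, \mathrm{Z})^T$, whatever $\mathrm{T}$ is.

```python
# Canonical camera P = [I | 0]; its centre of projection is (0, 0, 0, 1)^T.
P_CANONICAL = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]


def project(P, point):
    """Apply a 3x4 camera matrix (list of rows) to a homogeneous 4-vector."""
    return [sum(p * c for p, c in zip(row, point)) for row in P]


# Fixed X, Y, Z with varying T: all points lie on one ray through the centre
# and all land on the same image point (X, Y, Z).
images = [project(P_CANONICAL, [2, -1, 4, t]) for t in (0, 1, 7, -3)]
print(images)  # [[2, -1, 4], [2, -1, 4], [2, -1, 4], [2, -1, 4]]

# The centre of projection itself maps to the zero vector, which is not a
# valid point of P^2: the centre has no image.
print(project(P_CANONICAL, [0, 0, 0, 1]))  # [0, 0, 0]
```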

Furthermore, if all the points lie on a plane (we may choose this as the plane $\mathrm{Z} = 0$) then the linear mapping reduces to

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{H}_{3 \times 3}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{T} \\ \end{bmatrix}$$

which is a projective transformation.
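One way to see why the third column disappears: a matrix-vector product is a linear combination of the matrix's columns, so with columns $\mathbf{p}_1, \dots, \mathbf{p}_4$ of $\mathrm{P}$ we get $\mathrm{P}(\mathrm{X}, \mathrm{Y}, 0, \mathrm{T})^T = \mathrm{X}\mathbf{p}_1 + \mathrm{Y}\mathbf{p}_2 + 0 \cdot \mathbf{p}_3 + \mathrm{T}\mathbf{p}_4$. The sketch below checks this numerically for a made-up rank-3 matrix `P` (an arbitrary illustrative choice, not a calibrated camera); the helper names are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def delete_column(M, j):
    """Return a copy of M with column j (0-based) removed."""
    return [row[:j] + row[j + 1:] for row in M]


# Hypothetical rank-3 camera matrix, chosen only for illustration.
P = [
    [2, 1, 3, 5],
    [0, 4, 1, -2],
    [1, 0, 6, 7],
]

# Column index 2 is the third column, the one that multiplies Z.
H = delete_column(P, 2)
print(H)  # [[2, 1, 5], [0, 4, -2], [1, 0, 7]]

# On the plane Z = 0, P acting on (X, Y, 0, T) equals H acting on (X, Y, T).
for X, Y, T in [(1, 2, 1), (-3, 5, 2), (4, 0, 0)]:
    assert mat_vec(P, [X, Y, 0, T]) == mat_vec(H, [X, Y, T])
```

For this particular `P`, $\det \mathrm{H} = 34 \neq 0$, so the resulting $\mathrm{H}$ is an invertible projective transformation of the plane.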

It is now clear to me that I didn't understand this section properly. Specifically, it is not clear to me why choosing the plane $\mathrm{Z} = 0$ means that the linear mapping

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{P}_{3 \times 4}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{Z} \\
\mathrm{T} \\ \end{bmatrix}$$

reduces to

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{H}_{3 \times 3}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{T} \\ \end{bmatrix}$$

More specifically, I don't understand why setting $\mathrm{Z} = 0$ means that the entire third column of $\mathrm{P}_{3 \times 4}$ can be dropped to get $\mathrm{H}_{3 \times 3}$ (as per bounceback's answer to the aforementioned question).

I'm also wondering whether choosing the plane $\mathrm{Z} = 5$ (or any other plane) instead of $\mathrm{Z} = 0$ would still reduce the transformation from

$$\begin{bmatrix} x \\ y \\ w \end{bmatrix} = \mathrm{P}_{3 \times 4} \begin{bmatrix} \mathrm{X} \\ \mathrm{Y} \\ 0 \\ \mathrm{T} \\ \end{bmatrix}$$

to

$$\begin{bmatrix} x \\ y \\ w \end{bmatrix} = \mathrm{H}_{3 \times 3} \begin{bmatrix} \mathrm{X} \\ \mathrm{Y} \\ \mathrm{T} \\ \end{bmatrix},$$

where

$$\mathrm{H}_{3 \times 3} = \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}$$

?

I would greatly appreciate it if people could please take the time to clarify this.

Best Answer

Let's write $P_{3\times4}=\begin{bmatrix}r_{11}&r_{12}&r_{13}&\tau_1\\ r_{21}&r_{22}&r_{23}&\tau_2\\r_{31}&r_{32}&r_{33}&\tau_3\\\end{bmatrix}$ and call the block of $r$ terms $R_{3\times3}$ and the column of $\tau$ terms $\tau_{3\times1}$, so that $P_{3\times 4}=[R \mid \tau]$ (I drop the dimension subscripts below for brevity, but $\tau$ is a column vector). With $U_4 = [X, Y, Z, T]^T$, the product $P_{3\times 4} U_4$ becomes $R \begin{bmatrix} X\\Y\\Z \end{bmatrix} +T\tau=R U_3 +T\tau$, where I defined $U_3 =\begin{bmatrix} X\\Y\\Z \end{bmatrix}$.

Before dealing with the $Z=0$ case, let's look at the case where the plane does not pass through the origin. All points on the plane obey $N^T U_3 = dT$ (with $T=1$ this is the familiar $N^T U_3 = d$), where $N$ is a unit vector normal to the plane and $-d\neq0$ is the signed distance of the origin from the plane. Hence $T= {1\over d} N^T U_3$, and therefore $$R U_3 +T\tau= R U_3 +{1\over d}\tau(N^T U_3),$$ so that finally $P_{3\times 4} U_4$ can be written as $[R +{1\over d}\tau N^T]U_3$, and $H_{3\times3}=R+ {1\over d}\tau N^T$ is called a "homography".
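A small exact-arithmetic check of this formula, assuming a made-up matrix `P` as a stand-in camera and using `fractions.Fraction` to avoid rounding; `homography_for_plane` is a hypothetical helper name. The plane $Z = 5$ from the question is the case $N = (0, 0, 1)^T$, $d = 5$, whose points have homogeneous coordinates $(X, Y, 5T, T)^T$.

```python
from fractions import Fraction


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def homography_for_plane(P, N, d):
    """Return H = R + (1/d) tau N^T for the plane N^T U_3 = d T, d != 0.

    P = [R | tau] is a 3x4 matrix given as a list of rows; N is a unit normal.
    """
    R = [row[:3] for row in P]
    tau = [row[3] for row in P]
    return [[R[i][j] + Fraction(tau[i] * N[j]) / d for j in range(3)]
            for i in range(3)]


# Hypothetical camera matrix, for illustration only.
P = [
    [2, 1, 3, 5],
    [0, 4, 1, -2],
    [1, 0, 6, 7],
]

# Plane Z = 5: N = (0, 0, 1), d = 5, so on-plane points are (X, Y, 5T, T).
H5 = homography_for_plane(P, [0, 0, 1], 5)
for X, Y, T in [(1, 2, 1), (-3, 5, 2)]:
    U4 = [X, Y, 5 * T, T]
    assert mat_vec(P, U4) == mat_vec(H5, U4[:3])

# Alternative for Z = 5 that keeps (X, Y, T) as plane coordinates:
# the columns are p1, p2 and 5*p3 + p4 of P.
H5_xyt = [[row[0], row[1], 5 * row[2] + row[3]] for row in P]
for X, Y, T in [(1, 2, 1), (-3, 5, 2)]:
    assert mat_vec(P, [X, Y, 5 * T, T]) == mat_vec(H5_xyt, [X, Y, T])

# A tilted plane: N = (3/5, 4/5, 0) is a unit vector, d = 2.
N = [Fraction(3, 5), Fraction(4, 5), 0]
H_tilted = homography_for_plane(P, N, 2)
U4 = [2, 1, 7, 1]  # 3*2/5 + 4*1/5 = 2 = d*T, so this point lies on the plane
assert mat_vec(P, U4) == mat_vec(H_tilted, U4[:3])
```

The second construction shows that for $Z = 5$ the third column is not simply dropped: it is folded into the last column as $5\mathbf{p}_3 + \mathbf{p}_4$, because $P(X, Y, 5T, T)^T = X\mathbf{p}_1 + Y\mathbf{p}_2 + T(5\mathbf{p}_3 + \mathbf{p}_4)$.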

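The degenerate case is $d=0$ (the plane passes through the origin), which includes the case $Z=0$ by setting $N^T=[0,0,1]$. Here, instead of substituting $T= {1\over d} N^T U_3$, we use $N^T U_3 = 0$. Then $$R U_3 +T\tau = \left(R (I-N N^T) +\tau N^T \right)(U_3+T N),$$ which can be checked by expanding the right-hand side and using $(I-NN^T)N = 0$, $N^T U_3 = 0$ and $N^T N = 1$. In the specific case $N^T=[0,0,1]$ we have $U_3 + TN = [X, Y, T]^T$ (because $Z = 0$), so $$P_{3\times 4} U_4 = \left[R \begin{bmatrix} 1&0&0\\0&1&0\\0&0&0\end{bmatrix} +\begin{bmatrix} 0&0&\tau_1\\0&0&\tau_2\\0&0&\tau_3\end{bmatrix}\right]\begin{bmatrix} X\\Y\\T\end{bmatrix} = \begin{bmatrix} r_{11}&r_{12}&\tau_1\\ r_{21}&r_{22}&\tau_2\\ r_{31}&r_{32}&\tau_3\end{bmatrix}\begin{bmatrix} X\\Y\\T\end{bmatrix}.$$ That is, for $Z=0$ the homography $H_{3\times3}$ is exactly $P_{3\times4}$ with its third column deleted.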
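The identity for planes through the origin, and the conclusion that for $Z = 0$ the matrix is just $P_{3\times4}$ without its third column, can be checked with a short sketch. As before, `P` is a made-up matrix and `degenerate_homography` is a hypothetical helper name; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def degenerate_homography(P, N):
    """Return R (I - N N^T) + tau N^T for a plane N^T U_3 = 0 through the origin.

    P = [R | tau] is a 3x4 matrix given as a list of rows; N is a unit normal.
    """
    R = [row[:3] for row in P]
    tau = [row[3] for row in P]
    proj = [[(1 if i == j else 0) - N[i] * N[j] for j in range(3)] for i in range(3)]
    R_proj = [[sum(R[i][k] * proj[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
    return [[R_proj[i][j] + tau[i] * N[j] for j in range(3)] for i in range(3)]


# Hypothetical camera matrix, for illustration only.
P = [
    [2, 1, 3, 5],
    [0, 4, 1, -2],
    [1, 0, 6, 7],
]

# Z = 0 (N = e3): the result is P with its third column deleted.
H0 = degenerate_homography(P, [0, 0, 1])
print(H0)  # [[2, 1, 5], [0, 4, -2], [1, 0, 7]]

# General plane through the origin: check R U_3 + T tau = M (U_3 + T N).
N = [Fraction(3, 5), Fraction(4, 5), 0]
M = degenerate_homography(P, N)
X, Y, Z, T = 4, -3, 2, 9  # 3*4/5 + 4*(-3)/5 = 0, so (X, Y, Z) is on the plane
shifted = [c + T * n for c, n in zip([X, Y, Z], N)]
assert mat_vec(P, [X, Y, Z, T]) == mat_vec(M, shifted)

# The question's canonical camera [I | 0] gives the singular matrix diag(1, 1, 0).
P_CANONICAL = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
print(degenerate_homography(P_CANONICAL, [0, 0, 1]))  # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

The last result matches the $\mathrm{H}_{3 \times 3}$ written in the question. It is singular because the plane $Z = 0$ contains the centre $(0,0,0,1)^T$ of the camera $[I \mid \mathbf{0}]$, so every point of that plane is imaged onto the single line $w = 0$.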
