Projective Transformations: “If all the points lie on a plane, then the linear mapping reduces to …”

computer vision, geometry, projective-geometry, projective-space, transformation

Page 7 of my computer vision textbook, Multiple View Geometry in Computer Vision, says the following:

In applying projective geometry to the imaging process, it is customary to model the world as a $3$D projective space, equal to $\mathbb{R}^3$ along with points at infinity. Similarly the model for the image is the $2$D projective plane $\mathbb{P}^2$. Central projection is simply a map from $\mathbb{P}^3$ to $\mathbb{P}^2$. If we consider points in $\mathbb{P}^3$ written in terms of homogeneous coordinates $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$ and let the centre of projection be the origin $(0, 0, 0, 1)^T$, then we see that the set of all points $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$ for fixed $\mathrm{X}$, $\mathrm{Y}$, and $\mathrm{Z}$, but varying $\mathrm{T}$ form a single ray passing through the point centre of projection, and hence all mapping to the same point. Thus, the final coordinates of $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})$ is irrelevant to where the point is imaged. In fact, the image point is the point in $\mathbb{P}^2$ with homogeneous coordinates $(\mathrm{X}, \mathrm{Y}, \mathrm{Z})^T$. Thus, the mapping may be represented by a mapping of $3$D homogeneous coordinates, represented by a $3 \times 4$ matrix $\mathrm{P}$ with the block structure $P = [I_{3 \times 3} | \mathbf{0}_3]$, where $I_{3 \times 3}$ is the $3 \times 3$ identity matrix and $\mathbf{0}_3$ a zero 3-vector. Making allowance for a different centre of projection, and a different projective coordinate frame in the image, it turns out that the most general imaging projection is represented by an arbitrary $3 \times 4$ matrix of rank $3$, acting on the homogeneous coordinates of the point in $\mathbb{P}^3$ mapping it to the imaged point in $\mathbb{P}^2$. This matrix $\mathrm{P}$ is known as the camera matrix.

In summary, the action of a projective camera on a point in space may be expressed in terms of a linear mapping of homogeneous coordinates as

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{P}_{3 \times 4}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{Z} \\
\mathrm{T} \\ \end{bmatrix}$$

Furthermore, if all the points lie on a plane (we may choose this as the plane $\mathrm{Z} = 0$) then the linear mapping reduces to

$$\begin{bmatrix}
x \\
y \\
w \end{bmatrix} = \mathrm{H}_{3 \times 3}
\begin{bmatrix}
\mathrm{X} \\
\mathrm{Y} \\
\mathrm{T} \\ \end{bmatrix}$$

which is a projective transformation.

The aforementioned section of the textbook is available freely here.

My questions are as follows:

  1. Where it says

Thus, the final coordinates of $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})$ is irrelevant to where the point is imaged.

shouldn't the vector be $(\mathrm{X}, \mathrm{Y}, \mathrm{Z}, \mathrm{T})^T$?

  2. What is $\mathrm{H}_{3 \times 3}$ supposed to be?

I would greatly appreciate it if people could please take the time to clarify these.

Best Answer

1) It really doesn't make a difference whether you think of it as a row or column vector. I suppose if you're being consistent, yes, it's still the column vector $(X,Y,Z,T)^\top$, but clearly the final component of $(X,Y,Z,T)^\top$ is the same as the final coordinate of $(X,Y,Z,T)$.

2) The text says: "the most general imaging projection is represented by an arbitrary $3 \times 4$ matrix of rank $3$, acting on the homogeneous coordinates of the point in $\mathbb{P}^3$ mapping it to the imaged point in $\mathbb{P}^2$. This matrix $P$ is known as the camera matrix."

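So if $P$ is an arbitrary $3 \times 4$ matrix of rank 3 and we impose the linear condition $Z = 0$, the third column of $P$ is always multiplied by zero and drops out. Writing $P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \mathbf{p}_3 \; \mathbf{p}_4]$ in terms of its columns,

$$P \begin{bmatrix} X \\ Y \\ 0 \\ T \end{bmatrix} = X\mathbf{p}_1 + Y\mathbf{p}_2 + T\mathbf{p}_4 = H \begin{bmatrix} X \\ Y \\ T \end{bmatrix}, \qquad H = [\mathbf{p}_1 \; \mathbf{p}_2 \; \mathbf{p}_4],$$

so $H$ is simply $P$ with the 3rd column removed. This $H$ may have rank 2 or 3. It has rank 3, and is therefore an invertible projective transformation, exactly when the centre of projection (the null vector of $P$) does not lie on the plane $Z = 0$. Try it out for yourself by constructing examples of matrices $P$.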
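If it helps, here is a minimal NumPy sketch of that experiment. The helper name `plane_homography` and the two example cameras are my own illustrative choices, not something from the textbook. The first camera is the $[I \mid \mathbf{0}]$ one from the quoted passage, whose centre of projection is the origin and therefore lies on $Z = 0$. The second has its centre at $(0, 0, -1)$, off the plane.

```python
import numpy as np

def plane_homography(P):
    """Return the 3x3 matrix obtained by deleting the third column of a
    3x4 camera matrix P (the column that multiplies Z)."""
    P = np.asarray(P, dtype=float)
    return np.delete(P, 2, axis=1)

# Camera [I | 0]: centre of projection at the origin, which lies on Z = 0.
P_on = np.hstack([np.eye(3), np.zeros((3, 1))])

# Camera [I | t] with t = (0, 0, 1): centre at (0, 0, -1), off the plane.
P_off = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])

H_on = plane_homography(P_on)
H_off = plane_homography(P_off)

rank_on = np.linalg.matrix_rank(H_on)    # centre on the plane: rank drops to 2
rank_off = np.linalg.matrix_rank(H_off)  # centre off the plane: rank stays 3

# For a point on the plane Z = 0, P and H give the same image point.
X, Y, T = 2.0, -3.0, 1.0
full = P_off @ np.array([X, Y, 0.0, T])
reduced = H_off @ np.array([X, Y, T])

print(rank_on, rank_off)
print(np.allclose(full, reduced))
```

In the rank-2 case the whole plane is seen edge-on and collapses to a line in the image, so $H$ is not invertible and is not a projective transformation in the usual sense.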
