[Math] Problem in Deducing Perspective Projection Matrix

linear-algebra, projective-geometry

I understand the traditional way of deriving the perspective projection matrix (use similar triangles and make the depth value linear).
But I want to try another approach after reading this text: Fundamentals of Texture Mapping and Image Warping.

On page 17, it says that a quad can be mapped to a square using a projective transformation, which can be expressed as a rational linear mapping:

$$\mathit{x} = \frac{\mathit{a}\mathit{u} + \mathit{b}\mathit{v} + \mathit{c}}
{\mathit{g}\mathit{u} + \mathit{h}\mathit{v} + \mathit{i}}\\
\mathit{y} =
\frac{\mathit{d}\mathit{u} + \mathit{e}\mathit{v} + \mathit{f}}
{\mathit{g}\mathit{u} + \mathit{h}\mathit{v} + \mathit{i}}$$

After I substitute the four vertices of the quad and of the square, I get a linear system. Solving that linear system gives me the projective transformation matrix.
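As a minimal sketch of that step (not taken from the referenced text), the linear system can be assembled and solved as below. The helper names `solve_linear`, `fit_projective_2d` and `apply_projective_2d`, as well as the sample trapezoid, are illustrative assumptions. Exact fractions are used instead of a numerical solver, and $i$ is fixed to $1$.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    # Augmented matrix [A | b] with rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: the points are degenerate")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def fit_projective_2d(src, dst):
    """Return (a, b, c, d, e, f, g, h), with i = 1, mapping src[k] to dst[k]."""
    A, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # x * (g u + h v + 1) = a u + b v + c
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        # y * (g u + h v + 1) = d u + e v + f
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    return solve_linear(A, rhs)

def apply_projective_2d(coeffs, u, v):
    """Evaluate the rational linear mapping at (u, v)."""
    a, b, c, d, e, f, g, h = coeffs
    den = g * u + h * v + 1
    return ((a * u + b * v + c) / den, (d * u + e * v + f) / den)

quad = [(0, 0), (4, 0), (3, 2), (1, 2)]      # hypothetical trapezoid
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
coeffs = fit_projective_2d(quad, square)
mapped = [apply_projective_2d(coeffs, u, v) for u, v in quad]
```

Each vertex pair contributes two rows, so four pairs give the eight equations needed for the eight unknowns.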

Similarly, I expect that a 3D projective mapping can also be written as a rational linear mapping, and that this mapping can take a frustum to the NDC cube. (Note that because homogeneous coordinates are used, the last element of the matrix, the bottom-right one, can be set to 1.)

$$
\mathit{x} =
\frac{\mathit{a}\mathit{u} + \mathit{b}\mathit{v} + \mathit{c}\mathit{w} + \mathit{d}}
{\mathit{m}\mathit{u} + \mathit{n}\mathit{v} + \mathit{o}\mathit{w} + 1}\\
\mathit{y} =
\frac{\mathit{e}\mathit{u} + \mathit{f}\mathit{v} + \mathit{g}\mathit{w} + \mathit{h}}
{\mathit{m}\mathit{u} + \mathit{n}\mathit{v} + \mathit{o}\mathit{w} + 1}\\
\mathit{z} =
\frac{\mathit{i}\mathit{u} + \mathit{j}\mathit{v} + \mathit{k}\mathit{w} + \mathit{l}}
{\mathit{m}\mathit{u} + \mathit{n}\mathit{v} + \mathit{o}\mathit{w} + 1}
$$

But when I try to solve this system, the resulting matrix is not the same as the one given in 3D API specifications (such as OpenGL).

My question is: are there any extra properties of a perspective projection matrix that a rational linear mapping does not have?

EDIT: I found that in the 2D version we have 8 unknowns (3×3 entries, excluding the one matrix element set to 1), which is exactly the number of equations in the linear system (4 vertices in a quad, with 2 coordinate components each). However, in the 3D version the number of unknowns and the number of equations do not match.
I suspect I have misunderstood the rational linear mapping, and that the reason it works in the 2D version is just a coincidence.
I will study this topic further.

Best Answer

In $2D$, a perspective map can map any $4$ points, no $3$ of which are collinear, to any $4$ other points. Therefore, a perspective map has $8$ ($4\times2$) independent elements ($3\times3-1$).

In $3D$, a perspective map can map any $5$ points, no $4$ of which are coplanar, to any $5$ other points. Therefore, a perspective map has $15$ ($5\times3$) independent elements ($4\times4-1$).

In this answer, I develop the $2D$ matrix to map any $4$ non-collinear points to any $4$ other points. This method can be extended to $3D$.

A $3D$ point can be embedded in $4D$:
$$ [x,y,z]\mapsto[x,y,z,1] $$
and a $4D$ point can be projected to $3D$:
$$ [x,y,z,r]\mapsto\left[\frac{x}{r},\frac{y}{r},\frac{z}{r}\right] $$
A perspective mapping embeds a $3D$ point in $4D$, performs a $4\times4$ matrix multiplication, then projects the $4D$ result back to $3D$:
$$ M:\left[\begin{array}{ccc}x&y&z\end{array}\right]\mapsto\left[\begin{array}{cccc}x&y&z&1\end{array}\right]M=\left[\begin{array}{cccc}u&v&w&s\end{array}\right]\mapsto\left[\begin{array}{ccc}\frac{u}{s}&\frac{v}{s}&\frac{w}{s}\end{array}\right] $$

Given any $5$ points in $\mathbb{R}^3$, $[x_n,y_n,z_n]_{n=1}^5$, no $4$ of which are coplanar, compute
$$ \left[\begin{array}{cccc}d_1&d_2&d_3&d_4\end{array}\right]=\left[\begin{array}{cccc}x_5&y_5&z_5&1\end{array}\right]\left[\begin{array}{cccc}x_1&y_1&z_1&1\\x_2&y_2&z_2&1\\x_3&y_3&z_3&1\\x_4&y_4&z_4&1\end{array}\right]^{-1} $$
and define
$$ M_{[x\;y\;z]}=\left[\begin{array}{cccc}d_1x_1&d_1y_1&d_1z_1&d_1\\d_2x_2&d_2y_2&d_2z_2&d_2\\d_3x_3&d_3y_3&d_3z_3&d_3\\d_4x_4&d_4y_4&d_4z_4&d_4\end{array}\right] $$

Then for $5$ other points in $\mathbb{R}^3$, $[u_n,v_n,w_n]_{n=1}^5$, with $M_{[u\;v\;w]}$ built from them in the same way, we have for $n=1,\dots,5$
$$ M_{[x\;y\;z]}^{-1}M_{[u\;v\;w]}:\left[\begin{array}{ccc}x_n&y_n&z_n\end{array}\right]\mapsto\left[\begin{array}{ccc}u_n&v_n&w_n\end{array}\right] $$
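The construction can be checked with a short sketch such as the one below. It follows the row-vector convention used in the formulas and relies on exact fractions. The function names (`inverse`, `matmul`, `basis_matrix`, `five_point_map`, `apply`) and the sample points are assumptions made for illustration only.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination (exact fractions)."""
    n = len(A)
    # Augmented matrix [A | I].
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: four of the points are coplanar")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def basis_matrix(points):
    """Build the matrix with rows d_n * [x_n, y_n, z_n, 1] from five 3D points."""
    X = [list(p) + [1] for p in points[:4]]       # rows [x_n, y_n, z_n, 1]
    fifth = [list(points[4]) + [1]]               # 1x4 row vector
    d = matmul(fifth, inverse(X))[0]              # [d_1, d_2, d_3, d_4]
    return [[d_n * v for v in row] for d_n, row in zip(d, X)]

def five_point_map(src, dst):
    """4x4 matrix (row-vector convention) sending src[n] to dst[n]."""
    return matmul(inverse(basis_matrix(src)), basis_matrix(dst))

def apply(M, p):
    """Embed p in 4D, multiply by M, and project back to 3D."""
    u, v, w, s = matmul([list(p) + [1]], M)[0]
    return (u / s, v / s, w / s)

# Hypothetical sample data: in each set, no four points are coplanar.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
dst = [(2, 0, 0), (0, 3, 0), (0, 0, 5), (1, 1, 1), (3, 1, 2)]
M = five_point_map(src, dst)
images = [apply(M, p) for p in src]
```

Here `basis_matrix` sends the four standard basis vectors and $[1,1,1,1]$ to the five given points (up to scale), which is why composing the inverse of one with the other maps the first set of points onto the second.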

Related Solutions

[Math] A controlled trapezoid transformation with perspective projection

Grabbed a pencil, piece of paper, and Maple and solved it by myself.

// Inputs oldWidth, oldHeight, newBottomWidth and newHeight are assumed to be defined elsewhere.
var eqRoot:Number = -Math.sqrt(newBottomWidth*newBottomWidth*oldHeight*oldHeight - newHeight*newHeight*oldWidth*oldWidth);
var focalLength:Number = Math.abs(eqRoot/(newBottomWidth - oldWidth));
// Angle in radians, then converted to degrees.
var angle:Number = Math.atan2(eqRoot/(newBottomWidth*oldHeight), newHeight*oldWidth/(newBottomWidth*oldHeight));
var angleDeg:Number = angle*180/Math.PI;

[Math] Definition and example for a matrix representing a non linear transformation

If you think about points using their normal Cartesian coordinates, then applying a projective transformation essentially means performing three steps.

1. You homogenize the point, by appending a fourth coordinate set to $1$. So $P_1=(x,y,z)$ becomes $P_2=(x,y,z,1)$.
2. Then you multiply with some $4\times 4$ matrix, e.g. $$A=\begin{pmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\\a_{41}&a_{42}&a_{43}&a_{44}\end{pmatrix}\qquad P_3=A\cdot P_2=\begin{pmatrix}a_{11}x+a_{12}y+a_{13}z+a_{14}\\a_{21}x+a_{22}y+a_{23}z+a_{24}\\a_{31}x+a_{32}y+a_{33}z+a_{34}\\a_{41}x+a_{42}y+a_{43}z+a_{44}\end{pmatrix}$$
3. Then you dehomogenize: you divide the coordinate vector by its last (“$w$”) coordinate, and drop that coordinate. You end up with $$P_4=\frac1{a_{41}x+a_{42}y+a_{43}z+a_{44}}\begin{pmatrix}a_{11}x+a_{12}y+a_{13}z+a_{14}\\a_{21}x+a_{22}y+a_{23}z+a_{24}\\a_{31}x+a_{32}y+a_{33}z+a_{34}\end{pmatrix}$$
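The three steps can be sketched in a few lines. The function name `project` and the sample matrix are illustrative assumptions; the matrix is stored as a list of rows and applied in the column-vector convention $A\cdot P_2$ used above.

```python
def project(A, point):
    """Apply a 4x4 matrix A to a 3D point via homogeneous coordinates."""
    x, y, z = point
    p2 = (x, y, z, 1)                                          # step 1: homogenize
    p3 = [sum(a * c for a, c in zip(row, p2)) for row in A]    # step 2: P3 = A * P2
    w = p3[3]
    if w == 0:
        # The image is a point at infinity and has no Cartesian coordinates.
        raise ZeroDivisionError("point is mapped to infinity")
    return tuple(c / w for c in p3[:3])                        # step 3: dehomogenize

# Hypothetical example: the last row copies z into w, so each coordinate is divided by z.
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
result = project(A, (2, 4, 2))
```

With this sample matrix the point $(2,4,2)$ becomes $(2,4,2,2)$ after step 2, and dividing by the last coordinate gives $(1,2,1)$.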

So although step 2 by itself can be seen as a linear transformation in a four-dimensional space, the fact that your vectors are actually interpreted as homogeneous coordinates means that it is, in general, a non-linear projective transformation of a 3-dimensional projective space.

It is affine exactly if $a_{41}=a_{42}=a_{43}=0$ and linear if $a_{41}=a_{42}=a_{43}=a_{14}=a_{24}=a_{34}=0$. Both of these only make sense if $a_{44}\neq0$, in which case you might scale the whole matrix by $1/a_{44}$ to obtain a simpler representation which preserves the $1$ in the last coordinate of each input vector and thus avoids the division in step 3.
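These conditions translate directly into a small check. The sketch below uses 0-based indices, so $a_{41}$ is `A[3][0]` and $a_{14}$ is `A[0][3]`; the names `classify` and `normalize` and the sample matrices are assumptions for illustration.

```python
def classify(A):
    """Classify a 4x4 matrix as 'projective', 'affine' or 'linear'."""
    bottom = A[3]
    if any(bottom[j] != 0 for j in range(3)):
        return "projective"      # the divisor depends on x, y or z
    if bottom[3] == 0:
        raise ValueError("a44 must be non-zero for an affine or linear map")
    if any(A[i][3] != 0 for i in range(3)):
        return "affine"          # there is a translation part
    return "linear"

def normalize(A):
    """Scale by 1/a44 so the last coordinate of each image stays 1 (affine case)."""
    s = A[3][3]
    return [[v / s for v in row] for row in A]

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
translation = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
perspective = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
scaled = [[2, 0, 0, 4], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]

kinds = [classify(m) for m in (identity, translation, perspective, scaled)]
unit = normalize(scaled)
```

The matrix `scaled` represents the same affine map as its normalized form, a translation by $2$ along $x$, since scalar multiples of a matrix act identically on homogeneous coordinates.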

Homogeneous coordinates are actually equivalence classes. Scalar multiples of a given vector represent the same point. So if you decide to set $w=2$ in the first step, instead of the conventional $w=1$, then your vector $(x,y,z,2)$ actually describes the same point as the vector $(x/2,y/2,z/2,1)$. So by setting $w$ to a value larger than $1$, you're shrinking your input by that factor $w$. A value smaller than $1$ expands it. Points with $w=0$ are somewhat special. Dehomogenizing them would lead to a division by zero. These are points at infinity. They represent directions in space, e.g. they are the points where parallel lines meet.

When dealing with projective geometry, it's usually best to leave the dehomogenization step for the very end of all operations. So you'd take your input, homogenize it, and then perform all subsequent transformations on homogeneous coordinates before dehomogenizing the result at the very end. By avoiding dehomogenization after every step, a point at infinity in some step may still end up in a finite position in your final scene, usually as some point visible on the (2D image of the 3D) horizon, so there are benefits beyond improved performance. Parts of your question sound like you're about to learn OpenGL programming or something like that. If that's the case, keep in mind that many OpenGL operations will perform the dehomogenization step implicitly, so it's perfectly all right to use homogeneous coordinates for the vertices of some polygons, or the coordinates of some texture lookup, or whatever.
