[Math] Definition and example for a matrix representing a non linear transformation

geometry, matrices, projective-geometry

I'm studying and trying to grasp the math behind some basic projective geometry, but I run into trouble as soon as things start to become non-linear.

Model space, world space and eye space are all the products of linear transformations as far as I can tell, and I have little to no doubt in asserting that. The problems start when I consider the geometrical frustum that represents the perspective projection. As far as I can tell, the clipping process in NDC coordinates is non-linear too, but I have a couple of doubts about this one: I don't really grasp the previous step yet, so I can easily be wrong about the frustum "warping" into the NDC cube.

There is also this really bad habit of not naming things properly in projective geometry. For example, there is this $w$ term in almost all the $4\times 4$ projective matrices that is usually $=1$, and this term is never defined for what it is, or even given a real name. I just know that I'm supposed to leave it at $1$, and even changing it to arbitrary values doesn't really change anything in the final rendered result.

Can non-linear transformations be defined in terms of matrices that have some generic form $X$? I have noticed that, at least in the case of projective geometry, there are divisions involving some of the entries of the matrices, mainly the ones on the diagonal. Is there perhaps a generic form, involving algebraic divisions for each term, that can express all non-linear transformations? How do I recognize a non-linear transformation by just looking at the algebra / matrix of the problem?

To simplify further, can you write an example of a matrix that shows non-linear behaviour, and describe the geometry associated with this matrix and how the matrix itself influences its shape?

Best Answer

If you think about points using their ordinary Cartesian coordinates, then applying a projective transformation essentially means performing three steps.

  1. You homogenize the point, by appending a fourth coordinate set to $1$. So $P_1=(x,y,z)$ becomes $P_2=(x,y,z,1)$.
  2. Then you multiply with some $4\times 4$ matrix, e.g. $$A=\begin{pmatrix}a_{11}&a_{12}&a_{13}&a_{14}\\a_{21}&a_{22}&a_{23}&a_{24}\\a_{31}&a_{32}&a_{33}&a_{34}\\a_{41}&a_{42}&a_{43}&a_{44}\end{pmatrix}\qquad P_3=A\cdot P_2=\begin{pmatrix}a_{11}x+a_{12}y+a_{13}z+a_{14}\\a_{21}x+a_{22}y+a_{23}z+a_{24}\\a_{31}x+a_{32}y+a_{33}z+a_{34}\\a_{41}x+a_{42}y+a_{43}z+a_{44}\end{pmatrix}$$
  3. Then you dehomogenize: you divide the coordinate vector by its last (“$w$”) coordinate, and drop that coordinate. You end up with $$P_4=\frac1{a_{41}x+a_{42}y+a_{43}z+a_{44}}\begin{pmatrix}a_{11}x+a_{12}y+a_{13}z+a_{14}\\a_{21}x+a_{22}y+a_{23}z+a_{24}\\a_{31}x+a_{32}y+a_{33}z+a_{34}\end{pmatrix}$$

So although step 2 all by itself can be seen as a linear transformation in a four-dimensional space, the fact that your vectors are actually interpreted as homogeneous coordinates means that the whole thing is a projective transformation of 3-dimensional projective space, and in general that transformation is non-linear in the Cartesian coordinates $(x,y,z)$.
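As a minimal sketch of the three steps, here is one possible way to write them in plain Python. The function name `apply_projective` and the example matrix (the identity except for $a_{43}=1$, so that the new $w$ is $z+1$) are assumptions chosen for illustration, not something fixed by the question. The sketch also checks that doubling the input does not double the output, which is exactly what fails for a non-linear map.

```python
from fractions import Fraction

def apply_projective(A, point):
    """Homogenize, multiply by the 4x4 matrix A, then dehomogenize."""
    x, y, z = point
    # Step 1: homogenize by appending w = 1.
    p2 = [Fraction(x), Fraction(y), Fraction(z), Fraction(1)]
    # Step 2: ordinary matrix-vector product in four dimensions.
    p3 = [sum(a * c for a, c in zip(row, p2)) for row in A]
    # Step 3: divide by the last coordinate and drop it.
    w = p3[3]
    if w == 0:
        raise ZeroDivisionError("the image is a point at infinity")
    return tuple(c / w for c in p3[:3])

# Hypothetical example: identity except a_43 = 1, so the new w equals z + 1.
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]

p = (2, 4, 1)
doubled = tuple(2 * c for c in p)

f_p = apply_projective(A, p)            # w = 2, result (1, 2, 1/2)
f_2p = apply_projective(A, doubled)     # w = 3, result (4/3, 8/3, 2/3)
twice_f_p = tuple(2 * c for c in f_p)   # (2, 4, 1)

# A linear map would satisfy f(2p) == 2 f(p); this one does not.
print(f_2p == twice_f_p)
```

The division by $z+1$ is where the non-linearity comes from: the further a point is along $z$, the more strongly its $x$ and $y$ are shrunk, which is the familiar perspective foreshortening.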

The transformation is affine exactly if $a_{41}=a_{42}=a_{43}=0$, and linear if $a_{41}=a_{42}=a_{43}=a_{14}=a_{24}=a_{34}=0$. Both of these only make sense if $a_{44}\neq0$, in which case you might scale the whole matrix by $1/a_{44}$ to obtain a simpler representation which preserves the $1$ in the last coordinate of each input vector and thus avoids the division in step 3.
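These conditions can be read straight off the matrix entries. The following is only a sketch: the helper names `classify` and `normalize`, and the label `"degenerate"` for the case where the whole last row is zero, are assumptions made for this example.

```python
def classify(A):
    """Classify a 4x4 matrix acting on homogeneous coordinates."""
    # a_41, a_42, a_43: if any is non-zero, w depends on the point.
    bottom_is_zero = all(A[3][j] == 0 for j in range(3))
    if not bottom_is_zero:
        return "projective"
    if A[3][3] == 0:
        # Every image would have w = 0, so nothing can be dehomogenized.
        return "degenerate"
    # a_14, a_24, a_34: the translation part.
    translation_is_zero = all(A[i][3] == 0 for i in range(3))
    return "linear" if translation_is_zero else "affine"

def normalize(A):
    """Scale A by 1/a_44; the same projective map, but now a_44 = 1."""
    a44 = A[3][3]
    return [[entry / a44 for entry in row] for row in A]

scaling = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
translation = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
perspective = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]

print(classify(scaling), classify(translation), classify(perspective))
print(normalize(scaling))
```

Note that `scaling` has $2$ everywhere on the diagonal, including $a_{44}$, so after normalizing it is just the identity: multiplying the whole matrix by a scalar does not change the transformation it represents.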

Homogeneous coordinates are actually equivalence classes: all nonzero scalar multiples of a given vector represent the same point. So if you decide to set $w=2$ in the first step, instead of the conventional $w=1$, then your vector $(x,y,z,2)$ actually describes the same point as the vector $(x/2,y/2,z/2,1)$. So by setting $w$ to a value larger than $1$, you're shrinking your input by that factor $w$. A positive value smaller than $1$ expands it. Points with $w=0$ are somewhat special. Dehomogenizing them would lead to a division by zero. These are points at infinity. They represent directions in space, e.g. they are the points where parallel lines meet.
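A small sketch of what $w$ does, using a hypothetical helper `dehomogenize` that returns `None` for points at infinity (that return convention is an assumption of this example):

```python
from fractions import Fraction

def dehomogenize(h):
    """Map homogeneous (x, y, z, w) to Cartesian (x/w, y/w, z/w)."""
    x, y, z, w = (Fraction(c) for c in h)
    if w == 0:
        return None  # point at infinity: a direction, not a position
    return (x / w, y / w, z / w)

same_a = dehomogenize((1, 2, 3, 1))
same_b = dehomogenize((2, 4, 6, 2))                # scalar multiple, same point
shrunk = dehomogenize((1, 2, 3, 2))                # w = 2 halves the point
grown = dehomogenize((1, 2, 3, Fraction(1, 2)))    # w = 1/2 doubles it
at_infinity = dehomogenize((1, 2, 3, 0))           # direction (1, 2, 3)

print(same_a == same_b, shrunk, grown, at_infinity)
```

This is also why changing $w$ on the matrix side can appear to change nothing in a rendered result: if the whole vector or the whole matrix is rescaled, the division in the last step cancels the factor again.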

When dealing with projective geometry, it's usually best to leave the dehomogenization step for the very end of all operations. So you'd take your input, homogenize it, and then perform all subsequent transformations on homogeneous coordinates before dehomogenizing the result at the very end. By avoiding dehomogenization after every step, a point at infinity in some step may still end up in a finite position in your final scene, usually as some point visible on the (2d image of the 3d) horizon, so there are benefits beyond improved performance. Parts of your question sound like you're about to learn OpenGL programming or something like that. If that's the case, keep in mind that many OpenGL operations will perform the dehomogenization step implicitly, so it's perfectly all right to use homogeneous coordinates for the vertices of some polygons, or the coordinates of some texture lookup, or whatever.
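To illustrate why it pays to stay in homogeneous coordinates until the end, here is a sketch that composes two matrices first and dehomogenizes once. The matrices `T` (a translation by $2$ along $z$) and `P` (the identity except for $a_{43}=1$), as well as all helper names, are assumptions made for this example and are not OpenGL's actual projection matrix.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(Fraction(A[i][k]) * Fraction(B[k][j]) for k in range(4))
             for j in range(4)] for i in range(4)]

def mat_vec(A, v):
    """Product of a 4x4 matrix with a homogeneous 4-vector."""
    return [sum(Fraction(a) * Fraction(c) for a, c in zip(row, v)) for row in A]

def dehomogenize(h):
    """Divide by w and drop it; None stands for a point at infinity."""
    if h[3] == 0:
        return None
    return tuple(c / h[3] for c in h[:3])

T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 2], [0, 0, 0, 1]]  # translate z by 2
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]  # w becomes z + w
M = mat_mul(P, T)  # apply T first, then P, as a single matrix

# The direction (0, 0, 1) has no Cartesian position, but it is a perfectly
# good homogeneous vector and its image under M is a finite point.
vanishing_point = dehomogenize(mat_vec(M, [0, 0, 1, 0]))

# Points far along a line with that direction approach the same image.
near = dehomogenize(mat_vec(M, [1, 1, 0, 1]))
far = dehomogenize(mat_vec(M, [1, 1, 997, 1]))

print(vanishing_point, near, far)
```

Here the point at infinity in direction $(0,0,1)$ lands on $(0,0,1)$, and the image of $(1,1,997)$ is already within $1/1000$ of it in every coordinate. Had the input been dehomogenized before multiplying, the point at infinity could not have been represented at all.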