What Vertex Shader Perspective Projection Transformations Preserve Collinearity

geometry, linear algebra, linear-transformations, matrices, partial derivative

Hello people of Math StackExchange! I have a question for you. I've recently been reading a lot about OpenGL and Vulkan. I've learned that a perspective projection matrix is typically used to transform camera-space 4-d homogeneous coordinates to clip-space 4-d homogeneous coordinates, which are then converted by the graphics pipeline to 3-d normalised-device-coordinates (NDC-space) by dividing the $x$, $y$, and $z$ coordinates by the $w$ coordinate. This page provides an example of what such a transformation matrix looks like: https://gamedev.stackexchange.com/questions/120338/what-does-a-perspective-projection-matrix-look-like-in-opengl.

Ignoring all the constants in such a perspective projection matrix, and ignoring the w coordinate, the effect of such a transformation can be understood as a map from $\mathbb R^3$ to $\mathbb R^3$ that looks something like
$$\begin{bmatrix}x\\y\\z\end{bmatrix} \mapsto \begin{bmatrix}\frac{c_0x}z\\\frac{c_1y}z\\\frac{c_2z + c_3}z\end{bmatrix}$$

where the $c_i$ are constants determined by the field of view, the aspect ratio of the screen, and the values of the $z_{\text{near}}$ and $z_{\text{far}}$ clipping planes. Geometrically, it makes sense that the $x$ and $y$ coordinates must be divided by $z$ for the projection to work; this is the so-called "perspective division". What's not so obvious is why the $z$ coordinate is also divided by $z$, giving an expression of the form $c_2 + \frac{c_3}z$.
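To make the map concrete, here is a minimal Python sketch of it. The function name `project` and the default values of the constants are my own placeholders, not values taken from any real projection matrix.

```python
def project(p, c0=1.0, c1=1.0, c2=1.0, c3=-2.0):
    """Map a camera-space point (x, y, z) to (c0*x/z, c1*y/z, (c2*z + c3)/z).

    The constants are arbitrary illustrative values (an assumption).
    """
    x, y, z = p
    if z == 0:
        raise ValueError("z must be non-zero for the perspective division")
    return (c0 * x / z, c1 * y / z, (c2 * z + c3) / z)


print(project((2.0, 4.0, 2.0)))  # (1.0, 2.0, 0.0)
```

Note that the third component equals $c_2 + \frac{c_3}z$, so it depends only on $z$.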

The purpose that depth information serves in a graphics pipeline is that it allows us to determine which primitive shape is closest to the viewer, at a given pixel location on the screen. However, it seems like the only real reason why it is necessary to map the $z$-coordinate to NDC-space through some expression of the form $a/z+b$ is that the rasterisation stage of the graphics pipeline interpolates depth values linearly in NDC-space (this is discussed in the answers to this question: https://computergraphics.stackexchange.com/questions/8017/why-do-gpus-divide-clip-space-z-by-w-for-position). It's possible to change this interpolation by explicitly writing to some gl_FragDepth variable from a fragment shader in GLSL from what I've read, but this doesn't relate to my question.

Because of this linear interpolation of depth values performed by rasterisation, it is desirable for the transformation performed by the vertex shader to preserve collinearity. For example, suppose the points $A$, $B$, and $C$ lie on the same line in world-space, and that running the vertex shader on these points gives $A'$, $B'$, and $C'$. If the rasteriser interpolates between the points $A'$ and $C'$ obtained from the vertex shader, we would like the depth it determines at $x=B'_x$ and $y=B'_y$ to equal $B'_z$, the NDC-space depth we would have gotten by interpolating between $A$ and $C$ in world-space to find $B$ and then passing $B$ through the vertex shader.
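As a rough illustration of this requirement, the following Python sketch compares the depth interpolated linearly between $A'$ and $C'$ with the actual depth $B'_z$, for two different depth functions. It assumes $c_0 = c_1 = 1$, and the names `ndc` and `interpolated_depth`, the sample points, and the two depth functions are all hypothetical choices of mine. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction as F


def ndc(p, depth):
    """Project camera-space p with c0 = c1 = 1 (assumed) and a chosen depth function."""
    x, y, z = p
    return (x / z, y / z, depth(x, y, z))


def interpolated_depth(a, b, c, depth):
    """Return (depth interpolated linearly between A' and C' at B'_x, actual B'_z)."""
    a2, b2, c2 = ndc(a, depth), ndc(b, depth), ndc(c, depth)
    s = (b2[0] - a2[0]) / (c2[0] - a2[0])  # screen-space interpolation parameter
    return ((1 - s) * a2[2] + s * c2[2], b2[2])


A = (F(0), F(0), F(1))
C = (F(2), F(0), F(3))
B = tuple((a + c) / 2 for a, c in zip(A, C))  # midpoint of AC in camera space

reciprocal = lambda x, y, z: 1 - 1 / z  # of the form c2 + c3/z
identity = lambda x, y, z: z            # depth passed through unchanged

print(interpolated_depth(A, B, C, reciprocal))  # both entries equal 1/2
print(interpolated_depth(A, B, C, identity))    # 5/2 versus 2: they disagree
```

With the reciprocal form, the interpolated depth matches $B'_z$; with the depth passed through unchanged, it does not.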

Motivation aside, what I am asking is…

Suppose we are projecting points from camera-space to NDC-space using the transformation
$$T : \begin{bmatrix}x\\y\\z\end{bmatrix} \mapsto \begin{bmatrix}\frac{c_0x}z\\\frac{c_1y}z\\f(x,y,z)\end{bmatrix}$$
What is the most general form of a function $f$ such that the above transformation preserves collinearity, and how can we prove this?

This isn't a question about the computability or usefulness of such a function $f$. Playing around in Desmos, I think a function like
$$f(x,y,z) = \frac{c_2 + c_3x + c_4y + c_5z}z$$ might be the most general, where $c_i$ are all arbitrary constants. I have no idea how to make sure of this though.
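The kind of spot check I did in Desmos can also be sketched in Python with exact fractions. This is only a numerical check on one line, not a proof; the constants, the sample points, and the helper names `T`, `cross`, and `collinear` are arbitrary choices of mine.

```python
from fractions import Fraction as F

# Arbitrary test constants c0..c5 (an assumption, chosen only for illustration).
C0, C1, C2, C3, C4, C5 = map(F, (2, 3, 5, 7, 11, 13))


def T(p):
    """Apply the projection with the candidate f = (c2 + c3*x + c4*y + c5*z) / z."""
    x, y, z = p
    return (C0 * x / z, C1 * y / z, (C2 + C3 * x + C4 * y + C5 * z) / z)


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def collinear(a, b, c):
    """Three points are collinear exactly when (b - a) x (c - a) is the zero vector."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ac = tuple(ci - ai for ai, ci in zip(a, c))
    return cross(ab, ac) == (0, 0, 0)


p = (F(1), F(2), F(1))
q = (F(3), F(-1), F(4))
t = F(1, 3)
r = tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))  # a point on line pq

print(collinear(p, q, r))           # True
print(collinear(T(p), T(q), T(r)))  # True
```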

I'm a second-year university student. I've only finished Calculus 2 and Linear Algebra 1, so don't go crazy with the higher-level math please, unless it's necessary.

I've made a couple of attempts at approaching this problem, though I haven't really gotten anywhere, because I'm finding the notion of "collinearity preservation" kind of hard to formalise and work with. One thing I tried was to let $p$ and $q$ be 2 points in $\mathbb R^3$, and pick another point $r$ on the same line as these points. Then I map the $x$ and $y$ coordinates using the transformation above to get
$$p_x' = \frac{c_0p_x}{p_z}$$
$$q_x' = \frac{c_0q_x}{q_z}$$
$$r_x' = \frac{c_0r_x}{r_z}$$
and similar equations for the $y$ coordinates. Because $p$, $q$, and $r$ are collinear, so should $p'$, $q'$, and $r'$ be, assuming $T$ preserves collinearity. We can express these collinearities with the equations
$$\frac{p_z - r_z}{p_z - q_z} = \frac{p_x - r_x}{p_x - q_x}$$
$$\frac{p'_z - r'_z}{p'_z - q'_z} = \frac{p'_x - r'_x}{p'_x - q'_x}$$
(and similar equations relating the $y$-coordinates to the $z$-coordinates) but then I'm stuck. I don't know what to solve for in order to get information about $f$. If I try to solve for $p_z'$ in terms of just $p_x$, $p_y$, and $p_z$, I can't get rid of the terms $q_z'$ and $r_z'$, because I can't assume anything about the function $f$.
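One way to state the collinearity condition without dividing by coordinate differences is sketched here; the parameter $t$ and the cross-product form are my own notation. For distinct $p$ and $q$, the point $r$ lies on the line through them exactly when
$$r = (1-t)\,p + t\,q \quad\text{for some } t\in\mathbb R,$$
and $T$ preserves collinearity if, for all such $p$, $q$, and $t$ with the three points in the domain of $T$,
$$\bigl(T(q) - T(p)\bigr) \times \bigl(T(r) - T(p)\bigr) = \mathbf 0.$$
Unlike the ratio form, this stays valid when $p_x = q_x$ or $p_z = q_z$.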

The other approach I've tried is to consider a line parameterised as a function $g:\mathbb R\to\mathbb R^3$, where
$$g_x'(t) = k_0 g_z'(t)$$
$$g_y'(t) = k_1 g_z'(t)$$
Assuming $T$ preserves collinearity, the image of the line $\{g(t) : t\in\mathbb R\}$ under $T$ should also be a line. So if we let $h = T \circ g$, we should get similar relationships between the derivatives of $h_x$, $h_y$, and $h_z$:
\begin{equation}h_x'(t) = k_2 h_z'(t)\tag{1}\end{equation}
$$h_y'(t) = k_3 h_z'(t)$$
(where the $k_i$ are all constants). By applying the quotient rule, it's possible to expand equation (1):
\begin{aligned}
k_2h_z'(t) ={}& h_x'(t)\\
={}& \frac{d}{dt} \frac{c_0 g_x(t)}{g_z(t)} \\
={}& c_0 \frac{g_x'(t)\cdot g_z(t) - g_x(t)\cdot g_z'(t)}{g_z^2(t)}\\
={}& c_0g_z'(t)\frac{k_0g_z(t) - g_x(t)}{g_z^2(t)}
\end{aligned}

But now I'm stuck, and again I don't really know what I'm solving for, because I don't have much practice solving differential equations, let alone systems of differential equations.

Maybe the answer is obvious and just involves simple algebra that I'm just not seeing, or maybe it's actually necessary to solve some complicated system of differential equations to get a general form for the function $f$. What do you people think?

Lastly, so that this question doesn't get closed as a duplicate, here are some similar but different questions to this one:

https://stackoverflow.com/questions/25584667/why-do-i-divide-z-by-w-in-a-perspective-projection-in-opengl (this page doesn't talk about the depth interpolation performed by rasterisation)
https://gamedev.stackexchange.com/questions/197920/in-opengl-why-do-people-worry-that-the-accuracy-of-the-depth-buffer-gets-worse (same deal)
https://stackoverflow.com/questions/17269686/why-do-we-need-perspective-division (same deal)
https://computergraphics.stackexchange.com/questions/8017/why-do-gpus-divide-clip-space-z-by-w-for-position (this one mentions depth interpolation, and essentially says mapping world-space $z$ to NDC-space $z$ through a transformation of the form $z \mapsto \frac az+b$ will make depths interpolate properly, but doesn't explain why)
https://stackoverflow.com/questions/28278309/why-z-is-affected-by-the-perscpective-division (same deal here)

Thanks!

Best Answer

https://en.wikipedia.org/wiki/Collineation#Fundamental_theorem_of_projective_geometry states that over the real numbers every collineation is a projective linear transformation. So you can always express these as a matrix operating on a homogeneous coordinate vector.

In your case that matrix would be

$$\begin{bmatrix} c_0 & 0 & 0 & 0 \\ 0 & c_1 & 0 & 0 \\ c_3 & c_4 & c_5 & c_2 \\ 0 & 0 & 1 & 0 \end{bmatrix} \cdot\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}=z \begin{bmatrix} \frac{c_0x}z \\ \frac{c_1y}z \\ \frac{c_3x+c_4y+c_5z+c_2}z \\ 1 \end{bmatrix}$$

As long as you take all the other parts of the matrix as given, your four coefficients in the third row are indeed the most general you can have.
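As a small sanity check of the matrix form, the following Python sketch multiplies the matrix by $(x, y, z, 1)$ and divides by the resulting $w$, using exact fractions. The constants are arbitrary test values and the name `projective_map` is a placeholder, not part of any graphics API.

```python
from fractions import Fraction as F

# Arbitrary test constants c0..c5 (an assumption, chosen only for illustration).
c0, c1, c2, c3, c4, c5 = map(F, (2, 3, 5, 7, 11, 13))

M = [
    [c0, 0, 0, 0],
    [0, c1, 0, 0],
    [c3, c4, c5, c2],
    [0, 0, 1, 0],
]


def projective_map(M, p):
    """Multiply M by the homogeneous vector (x, y, z, 1), then divide by w."""
    x, y, z = p
    h = (x, y, z, 1)
    X, Y, Z, W = (sum(m * v for m, v in zip(row, h)) for row in M)
    return (X / W, Y / W, Z / W)


def rational_form(p):
    """The same map written directly as in the question."""
    x, y, z = p
    return (c0 * x / z, c1 * y / z, (c2 + c3 * x + c4 * y + c5 * z) / z)


p = (F(3), F(-1), F(4))
print(projective_map(M, p) == rational_form(p))  # True
```

Here $w$ comes out equal to $z$, so dividing by $w$ reproduces the perspective division from the question.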

The theorem I cited in my first paragraph holds over the real numbers but becomes more complicated over the complex numbers, where a field automorphism (complex conjugation in that case) makes things trickier. A proof needs to take that into account, so it's not quite straightforward.
