TL;DR There is no such matrix.
That’s not to say that you couldn’t construct a transformation that maps the two volumes as you’d like, but there’s no projective transformation of space that can accomplish this, so the transformation can’t be implemented as multiplication by a constant $4\times4$ homogeneous matrix.
We can see that this is so because projective transformations map lines to lines and preserve incidence relationships. The extensions of the edges of the destination cube parallel to the $x$-axis all intersect at a single point (at infinity), so the extensions of the corresponding edges of any preimage of the cube must also have a common intersection point. This is clearly not the case for your trapezoidal prism.
A fairly straightforward algebraic calculation also shows that it’s impossible to construct such a matrix. Observe that the bounding planes of the frustum can be recovered from the projection matrix $P$. We can represent planes by homogeneous vectors $\mathbf\pi$, so that the equation of the plane is $\mathbf\pi^T\mathbf x=0$. If we have $\mathbf x'=P\mathbf x$, then $\mathbf\pi^T(P^{-1}\mathbf x') = (P^{-T}\mathbf \pi)^T\mathbf x' = 0$, so the plane $\mathbf\pi$ is mapped to $P^{-T}\mathbf\pi$. In other words, $P^T$ maps planes in the destination space to planes in the source space. Moreover, that source plane is a linear combination of the rows of $P$ (which I’ll denote by $\mathbf P_i$) with coefficients given by the components of the destination plane vector.
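As a quick sanity check of the last two sentences, here is a minimal Python sketch. The matrix `P`, the destination plane and the test point are made-up values chosen only for illustration; the identity being checked, $\mathbf\pi'^T(P\mathbf x)=(P^T\mathbf\pi')^T\mathbf x$, holds for any $4\times4$ matrix.

```python
def mat_vec(M, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def source_plane(P, dest_plane):
    """Return P^T * dest_plane, built as a linear combination of the rows of P
    whose coefficients are the components of the destination plane."""
    return [sum(coef * row[j] for coef, row in zip(dest_plane, P))
            for j in range(4)]

# Hypothetical matrix, used only to illustrate the identity.
P = [[2, 0, 1, 0],
     [0, 3, 0, 1],
     [1, 0, 4, -2],
     [0, 0, 1, 1]]

dest_plane = [0, 0, 1, -1]             # an example plane in destination space
src_plane = source_plane(P, dest_plane)  # equals (row 3 of P) - (row 4 of P)

x = [1, 2, 3, 1]                       # an arbitrary homogeneous source point
lhs = dot(dest_plane, mat_vec(P, x))   # destination plane applied to P x
rhs = dot(src_plane, x)                # pulled-back plane applied to x
print(src_plane, lhs, rhs)
```

The two evaluations agree, so a source point lies on the pulled-back plane exactly when its image lies on the destination plane.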
Now, the near, far, left, right, top and bottom faces of the destination cube are, respectively, on the planes $(0,0,1,-1)$, $(0,0,1,1)$, $(1,0,0,1)$, $(1,0,0,-1)$, $(0,1,0,-1)$ and $(0,1,0,1)$. So, the corresponding source planes are just the sums and differences of the first three rows of $P$ and its last row. For example, the near plane is $\mathbf P_3-\mathbf P_4$. The actual near plane of the source prism is $(0,0,1,-n)$, so we now have the constraint $\mathbf P_3-\mathbf P_4=c_1(0,0,1,-n)$ for some nonzero $c_1$.† Similarly, identifying the two far planes produces $\mathbf P_3+\mathbf P_4=c_2(0,0,1,-f)$. Subtracting the first from the second and halving yields the expression $\frac12(0,0,c_2-c_1,c_1n-c_2f)$ for the last row $\mathbf P_4$. Doing the same for the other two face pairs produces two other expressions for $\mathbf P_4$. They must all be equal, so we end up with a system of linear equations in the unknown coefficients $c_i$. If you go through the computation, you’ll find that this system has only the trivial solution, but even before you’ve gotten that far, you’ll find that we must have $\mathbf P_4=0$, which doesn’t make for a healthy projection matrix.
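The near/far step can be reproduced numerically. The following is only a sketch: the values of `n`, `f`, `c1` and `c2` are arbitrary placeholders, and the side planes of the prism are not modelled because they depend on the specific shape in the question.

```python
from fractions import Fraction

def scale(c, v):
    return [c * x for x in v]

# Placeholder values, chosen only for illustration.
n, f = Fraction(1), Fraction(10)      # near and far distances
c1, c2 = Fraction(2), Fraction(3)     # unknown nonzero multipliers

near = scale(c1, [0, 0, 1, -n])       # P3 - P4
far = scale(c2, [0, 0, 1, -f])        # P3 + P4

# Solve the pair of constraints for the third and fourth rows of P.
P3 = [(a + b) / 2 for a, b in zip(near, far)]
P4 = [(b - a) / 2 for a, b in zip(near, far)]

print(P3, P4)
```

Whatever multipliers are chosen, the first two components of `P4` come out as zero from this pair alone; the expressions obtained from the left/right and top/bottom pairs then have to match this one.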
† Since we’re working with homogeneous vectors and matrices, we have to be careful about using strict equality in these constraints. Since $\mathbf v$ and $k\mathbf v$ (for $k\ne0$) represent the same point/plane, we have to introduce unknown multipliers in the equations that represent point or plane pair constraints. Strict equality, which is what was used to develop the solution to your related question, restricts us to affine transformations, which obviously won’t work here since the images of parallel lines aren’t parallel. You could try to adapt the 2-D method described in this answer, but since the transformation that you’re looking for isn’t projective, that will fail, too.
You ask questions that relate to some fairly abstract mathematical objects such as the real projective plane. But the topic of the linked presentation
that you referred to
is a much more concrete problem, which is how to represent a three-dimensional shape on a two-dimensional computer screen. So I will address just that topic.
There are two parts to the topic: how realistic a two-dimensional representation appears to us, and how the computer should obtain that representation.
The "how realistic" question has nothing to do with $x,y,z$ coordinates.
Painters were using one-point and even two-point perspective long before
the Cartesian coordinate system took hold.
Projections are essentially geometric constructions that do not need coordinates.
But in a computer, a three-dimensional object is typically described in terms of
$x,y,z$ coordinates,
and the image on the screen is typically described in horizontal and vertical coordinates.
So in the computer we want some way to get the three $x,y,z$ coordinates of the object "projected" onto the two coordinates of a pixel on the screen.
Now I'm going to explain a lot of stuff that you apparently already know.
This may seem redundant, but I need to establish the facts in certain language that I can refer back to later in the answer.
A useful technique to help transform a 3D object onto the computer screen is to suppose there is a "viewing plane" in 3D space onto which we project each point of the object or scene being viewed.
We can also assign two perpendicular lines in that plane to be $x$ and $y$ axes,
thereby assigning $x,y$ coordinates to every point in the plane.
It is then relatively simple to scale and shift the $x$ and $y$ coordinates of each projected point to the horizontal and vertical screen coordinates of the pixel where that point should be displayed.
In the presentation, these last steps (from viewing plane coordinates to screen coordinates) are the "Normalization Transformation and Clipping" step and the "Viewport Transformation" step. But the presentation doesn't say much about these steps; it is more concerned with getting the $x,y$ coordinates on the viewing plane.
In order to make the geometric idea of projection work, the viewing plane has to be a plane in the same three-dimensional space as the object. So we call its $x,y$ coordinates $x_v,y_v$ to distinguish them from the other coordinate systems the computer is keeping track of.
And once you have set up two axes ($x_v$ and $y_v$) in 3D space, they determine a third axis perpendicular to both of them, which we call the $z_v$ axis.
Now I'll try to address some things that you apparently do not know.
Since we really only care about $x_v$ and $y_v$ coordinates for the final display of the picture on the computer screen, we have some freedom about how we deal with the $z_v$ coordinates.
We can put the origin of the $x_v,y_v,z_v$ coordinate system in the viewing plane,
as I did a few paragraphs earlier,
in which case $z_v=0$ everywhere in the viewing plane, or we can move the origin of that system as far as we want in either direction along the $z_v$ axis, which leaves the $x_v$ and $y_v$ coordinates unchanged at any point in the viewing plane but makes the entire viewing plane have a new (constant) $z_v$ coordinate.
In summary, as long as the $x_v$ and $y_v$ axes are perpendicular to each other and are both parallel to the viewing plane, the $z_v$ axis will be perpendicular to the viewing plane, all points in the viewing plane will have the same $z_v$ coordinate, and the $x_v$ and $y_v$ coordinates of any point in the viewing plane will give you good coordinates to pass to the "Normalization Transformation and Clipping" and "Viewport Transformation" steps that finally display points on the screen.
When I say we have "freedom" with respect to the $z_v$ coordinate, I mean it really does not matter what we choose to be the constant $z_v$ coordinate of the viewing plane, provided that for each projection we do, we make a choice and then stick with that choice for that entire projection.
(That means we must consistently use formulas that correctly represent the projection for that choice.)
Then, for the next projection, we can make a different choice.
If you look toward the end of the presentation, you will see two slides showing two "special cases" of how we choose the $z_v$ coordinate of the viewing plane for a perspective projection.
In one case we set up the axes so that $z_v=0$ everywhere in the viewing plane.
Then the reference point where all the projectors (projection lines) converge has to have a non-zero coordinate (because you don't get an image if the reference point is in the viewing plane itself).
In the other case we set up the axes so that the reference point where the projectors converge has $z_v$ coordinate zero, so the viewing plane needs to have a non-zero $z_v$ coordinate, which could be $z_v=1$ or something else.
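To make the two special cases concrete, here is a small Python sketch of a perspective projection derived from similar triangles. It assumes the reference point lies on the $z_v$ axis at $(0,0,z_{prp})$ and the viewing plane is $z_v=z_{vp}$; the function name and the sample numbers are mine, not taken from the presentation.

```python
def perspective_project(point, z_prp, z_vp):
    """Project `point` along the line through (0, 0, z_prp) onto the plane z = z_vp.

    Assumes the reference point is on the z_v axis; names are illustrative.
    """
    x, y, z = point
    if z == z_prp:
        raise ValueError("point lies in the plane of the reference point")
    t = (z_prp - z_vp) / (z_prp - z)   # scale factor from similar triangles
    return (x * t, y * t, z_vp)

# Special case 1: viewing plane at z_v = 0, reference point at z_v = 5.
a = perspective_project((2, 4, -5), z_prp=5, z_vp=0)

# Special case 2: reference point at z_v = 0, viewing plane at z_v = 1.
b = perspective_project((2, 4, 4), z_prp=0, z_vp=1)

# Same scene as case 1 with the origin slid 5 units along the z_v axis:
# x_p and y_p are unchanged, only the constant z_v of the plane differs.
c = perspective_project((2, 4, -10), z_prp=0, z_vp=-5)

print(a, b, c)
```

The first and third calls describe the same geometry with different origins, and they return the same $x_p,y_p$, which is the "freedom" in choosing $z_v$ described above.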
Remember these facts when considering the following answers to your particular questions.
why $(x,y,z)$ is projected to $(x_p,y_p)$, why not $(x_p,y_p,z_{vp})$
Technically it is equally valid to say that $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})$; but since we only care about the display on the screen,
we just read off the two coordinates $(x_p,y_p)$ and discard the $z$ coordinate.
view plane is placed at position $z_{vp}$, so why $z_{vp}=0$ in $(x_p,y_p,z_{vp})?$
That's an arbitrary choice that was made by the author of the algorithm.
A different value of $z_{vp}$ could also work. The only thing that matters is the geometry of the projection. Just beware that for oblique projections and perspective projections the mathematical formulas that you use will have a dependence on $z_{vp}$, so you need to make sure you use the correct formulas.
And another question in perspective projection why $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})?$
That's a projection to a point expressed in the viewing-plane coordinates.
The whole point of a projection like this is to project onto the viewing plane,
and you want to use $x$ and $y$ axes that give you nice Cartesian coordinates in that plane for the "Normalization Transformation and Clipping" and "Viewport Transformation" steps.
That means using a coordinate system in which the $z$-coordinate of the plane is constant. But you can (technically) set that constant to whatever you want and still do perspective projection if you're careful about the formulas you use.
My final question is that in perspective projection projectors are perpendicular to the viewing plane?
No, they cannot all be perpendicular to the viewing plane because they all have to converge at a single point. There can be one projector perpendicular to the viewing plane, but every other projector has a different direction in 3D space so it cannot be perpendicular to the same plane.
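A short Python sketch illustrates this. It assumes the viewing plane is perpendicular to the $z_v$ axis (as set up earlier), so a projector is perpendicular to the plane only if its direction has no $x_v$ or $y_v$ component; the reference point and sample points are arbitrary.

```python
def projector_direction(point, prp):
    """Direction of the projector from the reference point to `point`."""
    return tuple(p - r for p, r in zip(point, prp))

def is_perpendicular_to_view_plane(direction):
    """True if the direction is parallel to the z_v axis (the plane's normal)."""
    dx, dy, dz = direction
    return dx == 0 and dy == 0 and dz != 0

prp = (0, 0, 5)                                  # arbitrary reference point
points = [(0, 0, -3), (1, 0, -3), (2, 4, -5)]    # arbitrary scene points

flags = [is_perpendicular_to_view_plane(projector_direction(p, prp))
         for p in points]
print(flags)
```

Only the point directly "in front of" the reference point gets a perpendicular projector; the others are tilted because they must pass through the same reference point.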
In summary, in the linked presentation you see some methods for doing orthographic projection, oblique parallel projection, and perspective projection.
That is all these are: just some methods of doing these projections,
not the only methods of doing these projections.
So when you ask why they chose $z_{vp}$ a particular way for a particular projection, the reason is that it was a choice they were allowed to make.
They had other choices available but needed to make a choice so that they could show you the formulas that you would use to do the projection according to that choice.
Best Answer
I was talking about this with some people on Discord and one gave me this answer: (I got permission to post it here)