The first isn't even approximately right: looking at a reflecting sphere crams all directions into a finite circle in your field of view and compresses things far from the axis of vision, whereas stereographic projection stretches out to infinity and expands areas far from the axis.
The second suggestion is closer, but still not entirely right.
An exact analogy would be something like: First imagine taking a picture that projects onto a sphere, like a 360×180 degree panorama, or a full-surround Imax theater. This spherical image is designed to look natural when observed from the center of the sphere. Now assume that without changing the spherical picture we place ourselves right at the edge of the sphere, then point a camera with ordinary perspective optics back towards the center and take a picture. The result will be in stereographic projection.
This description, however, obscures the most distinctive feature of the stereographic projection, which is that it is conformal, i.e., it preserves angles. If we have a small area anywhere on a stereographic image, it will be exactly similar (except for scale) to the picture we could take with a high-zoom ordinary lens pointing in the appropriate direction. In contrast, ordinary perspective distorts shapes far from the axis of projection.
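To see the conformality concretely, here is a small numerical sketch of my own (not from any linked material): project points of the unit sphere from the north pole onto the equatorial plane $z=0$, and check that a right angle on the sphere maps to a right angle in the image. The base point and step size are arbitrary choices.

```python
import math

def stereographic(p):
    """Project a point of the unit sphere from the north pole (0, 0, 1)
    onto the equatorial plane z = 0."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))

def sphere(theta, phi):
    """Point on the unit sphere in spherical coordinates."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angle(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(a * a for a in w))
    return math.degrees(math.acos(dot / (norm(u) * norm(v))))

h = 1e-4                    # small step along the sphere
t0, p0 = 1.0, 0.3           # an arbitrary base point away from the pole
base = sphere(t0, p0)
# Two tiny chords at the base point: one in the theta direction, one in
# phi.  On the sphere these two directions are perpendicular.
u3 = [a - b for a, b in zip(sphere(t0 + h, p0), base)]
v3 = [a - b for a, b in zip(sphere(t0, p0 + h), base)]
# The same two chords after stereographic projection.
b2 = stereographic(base)
u2 = [a - b for a, b in zip(stereographic(sphere(t0 + h, p0)), b2)]
v2 = [a - b for a, b in zip(stereographic(sphere(t0, p0 + h)), b2)]
print(angle(u3, v3), angle(u2, v2))  # both are very close to 90 degrees
```

The same check passes at any base point and for any pair of directions, which is what "conformal" means.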
You ask questions that relate to some fairly abstract mathematical objects such as the real projective plane. But the topic of the presentation you linked is a much more concrete problem: how to represent a three-dimensional shape on a two-dimensional computer screen. So I will address just that topic.
There are two parts to the topic: how realistic a two-dimensional representation appears to us, and how the computer should obtain that representation.
The "how realistic" question has nothing to do with $x,y,z$ coordinates.
Painters were using one-point and even two-point perspective long before
the Cartesian coordinate system took hold.
Projections are essentially geometric constructions that do not need coordinates.
But in a computer, a three-dimensional object is typically described in terms of
$x,y,z$ coordinates,
and the image on the screen is typically described in horizontal and vertical coordinates.
So in the computer we want some way to get the three $x,y,z$ coordinates of the object "projected" onto the two coordinates of a pixel on the screen.
Now I'm going to explain a lot of stuff that you apparently already know.
This may seem redundant, but I need to establish the facts in certain language that I can refer back to later in the answer.
A useful technique to help transform a 3D object onto the computer screen is to suppose there is a "viewing plane" in 3D space onto which we project each point of the object or scene being viewed.
We can also assign two perpendicular lines in that plane to be $x$ and $y$ axes,
thereby assigning $x,y$ coordinates to every point in the plane.
It is then relatively simple to scale and shift the $x$ and $y$ coordinates of each projected point to the horizontal and vertical screen coordinates of the pixel where that point should be displayed.
In the presentation, these last steps (from viewing plane coordinates to screen coordinates) are the "Normalization Transformation and Clipping" step and the "Viewport Transformation" step. But the presentation doesn't say much about these steps; it is more concerned with getting the $x,y$ coordinates on the viewing plane.
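As a rough sketch of those last steps (my own code, not from the presentation; the parameter names are illustrative), mapping viewing-plane coordinates inside a rectangular window to screen pixels is just a scale and a shift:

```python
def viewport_transform(xv, yv, window, screen):
    """Map viewing-plane coordinates (xv, yv), lying inside a rectangular
    window (xmin, xmax, ymin, ymax), to pixel coordinates on a screen of
    (width, height) pixels; screen y grows downward, as is typical."""
    xmin, xmax, ymin, ymax = window
    width, height = screen
    sx = (xv - xmin) / (xmax - xmin) * width
    sy = (1.0 - (yv - ymin) / (ymax - ymin)) * height
    return (sx, sy)

# The center of a window running from -1 to 1 on both axes lands at the
# center of a 640x480 screen:
print(viewport_transform(0.0, 0.0, (-1.0, 1.0, -1.0, 1.0), (640, 480)))
# -> (320.0, 240.0)
```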
In order to make the geometric idea of projection work, the viewing plane has to be a plane in the same three-dimensional space as the object. So we call its $x,y$ coordinates $x_v,y_v$ to distinguish them from the other coordinate systems the computer is keeping track of.
And once you have set up two axes ($x_v$ and $y_v$) in 3D space, they determine a third axis perpendicular to both of them, which we call the $z_v$ axis.
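As a sketch of how this looks in code (again my own, with illustrative names; the presentation does not show this), expressing a world point in the $x_v,y_v,z_v$ system is a translation to the chosen origin followed by dot products with the orthonormal axis directions:

```python
def view_coords(p, origin, xv_axis, yv_axis, zv_axis):
    """Express world point p in the viewing system: translate to the
    chosen origin, then take dot products with the orthonormal axes."""
    d = [p[i] - origin[i] for i in range(3)]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return (dot(d, xv_axis), dot(d, yv_axis), dot(d, zv_axis))

# With the standard axes and the origin moved 5 units along the zv axis,
# only the zv coordinate of a point changes:
print(view_coords((1.0, 2.0, 3.0), (0.0, 0.0, 5.0),
                  (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))
# -> (1.0, 2.0, -2.0)
```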
Now I'll try to address some things that you apparently do not know.
Since we really only care about $x_v$ and $y_v$ coordinates for the final display of the picture on the computer screen, we have some freedom about how we deal with the $z_v$ coordinates.
We can put the origin of the $x_v,y_v,z_v$ coordinate system in the viewing plane,
as I did a few paragraphs earlier,
in which case $z_v=0$ everywhere in the viewing plane, or we can move the origin of that system as far as we want in either direction along the $z_v$ axis, which leaves the $x_v$ and $y_v$ coordinates unchanged at any point in the viewing plane but makes the entire viewing plane have a new (constant) $z_v$ coordinate.
In summary, as long as the $x_v$ and $y_v$ axes are perpendicular to each other and are both parallel to the viewing plane, the $z_v$ axis will be perpendicular to the viewing plane, all points in the viewing plane will have the same $z_v$ coordinate, and the $x_v$ and $y_v$ coordinates of any point in the viewing plane will give you good coordinates to pass to the "Normalization Transformation and Clipping" and "Viewport Transformation" steps that finally display points on the screen.
When I say we have "freedom" with respect to the $z_v$ coordinate, I mean it really does not matter what we choose to be the constant $z_v$ coordinate of the viewing plane, provided that for each projection we do, we make a choice and then stick with that choice for that entire projection.
(That means we must consistently use formulas that correctly represent the projection for that choice.)
Then, for the next projection, we can make a different choice.
If you look toward the end of the presentation, you will see two slides showing two "special cases" of how we choose the $z_v$ coordinate of the viewing plane for a perspective projection.
In one case we set up the axes so that $z_v=0$ everywhere in the viewing plane.
Then the reference point where all the projectors (projection lines) converge has to have a non-zero $z_v$ coordinate (because you don't get an image if the reference point is in the viewing plane itself).
In the other case we set up the axes so that the reference point where the projectors converge has $z_v$ coordinate zero, so the viewing plane needs to have a non-zero $z_v$ coordinate, which could be $z_v=1$ or something else.
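Here is a small numerical check (my own sketch, with arbitrarily chosen numbers) that the two conventions describe the same projection: moving the origin $d$ units along the $z_v$ axis puts the COP at the origin and the viewing plane at $z_v=-d$, and the projected coordinates come out identical.

```python
def project_plane_at_zero(p, d):
    """Perspective projection onto the plane z_v = 0 with the reference
    point (COP) at (0, 0, d): intersect the projector through p with
    the plane z_v = 0."""
    x, y, z = p
    t = d / (d - z)          # parameter where the projector meets z = 0
    return (t * x, t * y)

def project_cop_at_origin(p, zvp):
    """Perspective projection with the COP at the origin and the viewing
    plane at z_v = zvp."""
    x, y, z = p
    return (x * zvp / z, y * zvp / z)

# The same scene in both conventions.
d = 5.0
p = (2.0, 1.0, -3.0)                 # a point, first convention
p_shifted = (p[0], p[1], p[2] - d)   # the same point, second convention
print(project_plane_at_zero(p, d))            # -> (1.25, 0.625)
print(project_cop_at_origin(p_shifted, -d))   # -> (1.25, 0.625)
```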
Remember these facts when considering the following answers to your particular questions.
why $(x,y,z)$ is projected to $(x_p,y_p)$, why not $(x_p,y_p,z_{vp})$
Technically it is equally valid to say that $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})$; but since we only care about the display on the screen,
we just read off the two coordinates $(x_p,y_p)$ and discard the $z$ coordinate.
view plane is placed at position $z_{vp}$, so why $z_{vp}=0$ in $(x_p,y_p,z_{vp})?$
That's an arbitrary choice that was made by the author of the algorithm.
A different value of $z_{vp}$ could also work. The only thing that matters is the geometry of the projection. Just beware that for oblique projections and perspective projections the mathematical formulas that you use will have a dependence on $z_{vp}$, so you need to make sure you use the correct formulas.
And another question in perspective projection why $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})?$
That's a projection to a point expressed in the viewing-plane coordinates.
The whole point of a projection like this is to project onto the viewing plane,
and you want $x$ and $y$ axes that give nice Cartesian coordinates in that plane for the "Normalization Transformation and Clipping" and "Viewport Transformation" steps.
That means using a coordinate system in which the $z$-coordinate of the plane is constant. But you can (technically) set that constant to whatever you want and still do perspective projection if you're careful about the formulas you use.
My final question is that in perspective projection projectors are perpendicular to the viewing plane?
No, they cannot all be perpendicular to the viewing plane because they all have to converge at a single point. There can be one projector perpendicular to the viewing plane, but every other projector has a different direction in 3D space so it cannot be perpendicular to the same plane.
In summary, in the linked presentation you see some methods for doing orthographic projection, oblique parallel projection, and perspective projection.
That is all these are: just some methods of doing these projections,
not the only methods of doing these projections.
So when you ask why they chose $z_{vp}$ a particular way for a particular projection, the reason is that it was a choice they were allowed to make.
They had other choices available but needed to make a choice so that they could show you the formulas that you would use to do the projection according to that choice.
We will be looking at these slides: Computer Graphics Projections (Viewing Transformations).
My first doubt is what the heck is a COP. Aside from a member of the police force.
So I'll begin one slide before.
View Confusion
On slide number 12, it says:
So we are talking about a perspective projection through a point. That is, we have a situation that could be like this:
This is similar to the picture on the slide about foreshortening (slide 10).
Then I add a new object on the other side of the COP. When we project it onto the projection plane, we end up with an image that is upside down.
By the way, this situation where we end up with an upside-down image is close to what we would find in a real-life pinhole camera or camera obscura:
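A minimal numerical sketch of this inversion (my own numbers, not from the slides): with the COP at the origin and the view plane at $z=1$, a point behind the COP has negative $z$, so the division flips the sign of both image coordinates.

```python
def project(p, zvp=1.0):
    """Project p through a COP at the origin onto the view plane z = zvp."""
    x, y, z = p
    return (x * zvp / z, y * zvp / z)

# A point in front of the COP (z > 0) projects normally...
print(project((0.5, 1.0, 2.0)))    # -> (0.25, 0.5)
# ...but the mirror point behind the COP (z < 0) projects with both
# coordinates negated, i.e. rotated 180 degrees about the axis: upside down.
print(project((0.5, 1.0, -2.0)))   # -> (-0.25, -0.5)
```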
Topological distortion
Now that I know what they mean by COP…
On slide number 13, it says:
First of all, I'm assuming that it is "Consider all the points on a plane".
However, this makes no sense: there is no concept of points parallel to a plane. I tried assuming they mean "points on a line parallel to the view plane" or "points on a plane parallel to the view plane", but neither reading works for me.
So… Let us see the diagram instead.
So we have three points P1, P2, and P3, which form a line; the point P3 lies on the plane that is parallel to the view plane and contains the COP. Let us try that…
So we can project the points P1 and P2, but not P3, which is between them. One would assume that the projection of P3 would be between the projection of P1 and the projection of P2, but it isn't.
Instead, see what happens if we try to project more points of the line approaching P3:
2D Animation:
3D Animation:
As the points being projected approach P3, their projections approach infinity on opposite sides, which is what I presume the slide means by "these points are projected to a broken line of infinite extent".
"broken line of infinite extent" means what it says: a line, that is broken (it has a gap), that extends to infinity.
Observe that the object is a connected set of points (a line in this case), but the image isn't (there is a gap). The image also stretches to infinity. As far as I can tell, when we are talking about perspective projections, "topological distortion" means exactly that. However, I believe the term has a broader meaning in other contexts, with which I'm not familiar.
Addendum
Text from the animation:
Note: since the projector line for P3 does not intersect the view plane (it is parallel to it), P3 does not have an image.
Pick any point you want between P1 and P2 and draw the line through that point and the COP. The intersection of that line with the view plane is the image of the point you picked. Except, of course, if you picked P3: then the line does not intersect the view plane.
The points between P1' (the image of P1) and P2' (the image of P2) exist on the view plane. But none of the points from the segment between P1 and P2 are projected between P1' (the image of P1) and P2' (the image of P2).
That is, the points between P1 and P2 are not projected inside the segment between P1' and P2'; they are projected onto the view plane, but outside that segment.
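Both facts (the images run off to infinity near P3, and none of them land between P1' and P2') are easy to check numerically. This is my own sketch; the coordinates P1 = (0, 0, 1), P2 = (2, 0, -1) and the view plane z = 1 are arbitrary choices, not taken from the slides or the demo.

```python
def project(p, zvp=1.0):
    """Perspective projection: COP at the origin, view plane z = zvp."""
    x, y, z = p
    return (x * zvp / z, y * zvp / z)

# A segment from P1 (in front of the COP) to P2 (behind it).  It crosses
# the plane z = 0 through the COP at P3 = point(0.5) = (1, 0, 0).
P1, P2 = (0.0, 0.0, 1.0), (2.0, 0.0, -1.0)
point = lambda t: tuple(a + t * (b - a) for a, b in zip(P1, P2))

# Approaching P3, the images run off to infinity, flipping sides at t = 0.5:
for t in (0.49, 0.499, 0.4999, 0.5001, 0.501, 0.51):
    print(t, project(point(t))[0])

# The endpoint images are P1' at x = 0 and P2' at x = -2.  Sampling the
# interior of the segment, no image ever lands strictly between them:
for t in (0.1, 0.3, 0.49, 0.51, 0.7, 0.9):
    assert not (-2.0 < project(point(t))[0] < 0.0)
```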
Demo:
Shadertoy: No nearclip.
YouTube: No nearclip (It is a recording from Shadertoy).
Animation (a roughly cut loop of the YouTube recording, heavily compressed and downsized to fit this site's maximum embeddable file size):
In the demo, we render the object both when it is in front of the camera and when it is behind it. The projected object is always to the left of the camera. The camera moves back and forth, and also up and down; that is all the motion. Because of that motion the object is sometimes behind the camera, sometimes in front of it, and sometimes partially both. Observe that the image of the part of the object that is behind the camera appears on the right side of the frame, inverted. This shows the topological distortion of perspective projection.
The demo is a ray caster, so it does not use the classic graphics pipeline. It works by intersecting rays (lines, actually) with planes, then clipping by the projected coordinates on the plane so the planes don't appear infinite. The intersection of the ray and the plane can be behind the camera, and the demo renders those intersections anyway.
Which are the distortions?
Of course, the image of the point P3 is gone. This is explained above.
The points P1 and P2 are the outer points of the segment. That is, the segment exists inside the space between P1 and P2. But the points P1' (the image of P1) and P2' (the image of P2) are the inner points of the image. That is, the image exists outside the space between P1' and P2'. This is shown in the animations above.
We can also see that the image extends outward from P1' and P2' in this labelled capture from the linked Shadertoy demo:
This does not happen when the object does not cross the plane of the observer:
The segment that goes from P1 to P2 is continuous (has no gaps). Its image is not continuous, it has a gap (none of the points from the segment between P1 and P2 are projected between P1' and P2'). You can see in the above capture that the image becomes two regions of pixels.
We can also observe that the image of the object behind the observer appears inverted (which is what is described in View Confusion).
In the linked demo we can observe a 180° rotation. We can see this by comparing with a capture where the object does not cross the plane of the observer, such as the one shown before. Here is a similar capture using a different texture to make the orientation easier to see:
And here I have copied the red face, rotated it 180°, and placed it next to the image of the part of the object that is behind the observer; the orientation after the 180° rotation matches:
Note: In this picture P2' is out of the frame.
The texture used here is a photo of Piccadilly Circus, which is available on Shadertoy.
The points very close to P3 on the segment have images on the view plane that are very far away from each other. But points very close to P1 on the segment have images on the view plane that are close together. Similarly the points that are close to P2 on the segment have images that are close together on the view plane.
This is what happens if I project two points that are nearby P1:
As you can see, I picked two points P1a and P1b that are nearby P1, and we see their images P1a' and P1b' are also near each other.
If I pick two points P3a and P3b near P3, on the same side, and about the same distance apart as P1a and P1b… first of all, I need to zoom out a lot, to the point where we cannot tell P3a and P3b apart. But more importantly, their images are not so close to each other:
We can also see this distortion here:
In these images I highlighted a segment and its image in blue, and another segment and its image in green. Notice that the blue segment is longer than the green segment (it is twice as long), but the image of the blue segment is shorter than the image of the green segment. This is because the green segment is closer to P3.
The segment has finite length, but its image has infinite length. As you can imagine, the closer we pick a point to P3 on the segment, the farther away its image is on the view plane, and there is no bound to how far the image can get.
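This unbounded stretching near P3 is also easy to check numerically (again my own sketch, with arbitrary coordinates): two points a fixed small distance apart on the segment have images close together near P1, but far apart near P3.

```python
def project(p, zvp=1.0):
    """Perspective projection: COP at the origin, view plane z = zvp."""
    x, y, z = p
    return (x * zvp / z, y * zvp / z)

P1, P2 = (0.0, 0.0, 1.0), (2.0, 0.0, -1.0)   # P3 = point(0.5) = (1, 0, 0)
point = lambda t: tuple(a + t * (b - a) for a, b in zip(P1, P2))

def image_gap(ta, tb):
    """Distance between the images of two points of the segment."""
    (xa, _), (xb, _) = project(point(ta)), project(point(tb))
    return abs(xb - xa)

eps = 0.001
# Two nearby points close to P1: their images are about as close together.
print(image_gap(0.0, eps))                  # a tiny number
# Two points the same distance apart close to P3: their images are
# hundreds of units apart, and the gap grows without bound nearer P3.
print(image_gap(0.5 - 2 * eps, 0.5 - eps))
```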