The original ellipse in space can be specified parametrically by
$ p(t) = v_0 + v_1 \cos t + v_2 \sin t $
where $v_0, v_1, v_2 \in \mathbb{R}^3$ are the center and the two semi-axis vectors, respectively.
In terms of the vector $w = [\cos t, \sin t, 1 ]^T$, we have
$ p(t) = V w $ with $ V = [v_1, v_2, v_0] $
Now the rays from the eyepoint $E$ to $p(t)$ can be expressed parametrically as
$ q(t) = E + s (p(t) - E) = E + s F w $
with $F = [v_1, v_2, v_0 - E] $
From which it follows that
$ q(t) - E = s F w $
i.e., provided $F$ is invertible (which holds when $E$ does not lie in the plane of the ellipse),
$w = \dfrac{1}{s} F^{-1} (q(t) - E)$
However $w = [\cos t, \sin t , 1 ]^T$ satisfies
$w^T Q_0 w = 0 $
where $Q_0 = \text{diag} \{1, 1, -1 \} $
Therefore, the rays $q(t) $ satisfy
$ \dfrac{1}{s^2} (q(t) - E)^T F^{-T} Q_0 F^{-1} (q(t) - E) = 0 $
We can assume that $s \ne 0$, so the factor $1/s^2$ can be dropped. This is the equation of the elliptical cone with apex $E$ that passes through $p(t)$.
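As a numerical sanity check of the cone equation, here is a minimal sketch in plain Python. The values of `E`, `v0`, `v1`, `v2` and the helper names `solve3`, `cone_residual`, and `ray_point` are illustrative assumptions, not part of the derivation. It evaluates $(q - E)^T F^{-T} Q_0 F^{-1} (q - E)$ at several points on several rays and confirms that the result is zero up to rounding.

```python
import math

def solve3(F, b):
    """Solve F x = b for a 3x3 matrix F (list of rows) using Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(F)
    if abs(d) < 1e-12:
        raise ValueError("F is singular (E in the ellipse's plane, or v1 parallel to v2)")
    x = []
    for col in range(3):
        # Replace column `col` of F by b, then take the ratio of determinants.
        m = [[b[r] if c == col else F[r][c] for c in range(3)] for r in range(3)]
        x.append(det(m) / d)
    return x

def cone_residual(q, E, v0, v1, v2):
    """Return u^T Q0 u with u = F^{-1}(q - E) and F = [v1, v2, v0 - E] as columns."""
    F = [[v1[r], v2[r], v0[r] - E[r]] for r in range(3)]
    u = solve3(F, [q[i] - E[i] for i in range(3)])
    return u[0] ** 2 + u[1] ** 2 - u[2] ** 2  # Q0 = diag(1, 1, -1)

# Illustrative data (assumed for this example only).
E = [0.0, 0.0, 0.0]
v0 = [1.0, 2.0, 5.0]
v1 = [2.0, 0.0, 0.5]
v2 = [0.0, 1.0, -0.3]

def ray_point(t, s):
    """Point q = E + s (p(t) - E) on the ray through the ellipse point p(t)."""
    p = [v0[i] + v1[i] * math.cos(t) + v2[i] * math.sin(t) for i in range(3)]
    return [E[i] + s * (p[i] - E[i]) for i in range(3)]

max_residual = max(abs(cone_residual(ray_point(t, s), E, v0, v1, v2))
                   for t in (0.0, 0.7, 2.0, 4.5) for s in (0.5, 1.0, 3.0))
print(max_residual)  # expected to be at rounding-error level
```

The residual vanishes for every $s$, which reflects the fact that the equation describes the whole cone and not just the ellipse.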
To complete the analysis, I will define the vector
$N = v_0 - E $
to be the vector connecting $E$ to $v_0$, and will take the projection plane
to have normal vector $N$ and to pass through $r_0 = E + \alpha N $ for some $\alpha \gt 0$. Therefore, the equation of the projection plane is
$ N^T (r - r_0) = 0 $
Next, we'll apply an affine transformation to the vector $q$ of the cone as follows
$q' = F^{-1} (q - E) $
It then follows that $q'^T Q_0 q' = 0 $
And this is the equation of a right circular cone with its axis along the $z'$ axis and a semi-vertical angle of $45^\circ$. Applying the same transformation to the projection plane gives
$ r = E + F (r') $
which we substitute into the equation of the plane, to obtain,
$ N^T (E + F r' - r_0) = 0 $
noting that $r_0 - E = \alpha N$, then,
$ N^T F r' = \alpha (N^T N) $
Which can be written as
$ (F^T N)^T (r') = \alpha (N^T N) $
This plane (in the transformed space) is intersecting the right circular cone (also in the same transformed space), and the intersection will be an ellipse if the angle between the normal to the plane (which is given by $F^T N $) and the $z'$ axis is less than $ 45^\circ$. Using the dot product between $[0, 0, 1]^T$ and $(F^T N)$
we get,
$ k \cdot F^T N = [0, 0, 1] F^T N = N^T N = \cos \theta | F^T N | $
But $|F^T N | = \sqrt{ (v_1^T N)^2 + (v_2^T N)^2 + (N^T N)^2 }$
Since we want $\theta \lt 45^\circ$ , then we want,
$\dfrac{ N^T N }{\sqrt{(v_1^T N)^2 + (v_2^T N)^2 + (N^T N)^2 } } \gt \dfrac{1}{\sqrt{2}} $
which upon manipulating the denominator becomes:
$\dfrac{ 1 }{\sqrt{ \dfrac{|v_1|^2}{N^T N} \cos^2 \theta_1 + \dfrac{|v_2|^2}{N^T N} \cos^2 \theta_2 + 1 } } \gt \dfrac{1}{\sqrt{2}} $
where $\theta_1$ is the angle between $v_1$ and $N$ and $\theta_2$ is the angle between $v_2$ and $N$
The last inequality is equivalent to
$|v_1|^2 \cos^2 \theta_1 + |v_2|^2 \cos^2 \theta_2 \lt N^T N $
And this can be achieved if the vector $N$ is large enough with respect to $v_1$ and $v_2$.
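The condition is easy to test numerically. Since $|v_i|^2 \cos^2 \theta_i = (v_i^T N)^2 / (N^T N)$, it is equivalent to $(v_1^T N)^2 + (v_2^T N)^2 \lt (N^T N)^2$, which avoids computing any angles. The sketch below uses that form; the function name `projects_to_ellipse` and the sample vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Euclidean dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))

def projects_to_ellipse(E, v0, v1, v2):
    """Check (v1.N)^2 + (v2.N)^2 < (N.N)^2 with N = v0 - E."""
    N = [c - e for c, e in zip(v0, E)]
    nn = dot(N, N)
    return dot(v1, N) ** 2 + dot(v2, N) ** 2 < nn ** 2

E = [0.0, 0.0, 0.0]
v0 = [0.0, 0.0, 5.0]

# Semi-axes perpendicular to N: the left-hand side is 0, so the test passes.
far_ok = projects_to_ellipse(E, v0, [2.0, 0.0, 0.0], [0.0, 1.0, 0.0])

# A semi-axis with a large component along N: (v1.N)^2 = 900 > (N.N)^2 = 625.
too_close = projects_to_ellipse(E, v0, [1.0, 0.0, 6.0], [0.0, 1.0, 0.0])

print(far_ok, too_close)
```

In the second case the ellipse reaches past the plane through $E$ perpendicular to $N$, so some rays never meet the projection plane on the $\alpha \gt 0$ side.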
Assuming this condition is satisfied, the intersection curve will be an ellipse, which can be computed explicitly. For our purposes, we can just assume that it is given (with a parameter $t$ that need not coincide with the parameter of $p(t)$) by
$q'(t) = q_0 + q_1 \cos t + q_2 \sin t $
For some $q_0, q_1, q_2$
At the final step, we will obtain $q$ from $q'$ using the fact that $ q = E + F q' $
Thus, the intersection curve is
$q(t) = E + F q_0 + F q_1 \cos t + F q_2 \sin t $
which is an ellipse.
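For a concrete picture, the projected curve can also be sampled directly, without going through the transformed space. Substituting $q = E + s (p - E)$ into $N^T (q - r_0) = 0$ with $r_0 = E + \alpha N$ gives $s = \alpha \, N^T N / N^T (p - E)$. The following is a minimal sketch of that computation; the function names `project_point` and `projected_ellipse`, the sample count, and the numeric values are assumptions for illustration.

```python
import math

def dot(a, b):
    """Euclidean dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))

def project_point(p, E, N, alpha):
    """Intersect the ray from E through p with the plane N.(r - (E + alpha N)) = 0."""
    d = [pi - ei for pi, ei in zip(p, E)]
    denom = dot(N, d)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the projection plane")
    s = alpha * dot(N, N) / denom
    return [ei + s * di for ei, di in zip(E, d)]

def projected_ellipse(E, v0, v1, v2, alpha, samples=8):
    """Project sample points p(t) = v0 + v1 cos t + v2 sin t; N = v0 - E."""
    N = [c - e for c, e in zip(v0, E)]
    pts = []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        p = [v0[i] + v1[i] * math.cos(t) + v2[i] * math.sin(t) for i in range(3)]
        pts.append(project_point(p, E, N, alpha))
    return N, pts

E = [0.0, 0.0, 0.0]
alpha = 0.5
N, pts = projected_ellipse(E, [1.0, 2.0, 5.0], [2.0, 0.0, 0.5], [0.0, 1.0, -0.3], alpha)

# Every projected point should satisfy the plane equation N.(q - r0) = 0.
r0 = [E[i] + alpha * N[i] for i in range(3)]
plane_residual = max(abs(dot(N, [q[i] - r0[i] for i in range(3)])) for q in pts)
print(plane_residual)
```

When $v_1$ and $v_2$ are both perpendicular to $N$, the factor $s$ equals $\alpha$ for every $t$, and the image is simply the original ellipse scaled by $\alpha$ about $E$.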
You ask questions that relate to some fairly abstract mathematical objects such as the real projective plane. But the topic of the linked presentation
that you referred to
is a much more concrete problem, which is how to represent a three-dimensional shape on a two-dimensional computer screen. So I will address just that topic.
There are two parts to the topic: how realistic a two-dimensional representation appears to us, and how the computer should obtain that representation.
The "how realistic" question has nothing to do with $x,y,z$ coordinates.
Painters were using one-point and even two-point perspective long before
the Cartesian coordinate system took hold.
Projections are essentially geometric constructions that do not need coordinates.
But in a computer, a three-dimensional object is typically described in terms of
$x,y,z$ coordinates,
and the image on the screen is typically described in horizontal and vertical coordinates.
So in the computer we want some way to get the three $x,y,z$ coordinates of the object "projected" onto the two coordinates of a pixel on the screen.
Now I'm going to explain a lot of stuff that you apparently already know.
This may seem redundant, but I need to establish the facts in certain language that I can refer back to later in the answer.
A useful technique to help transform a 3D object onto the computer screen is to suppose there is a "viewing plane" in 3D space onto which we project each point of the object or scene being viewed.
We can also assign two perpendicular lines in that plane to be $x$ and $y$ axes,
thereby assigning $x,y$ coordinates to every point in the plane.
It is then relatively simple to scale and shift the $x$ and $y$ coordinates of each projected point to the horizontal and vertical screen coordinates of the pixel where that point should be displayed.
In the presentation, these last steps (from viewing plane coordinates to screen coordinates) are the "Normalization Transformation and Clipping" step and the "Viewport Transformation" step. But the presentation doesn't say much about these steps; it is more concerned with getting the $x,y$ coordinates on the viewing plane.
In order to make the geometric idea of projection work, the viewing plane has to be a plane in the same three-dimensional space as the object. So we call its $x,y$ coordinates $x_v,y_v$ to distinguish them from the other coordinate systems the computer is keeping track of.
And once you have set up two axes ($x_v$ and $y_v$) in 3D space, they determine a third axis perpendicular to both of them, which we call the $z_v$ axis.
Now I'll try to address some things that you apparently do not know.
Since we really only care about $x_v$ and $y_v$ coordinates for the final display of the picture on the computer screen, we have some freedom about how we deal with the $z_v$ coordinates.
We can put the origin of the $x_v,y_v,z_v$ coordinate system in the viewing plane,
as I did a few paragraphs earlier,
in which case $z_v=0$ everywhere in the viewing plane, or we can move the origin of that system as far as we want in either direction along the $z_v$ axis, which leaves the $x_v$ and $y_v$ coordinates unchanged at any point in the viewing plane but makes the entire viewing plane have a new (constant) $z_v$ coordinate.
In summary, as long as the $x_v$ and $y_v$ axes are perpendicular to each other and are both parallel to the viewing plane, the $z_v$ axis will be perpendicular to the viewing plane, all points in the viewing plane will have the same $z_v$ coordinate, and the $x_v$ and $y_v$ coordinates of any point in the viewing plane will give you good coordinates to pass to the "Normalization Transformation and Clipping" and "Viewport Transformation" steps that finally display points on the screen.
When I say we have "freedom" with respect to the $z_v$ coordinate, I mean it really does not matter what we choose to be the constant $z_v$ coordinate of the viewing plane, provided that for each projection we do, we make a choice and then stick with that choice for that entire projection.
(That means we must consistently use formulas that correctly represent the projection for that choice.)
Then, for the next projection, we can make a different choice.
If you look toward the end of the presentation, you will see two slides showing two "special cases" of how we choose the $z_v$ coordinate of the viewing plane for a perspective projection.
In one case we set up the axes so that $z_v=0$ everywhere in the viewing plane.
Then the reference point where all the projectors (projection lines) converge has to have a non-zero $z_v$ coordinate (because you don't get an image if the reference point is in the viewing plane itself).
In the other case we set up the axes so that the reference point where the projectors converge has $z_v$ coordinate zero, so the viewing plane needs to have a non-zero $z_v$ coordinate, which could be $z_v=1$ or something else.
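To make the two special cases concrete, here is a minimal sketch of a perspective projection with a general viewing-plane position. It assumes the reference point sits on the $z_v$ axis at $(0, 0, z_{prp})$; the names `perspective_project` and `z_prp` are my own and the presentation's notation may differ. The formula comes from intersecting the line through the reference point and $(x, y, z)$ with the plane $z_v = z_{vp}$.

```python
def perspective_project(point, z_prp, z_vp):
    """Project (x, y, z) along the line to (0, 0, z_prp) onto the plane z_v = z_vp."""
    x, y, z = point
    if z_prp == z_vp:
        raise ValueError("reference point lies in the viewing plane")
    if z == z_prp:
        raise ValueError("point has the same z_v coordinate as the reference point")
    # Parameter along the line from the reference point (u = 0) to the point (u = 1).
    u = (z_vp - z_prp) / (z - z_prp)
    return (u * x, u * y, z_vp)

# Case 1: viewing plane at z_v = 0, reference point at z_v = 3.
case1 = perspective_project((3.0, 6.0, -6.0), z_prp=3.0, z_vp=0.0)

# Case 2: same geometry with the origin moved to the reference point,
# so every z_v coordinate (plane, reference point, object) drops by 3.
case2 = perspective_project((3.0, 6.0, -9.0), z_prp=0.0, z_vp=-3.0)

print(case1, case2)
```

Both calls give the same $x_p, y_p$; only the constant third coordinate differs, which is the freedom in choosing the $z_v$ origin described above.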
Remember these facts when considering the following answers to your particular questions.
why $(x,y,z)$ is projected to $(x_p,y_p)$, why not $(x_p,y_p,z_{vp})$
Technically it is equally valid to say that $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})$; but since we only care about the display on the screen,
we just read off the two coordinates $(x_p,y_p)$ and discard the $z$ coordinate.
view plane is placed at position $z_{vp}$, so why $z_{vp}=0$ in $(x_p,y_p,z_{vp})?$
That's an arbitrary choice that was made by the author of the algorithm.
A different value of $z_{vp}$ could also work. The only thing that matters is the geometry of the projection. Just beware that for oblique projections and perspective projections the mathematical formulas that you use will have a dependence on $z_{vp}$, so you need to make sure you use the correct formulas.
And another question in perspective projection why $(x,y,z)$ is projected to $(x_p,y_p,z_{vp})?$
That's a projection to a point expressed in the viewing-plane coordinates.
The whole point of a projection like this is to project onto the viewing plane,
and you want to use $x$ and $y$ axes that give nice Cartesian coordinates in that plane for the "Normalization Transformation and Clipping" and "Viewport Transformation" steps.
That means using a coordinate system in which the $z$-coordinate of the plane is constant. But you can (technically) set that constant to whatever you want and still do perspective projection if you're careful about the formulas you use.
My final question is that in perspective projection projectors are perpendicular to the viewing plane?
No, they cannot all be perpendicular to the viewing plane because they all have to converge at a single point. There can be one projector perpendicular to the viewing plane, but every other projector has a different direction in 3D space so it cannot be perpendicular to the same plane.
In summary, in the linked presentation you see some methods for doing orthographic projection, oblique parallel projection, and perspective projection.
That is all these are: just some methods of doing these projections,
not the only methods of doing these projections.
So when you ask why they chose $z_{vp}$ a particular way for a particular projection, the reason is that it was a choice they were allowed to make.
They had other choices available but needed to make a choice so that they could show you the formulas that you would use to do the projection according to that choice.
Best Answer
If I understand your question correctly, the answer can be given using homogeneous coordinates.
Given a point $P=(a,b)$, its homogeneous coordinates are $P=[a,b,1]^T\equiv [ca,cb,c]^T$ for any $c \ne 0$ (see here for a definition).
Using this, the projection from the origin onto the line $x=1$ can be represented by the matrix: $$A= \begin{bmatrix} 1&0&0\\ 0&1&0\\ 1&0&0 \end{bmatrix} $$ which, for $a \ne 0$, gives: $$ \begin{bmatrix} 1&0&0\\ 0&1&0\\ 1&0&0 \end{bmatrix} \begin{bmatrix} a\\ b\\ 1 \end{bmatrix}= \begin{bmatrix} a\\ b\\ a \end{bmatrix}\equiv \begin{bmatrix} 1\\ b/a\\ 1 \end{bmatrix} $$
For any $P=(a,b)$, the straight line from $O$ to $P$ has equation $y=\frac{b}{a}x$, so the point $P'$ of this line with $x=1$ has coordinates $P'=(1,\frac{b}{a})$
So, in homogeneous coordinates, the two points are represented as: $$ P=\begin{bmatrix} a\\ b\\ 1 \end{bmatrix} \qquad P'=\begin{bmatrix} 1\\ b/a\\ 1 \end{bmatrix}\equiv\begin{bmatrix} a\\ b\\ a \end{bmatrix} $$ and a simple inspection shows that the matrix that transforms $P \to P'$ is the matrix $A$.
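The same computation can be sketched in a few lines of Python. The function name `project_onto_x_equals_1` is an assumption for illustration, and `fractions.Fraction` is used only to keep the arithmetic exact. The code multiplies by $A$ and then divides by the third homogeneous coordinate.

```python
from fractions import Fraction

# The projection matrix A from the text, stored as a list of rows.
A = [[1, 0, 0],
     [0, 1, 0],
     [1, 0, 0]]

def project_onto_x_equals_1(a, b):
    """Project P = (a, b) from the origin onto the line x = 1 via homogeneous coordinates."""
    P = [Fraction(a), Fraction(b), Fraction(1)]
    # Matrix-vector product A P, which equals [a, b, a].
    h = [sum(A[r][c] * P[c] for c in range(3)) for r in range(3)]
    if h[2] == 0:
        raise ValueError("a = 0: the line OP is parallel to x = 1 (image is a point at infinity)")
    # Normalize so that the third coordinate is 1, then drop it.
    return (h[0] / h[2], h[1] / h[2])

image = project_onto_x_equals_1(4, 2)
print(image)  # (Fraction(1, 1), Fraction(1, 2))
```

The case $a = 0$ is where the homogeneous representation earns its keep: the product $AP = [0, b, 0]^T$ is still well defined and represents a point at infinity, even though no Cartesian image exists.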