[Math] Unprojecting a 2D point to 3D space on a plane with perspective.

Tags: 3d, geometry

I'm not a good mathematician, but I'm trying to unproject 2D screen coordinates to a plane in a 3D space with perspective.

I first apply a uniform scaling to the 3D scene. Then I rotate the plane defined by the X and Z axes around the X axis. Then I translate the scene along the Z axis by -1.0.

So I have a ProjectionView matrix, computed in the following way using the linmath.h library, in C:

mat4x4 ProjectionView, matrix;

/* perspective projection; the cast avoids integer division if width and height are ints */
mat4x4 projection;
mat4x4_identity(projection);
mat4x4_perspective(projection, fova, (float)width / (float)height, zNear, zFar);

/* view = translate * rotate * scale: points are scaled first,
   then rotated around the X axis, then translated along Z */
mat4x4 view;
mat4x4_identity(view);
mat4x4_translate(view, 0.0, 0.0, -1.0);
mat4x4_rotate(matrix, view, 1.0, 0.0, 0.0, theta);
mat4x4_scale_aniso(view, matrix, scale, scale, scale);

mat4x4_mul(ProjectionView, projection, view);

Here is the graphic I use to try to unproject:

[figure omitted: diagram showing the distance Oz on the plane containing the unprojected points]

The distance Oz is something I can compute using trigonometry; it lies on the plane containing the unprojected points, but it does not take the effects of perspective into account.

So I would like to know how to correct this Oz distance using the perspective parameters, in order to obtain the unprojected z coordinate.

And then, how do I compute the x coordinate from the unprojected z and the perspective parameters?

After some reading on the internet, I tried using the inverse ProjectionView matrix, and also the inverse Projection matrix, without good results, perhaps because I don't know what z value to feed these matrices. So I wonder: is there a way to solve this problem without using an inverse matrix?

The computed Oz distance is close to the true result; I think it just needs a correction for the perspective.

Best Answer

Your ProjectionView matrix $P$ maps the visible part of the 3D world (bounded by the view frustum and the near and far planes) to normalized device coordinates, a $2\times 2\times 2$ cube with the computer's screen (the near plane) at the $z=-1$ face.

So your first task is to convert the screen coordinates of the "clicked point" $(p_x, p_y)$ to normalized device coordinates. If your viewport has width $w$ and height $h$, this is $$n_x = \frac{2p_x}{w} -1, \qquad n_y = 1- \frac{2p_y}{h}, \qquad n_z = -1.$$
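In C, that conversion is just a few lines. Here is a minimal sketch, assuming px and py hold the clicked pixel coordinates (origin at the top-left corner) and w and h are the viewport dimensions:

/* screen (pixel) coordinates to normalized device coordinates */
float n_x = 2.0f * px / w - 1.0f;
float n_y = 1.0f - 2.0f * py / h;   /* flip y: pixel y grows downward */
float n_z = -1.0f;                  /* the near plane in NDC */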

Now the perspective projection happens in homogeneous coordinates, so that if $\mathbf{q} = (q_x,q_y,q_z,1)$ is a point in world coordinates, $$\left([P\mathbf{q}]_x/[P\mathbf{q}]_w, [P\mathbf{q}]_y/[P\mathbf{q}]_w, [P\mathbf{q}]_z/[P\mathbf{q}]_w, 1\right) = (n_x, n_y, n_z, 1)$$ is the corresponding projected point in normalized device coordinates. This mapping is not linear and cannot be inverted simply by applying $P^{-1}$; but notice that $$P\mathbf{q} = [P\mathbf{q}]_w\, \mathbf{n},$$ and so $$\mathbf{q} = \frac{P^{-1}\mathbf{n}}{[P^{-1}\mathbf{n}]_w}.$$ (The last equality holds because the $w$ coordinate of $\mathbf{q}$ is 1, which pins down the overall scale.)
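With linmath.h this can be done directly. Here is a minimal sketch, assuming ProjectionView is the matrix built in the question and n_x, n_y come from the previous step:

/* the clicked point on the near plane: n_z = -1, n_w = 1 */
vec4 ndc = { n_x, n_y, -1.0f, 1.0f };

mat4x4 inv;
mat4x4_invert(inv, ProjectionView);

vec4 q;
mat4x4_mul_vec4(q, inv, ndc);

/* divide by the new w coordinate to return to affine world coordinates */
float w = q[3];
q[0] /= w; q[1] /= w; q[2] /= w; q[3] = 1.0f;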

Of course, this is the clicked point's position in world coordinates, not the location of the projected point on the black plane. To find the position of the point on the plane, you could query the depth buffer and set $n_z$ to that depth; this approach is error-prone if the depth buffer lacks sufficient precision, though. Another approach is to simply shoot a ray from the eye through $\mathbf{q}$ and compute where it intersects the black plane.
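For the ray approach, here is a sketch under the assumption that the black plane is the XZ plane ($y = 0$) in world coordinates; view is the matrix from the question, q is the unprojected point from above, and eye, dir, hit are illustrative names:

/* eye position in world coordinates: inverse view transform of the camera origin */
mat4x4 invView;
mat4x4_invert(invView, view);
vec4 origin = { 0.0f, 0.0f, 0.0f, 1.0f };
vec4 eye;
mat4x4_mul_vec4(eye, invView, origin);

/* direction of the ray from the eye through the unprojected point q */
vec3 dir = { q[0] - eye[0], q[1] - eye[1], q[2] - eye[2] };

/* intersect with the plane y = 0: solve eye_y + t * dir_y = 0 for t */
float t = -eye[1] / dir[1];   /* assumes the ray is not parallel to the plane */
vec3 hit = { eye[0] + t * dir[0], 0.0f, eye[2] + t * dir[2] };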

Note the many possible places for subtle bugs in the above procedure:

1. trying to unproject the point from screen coordinates rather than NDC;
2. forgetting to flip the $y$-axis when converting to NDC;
3. setting $n_z$ to something other than $-1$;
4. forgetting to set $n_w$ to 1;
5. forgetting to divide the unprojected point by its new (non-1) $w$ coordinate; etc.

By the way, there are functions like gluUnProject that do most of the above for you; check your graphics library's documentation.
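For example, with the classic GLU helper (a sketch, not tailored to your setup: it assumes double-precision copies of the matrices, a viewport of {0, 0, width, height}, and GLU's convention that window y is measured from the bottom; winZ = 0.0 selects the near plane):

#include <GL/glu.h>

/* double-precision copies of the view and projection matrices */
GLdouble model[16], proj[16];
const float *v = &view[0][0], *p = &projection[0][0];
for (int i = 0; i < 16; ++i) { model[i] = v[i]; proj[i] = p[i]; }

GLint viewport[4] = { 0, 0, width, height };

GLdouble ox, oy, oz;
gluUnProject(px, viewport[3] - py, 0.0,   /* flip y for GLU's convention */
             model, proj, viewport, &ox, &oy, &oz);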
