[Math] Computing Ray from the Center of the Camera Passing through $X$ for given pixel $x$

computational geometrygeometryimage processinglinear algebraprojective-geometry

For perspective projection with given camera matrices and rotation and translation we can compute the 2D pixel coordinate of a 3D point.

enter image description here

using the projection matrix,

$$
P = K [R | t]
$$

where $K$ is intrinsic camera matrix, $R$ is rotation $t$ is translation. The projection is simple matrix multiplication $x = P X $.

It is also known we can add an extra row to the matrices above to get a $4 \times 4$ projection matrix.

$$
\tilde{P} = \left[\begin{array}{cc}
K & 0 \\ 0^T & 1
\end{array}\right]
\left[\begin{array}{cc}
R & t \\ 0^T & 1
\end{array}\right]
$$

I am trying to do the reverse. In Computer Vision Alg and Application book, pg. 54, Szeliski says we can invert $\tilde{P}$ to get the ray from camera center that'll pass through all possible points. I tried this approach for a camera with known calibration, for given pixel (200,200),

K = [[ 282.363047,      0.,          166.21515189],
     [   0.,          280.10715905,  108.05494375],
     [   0.,            0.,            1.        ]]
K = np.array(K)
R = np.eye(3)
t = np.array([[0],[1],[0]])
P = K.dot(np.hstack((R,t)))
P = np.vstack((P,[0,0,0,1]))
print P

x = np.array([200,200,10,1])
X = np.dot(lin.inv(P),x)
print X

to get a direction, which gives me

[-5.17826796 -3.14361632  9.          1.        ]

The camera was merely translated one unit in Y direction, but the resulting vector seems to be pointing in the wrong direction. Szeliski talks about a disparity which I am not sure how to encode in the pixel input. Any help on this approach, or alternative approaches to calculate the ray for given pixel, matrix with known orientation and $K$ would be helpful.

Best Answer

The numerical result that you got is (almost) correct, but it looks like you might not be interpreting it correctly. Szeliski glosses over this stuff rather quickly.

First, observe that the projection that you’ve constructed in your test leaves a point’s $z$-coordinate unchanged, so the inverse map must do the same. You should be getting something like $(-5.178,-4.144,10.,1.)$ for the inverse image of your test point. Looks like the translation by $1$ ended up in the wrong place in your world-to-camera matrix. Even so, the correct $y$-coordinate in the world frame is also negative, so that still needs explanation.

The image coordinate system used with these matrices is also homogeneous, so that after applying the projection the result still needs to be normalized by dividing through by the $z$-coordinate. Thus, the test point $(200,200,10,1)$ actually corresponds to the point $(20,20)$ in the image. According to the calibration matrix, the camera’s axis is at about $(166,108)$ in the image, so after translating back this test point is firmly in the third quadrant. The rest of the inverse map is scaling and a small translation in $y$, so it’s not really surprising that both the $x$ and $y$ world coordinates end up being negative.

Lastly, the inverse map gives you a point on the ray, but to get the direction of the ray you have to subtract the camera’s world coordinate position—the starting point of the ray—which gives $(-5.178,-3.144,10.)$ for the direction vector of the ray in the world frame. If you had applied the inverse map to the homogeneous direction vector $(200,200,10,0)$ instead of the point $(200,200,10,1)$ you would’ve gotten this value directly. As well, if you back-map the equivalent image point $(20,20,1,1)$ and compute the resulting direction vector (or map the vector $(20,20,1,0)$ directly), you’ll end up with a scalar multiple of the above direction vector.

You don’t need to do anything special to incorporate a mapped point’s discrepancy, which is just the reciprocal of its (signed) distance from the camera. The discrepancy shows up as the $w$-coordinate of the projected point after normalization. Using the same test point of $(200,200,10,1)$, its normalized equivalent is $(20,20,1,1/10)$, so $d=0.10$ for this point. Back-mapping this gives $(-0.508,-0.414,1.0,0.1)$ and normalizing by dividing by $w$ produces the same answer as at top.

Related Question