Reverse perspective matrix to find 2D coordinate with known height.

linear-algebra, matrices

I am using a camera to track a robot. The equation below, from OpenCV, maps 3D coordinates to pixel coordinates. I want to do the reverse. I know that this is usually impossible, as any pixel corresponds to a line of infinitely many 3D points. However, I know that the robot will always have a constant height (z = 350 mm always). Therefore, I believe it is possible to find the x and y coordinates from pixel coordinates.

From observation, it appears that even using the equation as intended, to map 3D points to pixel coordinates, is impossible: performing the matrix multiplication would yield a 4×1 column vector, so how could those values be mapped to the pixel coordinates, a 3×1 column vector?

The equation provided is:

$$
\left(\begin{matrix}
u \\
v \\
1
\end{matrix}\right)
=
\left(\begin{matrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{matrix}\right)
\left(\begin{matrix}
r_1 & r_2 & r_3 & t_x \\
r_4 & r_5 & r_6 & t_y \\
r_7 & r_8 & r_9 & t_z
\end{matrix}\right)
\left(\begin{matrix}
x \\
y \\
350 \\
1
\end{matrix}\right)
$$

Here $u$ and $v$ are known pixel coordinates, and $f_x$, $f_y$, $c_x$, $c_y$, all rotation values $r_1,\dots,r_9$, and the translation values $t_x$, $t_y$, $t_z$ are known.

However, because the rotation-translation matrix is not square, I cannot find its inverse to solve the simultaneous equations. I have seen that I can append a row of zeros ending with a 1 to it, and add a row and a column of zeros (with a 1 on the diagonal) to the 3×3 matrix. Is this allowed?

Such that the equation becomes:

$$
\left(\begin{matrix}
u \\
v \\
1
\end{matrix}\right)
=
\left(\begin{matrix}
f_x & 0 & c_x & 0 \\
0 & f_y & c_y & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{matrix}\right)
\left(\begin{matrix}
r_1 & r_2 & r_3 & t_x \\
r_4 & r_5 & r_6 & t_y \\
r_7 & r_8 & r_9 & t_z \\
0 & 0 & 0 & 1
\end{matrix}\right)
\left(\begin{matrix}
x \\
y \\
350 \\
1
\end{matrix}\right)
$$

However, if I now perform the matrix multiplication, or find the inverse matrices and rearrange to find x and y, it appears that the system is overdetermined.

My question: Is adding the rows and columns viable, and what would I have to add to the 3×1 pixel coordinates column vector for the matrix multiplication to be valid? Is what I am attempting even possible?

Diagram of camera placement

Thank you very much for your help.

Best Answer

Perspective projection isn’t injective, even if you treat it as mapping onto some plane in $\mathbb R^3$ instead of the plane $\mathbb R^2$: it maps an entire line onto a single point. Any matrix that represents this map is going to be singular, so you’re not going to get what you need by trying to invert the matrix.

Padding the matrices as you’ve done isn’t going to work, either, because the matrix product that you’ve got represents an affine transformation of space (note that the last row of the product of the two $4\times4$ matrices is $(0,0,0,1)$), but the projection that you’re working with isn’t an affine transformation.† You also have to be careful with using strict equality to compare homogeneous coordinates of points, which might be how you ended up with an overconstrained or inconsistent system of equations. Moreover, the padded equation that you’ve written down is nonsensical, since the left-hand side is a $3\times1$ vector while the product on the right-hand side produces a $4\times1$ vector.
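For comparison, here is how the unpadded equation reads when the homogeneous scale is written out explicitly. The symbol $s$ is introduced here for illustration and does not appear in the question. The product of a $3\times3$, a $3\times4$ and a $4\times1$ matrix is a $3\times1$ vector, and it equals the pixel vector only up to that scale:

$$
s\left(\begin{matrix} u \\ v \\ 1 \end{matrix}\right)
=
\left(\begin{matrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{matrix}\right)
\left(\begin{matrix}
r_1 & r_2 & r_3 & t_x \\
r_4 & r_5 & r_6 & t_y \\
r_7 & r_8 & r_9 & t_z
\end{matrix}\right)
\left(\begin{matrix} x \\ y \\ 350 \\ 1 \end{matrix}\right),
\qquad
s = r_7x + r_8y + 350\,r_9 + t_z.
$$

The pixel coordinates are obtained only after dividing the first two components of the right-hand side by the third, which is why the map is not linear in $x$ and $y$.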

Following the discussion in section 6.2.2 of Hartley and Zisserman’s Multiple View Geometry in Computer Vision, an image point back-projects to a ray in the scene. You know two points on this ray: the camera center $\mathbf C$, which can be recovered from the camera matrix $\mathtt P$ as its null vector ($\mathtt P\mathbf C=\mathbf 0$), and the point $\mathtt P^+\mathbf x$, where $\mathbf x$ is the homogeneous coordinate vector of the image point and $\mathtt P^+ = \mathtt P^T(\mathtt P\mathtt P^T)^{-1}$ is the pseudo-inverse of $\mathtt P$. The latter lies on the ray because it projects back to $\mathbf x$: $\mathtt P\mathtt P^+\mathbf x = \mathbf x$. You can then find the point that you’re looking for by computing the point on the ray $\mathbf X(\lambda) = \mathtt P^+\mathbf x + \lambda\mathbf C$ that has the required (inhomogeneous) $z$-coordinate.
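The pseudo-inverse construction can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `backproject_via_pinv` is made up for this example, `P` is assumed to be a full-rank $3\times4$ camera matrix, and `np.linalg.pinv` is used in place of the explicit formula $\mathtt P^T(\mathtt P\mathtt P^T)^{-1}$ (the two agree for full-rank $\mathtt P$, and the SVD-based routine tends to be better conditioned).

```python
import numpy as np

def backproject_via_pinv(P, u, v, h):
    """Intersect the back-projected ray of pixel (u, v) with the plane z = h.

    P is assumed to be a full-rank 3x4 camera matrix.
    Returns the inhomogeneous world point (x, y, z).
    """
    x_img = np.array([u, v, 1.0])
    # A = P^+ x is one point on the ray (homogeneous 4-vector).
    A = np.linalg.pinv(P) @ x_img
    # The camera centre C is the right null vector of P (P @ C = 0),
    # taken here as the last right-singular vector from the SVD.
    C = np.linalg.svd(P)[2][-1]
    # Points on the ray: X(lam) = A + lam * C. Requiring X[2] / X[3] = h
    # gives A[2] + lam * C[2] = h * (A[3] + lam * C[3]).
    denom = C[2] - h * C[3]
    if abs(denom) < 1e-12:
        raise ValueError("camera centre lies in the plane z = h")
    lam = (h * A[3] - A[2]) / denom
    X = A + lam * C
    if abs(X[3]) < 1e-12:
        raise ValueError("ray is parallel to the plane z = h")
    return X[:3] / X[3]
```

With the question's setup, `P` would be the product of the intrinsic matrix and the $3\times4$ rotation-translation matrix, and `h` would be 350.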

You’re probably working with a finite camera, in which case you have a more convenient parameterization of the ray available. Writing $\mathtt P = [\mathtt M\mid\mathbf p_4]$, the inhomogeneous coordinates of the camera center are $\tilde{\mathbf C} = -\mathtt M^{-1}\mathbf p_4$ and the back-projected ray intersects the plane at infinity at the point $\mathbf D = \left((\mathtt M^{-1}\mathbf x)^T,0\right)^T$, i.e., the direction vector of the ray is $\mathtt M^{-1}\mathbf x$, so in inhomogeneous coordinates the back-projected ray is $$\tilde{\mathbf X}(\mu) = \mathtt M^{-1}(\mu\mathbf x-\mathbf p_4).$$ Set the third coordinate of this equal to your target height and solve for $\mu$. Since you already have $\mathtt M$ decomposed into the product of an upper-triangular matrix and a rotation, its inverse is particularly easy to compute. I’ll leave working out that detail to you.
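A minimal sketch of this finite-camera recipe, assuming the camera matrix is given as the question's factors: an intrinsic matrix `K`, a rotation `R` and a translation `t`, so that $\mathtt M = \mathtt K\mathtt R$ and $\mathbf p_4 = \mathtt K\mathbf t$. The function name `backproject_to_height` and the tolerance are choices made for this example.

```python
import numpy as np

def backproject_to_height(K, R, t, u, v, h):
    """Return the world point with z = h that projects to pixel (u, v).

    Assumes P = K [R | t] with K upper-triangular and invertible, and R a
    rotation matrix, so that M = K R and M^{-1} = R^T K^{-1}.
    """
    M_inv = R.T @ np.linalg.inv(K)
    p4 = K @ t                      # last column of P
    x_img = np.array([u, v, 1.0])   # homogeneous pixel coordinates
    d = M_inv @ x_img               # ray direction M^{-1} x
    c = -M_inv @ p4                 # camera centre -M^{-1} p4
    # X(mu) = M^{-1} (mu * x - p4) = mu * d + c; set its third coordinate to h.
    if abs(d[2]) < 1e-12:
        raise ValueError("ray is parallel to the plane z = h")
    mu = (h - c[2]) / d[2]
    return mu * d + c
```

Because $\mathtt R$ is orthogonal, only the small upper-triangular `K` needs a genuine inverse. With this scaling of `P`, the value of `mu` is the depth of the point along the camera's viewing axis, so a negative `mu` would indicate an intersection behind the camera.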

You can also avoid introducing the parameter $\mu$ and compute the desired point directly using the ray’s Plücker matrix. The point you’re looking for is the intersection of the back-projected ray with the plane $z=h$, where $h$ is whatever your target height is. Let $\mathbf\pi = (0,0,1,-h)^T$. Then the intersection of the ray and $\mathbf\pi$ is given by the expression $$(\mathbf C\mathbf D^T-\mathbf D\mathbf C^T)\mathbf\pi = (\mathbf\pi^T\mathbf D)\mathbf C-(\mathbf\pi^T\mathbf C)\mathbf D.$$ The zeros in $\mathbf\pi$ should allow some useful simplifications of this expression for the purposes of optimizing the computation.
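The Plücker-matrix expression can be checked numerically with a short NumPy sketch. As before, it assumes the camera is supplied as `K`, `R` and `t` with $\mathtt P = \mathtt K[\mathtt R\mid\mathbf t]$; the function name `intersect_ray_with_plane` is hypothetical.

```python
import numpy as np

def intersect_ray_with_plane(K, R, t, u, v, h):
    """Intersect the back-projected ray of pixel (u, v) with the plane z = h
    using the ray's Pluecker matrix L = C D^T - D C^T.

    Assumes P = K [R | t] with invertible K and a rotation matrix R.
    """
    M_inv = R.T @ np.linalg.inv(K)
    c = -M_inv @ (K @ t)                    # inhomogeneous camera centre
    d = M_inv @ np.array([u, v, 1.0])       # ray direction
    C = np.append(c, 1.0)                   # homogeneous camera centre
    D = np.append(d, 0.0)                   # point at infinity on the ray
    plane = np.array([0.0, 0.0, 1.0, -h])   # the plane z = h
    L = np.outer(C, D) - np.outer(D, C)     # Pluecker matrix of the ray
    X = L @ plane                           # homogeneous intersection point
    if abs(X[3]) < 1e-12:
        raise ValueError("ray is parallel to the plane z = h")
    return X[:3] / X[3]
```

Expanding the product with the zeros in $\mathbf\pi$ gives the homogeneous point $d_3\,\mathbf C - (c_3 - h)\,\mathbf D$, where $c_3$ and $d_3$ denote the third components of $\tilde{\mathbf C}$ and of the direction $\mathtt M^{-1}\mathbf x$. In inhomogeneous coordinates this is $\tilde{\mathbf C} + \frac{h - c_3}{d_3}\,\mathtt M^{-1}\mathbf x$, so the full $4\times4$ matrix never has to be formed in practice.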


† Although, if the object is far from the camera, an affine approximation can be good enough and is easier to work with in many ways.
