Image Processing – How to Convert Image Plane Coordinates to World Coordinates?


Let's say we have an RGBD image and we are interested in a polygon region of this image. The region of interest is defined by 4 points in the image plane.

The depth information within this region is noisy. To denoise it, we have projected the points from the image plane into the world frame and fit a plane to them using SVD.
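Here is a minimal sketch of the back-projection and SVD plane fit described above, written in Python with NumPy. It assumes the depth map is a NumPy array indexed as `depth[y_p, x_p]` and that the world frame coincides with the camera frame. `backproject` and `fit_plane_svd` are hypothetical names, and the original code may differ.

```python
import numpy as np

def backproject(depth, fx, fy, px, py, mask=None):
    """Back-project depth pixels to 3D points (N x 3, columns x, y, z).

    Assumes depth is indexed as depth[row, col], i.e. depth[y_p, x_p],
    and inverts the pinhole model x_p = x * fx / z + px, y_p = y * fy / z + py.
    """
    rows, cols = np.indices(depth.shape)  # rows hold y_p, cols hold x_p
    if mask is None:
        mask = depth > 0                  # by default, skip pixels with no depth
    z = depth[mask]
    x = (cols[mask] - px) * z / fx
    y = (rows[mask] - py) * z / fy
    return np.column_stack([x, y, z])

def fit_plane_svd(points):
    """Least-squares plane fit; returns (a, b, c, d) with a*x + b*y + c*z = -d."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centered points is the unit normal of the best-fit plane.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    d = -float(normal @ centroid)         # the plane passes through the centroid
    return (*normal, d)
```

The sign of the normal returned by SVD is arbitrary, so `(a, b, c, d)` and `(-a, -b, -c, -d)` describe the same plane.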

We want to convert the 4 image plane points that define our polygon to world coordinates, such that the resulting 3D points lie on the fitted plane.

To do this, we denote each image plane point as $(y_p, x_p)$. Given the camera intrinsics $(f_x, f_y, p_x, p_y)$, a point $(x, y, z)$ in world coordinates projects to image plane coordinates via the pinhole model (no extrinsics appear, so the world frame here coincides with the camera frame):
$$y_p = y\frac{f_y}{z} + p_y$$
$$x_p = x\frac{f_x}{z} + p_x$$
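The projection above can be written as a small NumPy function. This is a sketch assuming camera-frame points with $z > 0$. `project` is a hypothetical name, and returning columns in $(x_p, y_p)$ order is a choice made here, whereas the question writes points as $(y_p, x_p)$.

```python
import numpy as np

def project(points, fx, fy, px, py):
    """Project camera-frame 3D points (N x 3, columns x, y, z) to pixels.

    Returns an N x 2 array with columns (x_p, y_p), following
    x_p = x * fx / z + px and y_p = y * fy / z + py.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    x_p = x * fx / z + px
    y_p = y * fy / z + py
    return np.column_stack([x_p, y_p])
```

For example, `project([[0.1, -0.2, 2.0]], 500.0, 500.0, 320.0, 240.0)` gives $x_p = 345$ and $y_p = 190$.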

Multiplying through by $z$ and rearranging gives:
$$ -f_y y + (y_p - p_y)z = 0 $$
$$ -f_x x + (x_p - p_x)z = 0 $$

Since the fitted plane satisfies $ax + by + cz = -d$, we can stack it with these two equations into a system of linear equations $AX = B$, where:
$$
A = \begin{bmatrix}
a & b & c \\
-f_x & 0 & (x_p - p_x) \\
0 & -f_y & (y_p - p_y)
\end{bmatrix}\quad
X = \begin{bmatrix}
x \\
y \\
z
\end{bmatrix}\quad
B = \begin{bmatrix}
-d \\
0 \\
0
\end{bmatrix}\quad
$$
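A sketch that builds and solves this system with `np.linalg.solve`, assuming plane coefficients `(a, b, c, d)` in the convention $ax + by + cz = -d$. `pixel_to_plane` is a hypothetical helper, not code from the question.

```python
import numpy as np

def pixel_to_plane(x_p, y_p, plane, fx, fy, px, py):
    """Intersect the viewing ray of pixel (x_p, y_p) with a plane.

    plane = (a, b, c, d) with a*x + b*y + c*z = -d. Builds the 3 x 3
    system A X = B (plane row, then the x and y projection rows) and
    returns X = (x, y, z).
    """
    a, b, c, d = plane
    A = np.array([
        [a,   b,   c],
        [-fx, 0.0, x_p - px],
        [0.0, -fy, y_p - py],
    ])
    B = np.array([-d, 0.0, 0.0])
    # A is singular when the viewing ray is parallel to the plane; in that
    # case np.linalg.solve may raise LinAlgError or return huge values.
    return np.linalg.solve(A, B)
```

Because the last two rows give $x = (x_p - p_x)z/f_x$ and $y = (y_p - p_y)z/f_y$, substituting into the plane equation yields the equivalent closed form $z = -d \big/ \left(a\frac{x_p - p_x}{f_x} + b\frac{y_p - p_y}{f_y} + c\right)$, which is a quick cross-check on the solver. For the plane $z = 2$, i.e. `(0, 0, 1, -2)`, with $f_x = f_y = 500$, $p_x = 320$, $p_y = 240$, pixel $(x_p, y_p) = (345, 190)$ maps to $(0.1, -0.2, 2.0)$.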

However, solving for $X$ at each of the 4 points and visualizing the results together with the plane yields an incorrect result:

[Figure: the solved corner points plotted with the fitted plane]

The light blue region of the plane corresponds to our polygon region in the image plane. We expect the blue points, i.e., our polygon corners, to lie at the corners of this region. While they do lie on the plane, they are offset from the blue region. As a sanity check, plotting the points used to fit the plane, instead of the corner points, shows that they align nicely:

[Figure: the points used to fit the plane, plotted with the fitted plane]

What is causing the offset? Is there an error in the system of equations defined above?

Best Answer

Turns out the math is sound. The problem was caused by differing image dimension conventions across Python packages: one library I used mapped (y, x) to (width, height), while another mapped (x, y) to (width, height). Correcting for this removed the offset.
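A small illustration of how this kind of convention mix-up can produce the symptom in the question. It assumes NumPy's `[row, col]` indexing for images and a simple fronto-parallel plane $z = 2$; `backproject_to_depth_plane` is a hypothetical helper and the numbers are made up. Swapping the pixel coordinates still yields a point on the plane, just in the wrong place.

```python
import numpy as np

def backproject_to_depth_plane(x_p, y_p, fx, fy, px, py, z0):
    """Hypothetical helper: back-project pixel (x_p, y_p) onto the plane z = z0."""
    return np.array([(x_p - px) * z0 / fx, (y_p - py) * z0 / fy, z0])

# NumPy image arrays are indexed [row, col] = [y, x] and .shape is
# (height, width); other APIs (e.g. PIL's Image.size) report (width, height).
height, width = 480, 640
img = np.zeros((height, width))
x_p, y_p = 600, 100              # a polygon corner given in (x, y) order
img[y_p, x_p] = 1.0              # row index is y, column index is x
rows, cols = np.nonzero(img)
assert cols[0] == x_p and rows[0] == y_p

fx = fy = 500.0
px, py = 320.0, 240.0
correct = backproject_to_depth_plane(x_p, y_p, fx, fy, px, py, 2.0)  # x = 1.12, y = -0.56
# Mixing conventions: the same corner handed over in (y, x) order.
swapped = backproject_to_depth_plane(y_p, x_p, fx, fy, px, py, 2.0)  # x = -0.88, y = 1.44
```

Both results have $z = 2$, so both lie on the plane, but the swapped one is displaced, much like the offset corners in the first visualization.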