[Math] Relationship of camera matrices and real world units

I have trouble understanding the relationship between a camera matrix and real-world coordinates.

Let's say I have a camera, with the following (calibration) parameters:

focal length in px = 336
principal point x = 640.5
principal point y = 400.5
pixel size = 4.2 µm
width = 1280
height = 800

That leads to the following intrinsic matrix:

\begin{bmatrix}
336 & 0 & 640.5\\
0 & 336 & 400.5\\
0&0&1
\end{bmatrix}

Now I want to project a point $$p = (1,1,2)$$ (in meters) into two different cameras. Camera 1 has $$ P = K\cdot[I|0]$$ and camera 2 has $$P=K\cdot[I|t] \quad \text{with} \quad t = [-0.2;0;0]$$ (translated 0.2 meters in the x-direction with respect to camera 1).

My problem is that I don't know in what units I have to express $p$ and $t$. Is it meters, or maybe millimeters? How does the world $\to$ image projection relate to real-world units?

Maybe someone can help me here?

Best Answer

It turns out that it doesn’t matter what units you use for the extrinsic matrix as long as they’re consistent with the units that you use for world coordinates.

For simplicity, let’s assume that there’s no rotation, as is the case for your matrices. That part of the extrinsic matrix isn’t affected by the units of measurement, anyway. We have some intrinsic matrix $K$ and $$E=\pmatrix{1&0&0&t_x\\0&1&0&t_y\\0&0&1&t_z}$$ for the extrinsic. If we apply these to the world point $(x,y,z)$ we get $$K\cdot E\cdot\pmatrix{x\\y\\z\\1}=K\cdot\pmatrix{x+t_x\\y+t_y\\z+t_z}.\tag{1}$$ Changing the units of measurement amounts to multiplying the last column of $E$ by some constant scale factor $s$ and multiplying world coordinates by the same factor. With these new units, we have $$K\cdot\pmatrix{1&0&0&st_x\\0&1&0&st_y\\0&0&1&st_z}\cdot\pmatrix{sx\\sy\\sz\\1}=K\cdot\pmatrix{s(x+t_x)\\s(y+t_y)\\s(z+t_z)}.$$ Since $K$ is linear, that factor of $s$ remains after multiplication by $K$ and will cancel out when you divide through by the last coordinate to get the 2-D coordinates of the point in the image plane, giving the same result as $(1)$. (Remember that in homogeneous coordinates, $(x,y,1)$ and $(kx,ky,k)$ with $k\ne0$ represent the same point.)
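
As a quick numerical check of the argument above, here is a minimal NumPy sketch using the $K$, $p$ and $t$ from the question. The helper name `project` is my own and only for illustration, and $t$ is taken as the 3-vector $(-0.2, 0, 0)$ so that $[I|t]$ is $3\times4$. It projects the point once with everything in meters and once with everything in millimeters.

```python
import numpy as np

# Intrinsic matrix from the question (focal length and principal point in pixels).
K = np.array([[336.0, 0.0, 640.5],
              [0.0, 336.0, 400.5],
              [0.0, 0.0, 1.0]])

def project(K, t, p):
    """Project world point p with P = K [I | t] and return pixel coords (u, v).

    Illustrative helper: assumes no rotation, as in the question.
    """
    E = np.hstack([np.eye(3), np.asarray(t, dtype=float).reshape(3, 1)])  # 3x4 extrinsic
    x = K @ E @ np.append(np.asarray(p, dtype=float), 1.0)               # homogeneous image point
    return x[:2] / x[2]                                                    # divide by last coordinate

p_m = np.array([1.0, 1.0, 2.0])    # world point in meters
t_m = np.array([-0.2, 0.0, 0.0])   # camera 2 translation in meters

uv_cam1 = project(K, np.zeros(3), p_m)          # camera 1: P = K [I | 0]
uv_m = project(K, t_m, p_m)                     # camera 2, meters
uv_mm = project(K, 1000.0 * t_m, 1000.0 * p_m)  # camera 2, millimeters (s = 1000)

print(uv_cam1)  # approximately [808.5, 568.5]
print(uv_m)     # approximately [774.9, 568.5]
print(uv_mm)    # same pixel as uv_m, up to floating-point rounding
```

Both unit choices land on the same pixel because the common factor $s = 1000$ cancels in the final division. Mixing units (say, $p$ in meters and $t$ in millimeters) would not cancel and would give a different pixel.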
