[Math] How to find extrinsic camera matrix

computer vision, coordinate systems, projective-geometry, rotations

I need to construct the camera extrinsic parameter matrix in the form $ C =\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_{1} \\
r_{21} & r_{22} & r_{23} & t_{2} \\
r_{31} & r_{32} & r_{33} & t_{3} \end{bmatrix}$ (where the $r_{ij}$ form the rotation matrix and $t$ is the translation vector), so that $C$ can be used to project a 3D point $X=\begin{bmatrix} x \\ y \\ z \\1 \end{bmatrix}$ onto the camera image plane (as $\begin{bmatrix}x_{px}\\y_{px} \\1 \end{bmatrix}= K \cdot \mathrm{dist}(\mathrm{normalize}(CX))$, with $K$ the camera intrinsics and $\mathrm{dist}$ the distortion function).

I would like to do this the way OpenGL's gluLookAt does (https://www.opengl.org/sdk/docs/man2/xhtml/gluLookAt.xml): I should be able to specify the camera center and the point that the camera is "looking at" (this point will be at the center of the image). Unfortunately, the algorithm description at https://www.opengl.org/sdk/docs/man2/xhtml/gluLookAt.xml has lost its formatting, so I'd like a clarification of how to do this, especially in relation to my coordinate system:

[Image of the coordinate system: not available]

The second question is how to construct the "UP" vector. I need it to point "up" (the Y axis of the image plane lies along the world Z axis). Should this vector always be $\begin{bmatrix}0\\0\\1 \end{bmatrix}$, or does it have to be calculated relative to the camera rotation? (Suppose the camera is at (10,10,10) and "looks" at (0,0,0), so its optical axis is at 45 degrees to the Z axis: what will the UP vector be in this case?)

Finally, I will also have to rotate the camera from "landscape" to "portrait" orientation (not by exactly 90 degrees, but by some angle around 89-91 degrees, to simulate human inaccuracy), so that the world Z axis lies along the X axis of the camera image plane. How can this rotation be achieved?

(I'm sorry for my English as it is not my first language)

Best Answer

I can answer this:

What is the affine transformation converting world coordinates to camera coordinates? (camera world coordinates: $c=(c_x,c_y,c_z)^\top$, visual center world coordinates, $v=(v_x,v_y,v_z)^\top$)

I'm assuming the traditional camera image coordinates (before projection) having $z$ drilling "into" the image, $x$ pointing from left to right, and $y$ pointing downward.

Now let's track how the axes must be rotated without translation:

1. the new $z$ axis ($z'$) will point along $v-c$.
2. the new $x$ axis ($x'$) is perpendicular to $z$ and $z'$.
3. the new $y$ axis ($y'$) is perpendicular to $x'$ and $z'$.

You can find three vectors that point along the new axes in world coordinates, normalize them, then put them in the rows of a $3\times 3$ matrix $R$: this converts world coordinates to rotated camera orientation.

Finally, if you know the translation $t$ in world coordinates (in your example $t=-c=(-10,-10,-10)^\top$, which moves the camera's world position to the origin), then the translation in camera coordinates is $t'=Rt$.
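The recipe above can be sketched in a few lines of plain Python. This is only an illustration under the conventions assumed in this answer (camera $x$ to the right, $y$ down, $z$ forward, world $z$ up); the function name `look_at_extrinsics` and the small vector helpers are made up for this sketch and do not come from any library.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    if n == 0.0:
        # Happens when c == v, or when the camera looks straight along world_up.
        raise ValueError("cannot normalize a zero-length vector")
    return [x / n for x in a]

def matvec(M, a):
    return [sum(M[i][j] * a[j] for j in range(3)) for i in range(3)]

def look_at_extrinsics(c, v, world_up=(0.0, 0.0, 1.0)):
    """Return (R, t') for a camera at c looking at v (hypothetical helper)."""
    z_cam = normalize(sub(v, c))                # optical axis, along v - c
    x_cam = normalize(cross(z_cam, world_up))   # image "right": z' x z
    y_cam = cross(z_cam, x_cam)                 # image "down": z' x x', already unit length
    R = [x_cam, y_cam, z_cam]                   # new axes as the rows of R
    t_cam = matvec(R, [-ci for ci in c])        # t' = R t with t = -c
    return R, t_cam

R, t_cam = look_at_extrinsics([10.0, 10.0, 10.0], [0.0, 0.0, 0.0])
```

For the camera at (10,10,10) looking at the origin, `t_cam` comes out as $(0,0,10\sqrt{3})$ up to rounding. The `ValueError` covers the degenerate case where the viewing direction is parallel to the world $z$ axis, in which the cross product vanishes and another reference direction has to be chosen.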

Let's actually carry this out for your example. Let's work on a triad of orthogonal vectors:

$z'=(-1,-1,-1)^\top$, pointing in the direction the camera must face.

$x'=z'\times z=(-1,1,0)^\top$, where $z=(0,0,1)^\top$ is the world $z$ axis.

$y'=z'\times x'=(1,1,-2)^\top$

Normalizing these and using them as the rows of a matrix you get:

$$ R=\frac{1}{\sqrt{6}}\begin{bmatrix} -\sqrt{3}&\sqrt{3}&0\\ 1&1&-2\\ -\sqrt{2}&-\sqrt{2}&-\sqrt{2} \end{bmatrix} $$

Then $t'=Rt=(0,0,10\sqrt{3})^\top$.

Notice that the angle of declination is an odd angle near $35^\circ$ rather than exactly $45^\circ$. (I had a hard time seeing this at first, but if you draw a cube and check the angle between $(1,1,0)$ and $(1,1,1)$ you'll see what I mean.)
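If you would rather compute that angle than draw the cube, a minimal check in plain Python (the variable names are just for this illustration) is:

```python
import math

a = (1.0, 1.0, 0.0)   # horizontal direction underneath the cube diagonal
b = (1.0, 1.0, 1.0)   # cube diagonal, i.e. the line from the origin to the camera

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))

declination_deg = math.degrees(math.acos(dot / (norm_a * norm_b)))
print(round(declination_deg, 2))  # prints 35.26
```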

Now you've converted world coordinates to a rotated frame that is aligned with your camera's frame, but differs by a translation. This gives you the resulting affine transformation $\begin{bmatrix}R&t'\\0_{1\times 3}&1\end{bmatrix}$ which carries world coordinates to camera coordinates.

As a sanity check, you can confirm that the world's origin maps to camera $(0,0,10\sqrt{3})^\top$ and that the world camera location $(10,10,10)$ now maps to the camera's origin. A third check of your choice should be sufficient to convince you this is the right $R$ and $t'$.
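These checks are easy to run numerically. The sketch below hard-codes the $R$ and $t'$ from this example; `world_to_camera` is a hypothetical helper name, and the third test point (one unit above the world origin) is simply my choice of extra check.

```python
import math

s6 = math.sqrt(6.0)
# R from the worked example: rows are the normalized x', y', z' axes.
R = [[-math.sqrt(3.0) / s6, math.sqrt(3.0) / s6, 0.0],
     [1.0 / s6, 1.0 / s6, -2.0 / s6],
     [-math.sqrt(2.0) / s6, -math.sqrt(2.0) / s6, -math.sqrt(2.0) / s6]]
t_cam = [0.0, 0.0, 10.0 * math.sqrt(3.0)]   # t' = R t

def world_to_camera(p):
    """Apply p_cam = R p + t' to a world point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t_cam[i] for i in range(3)]

origin_in_cam = world_to_camera([0.0, 0.0, 0.0])     # expect (0, 0, 10*sqrt(3))
camera_in_cam = world_to_camera([10.0, 10.0, 10.0])  # expect the camera origin
above_origin = world_to_camera([0.0, 0.0, 1.0])      # point 1 unit above the world origin
```

The third point lands at $x=0$ with a negative $y$, so with $y$ pointing downward it appears directly above the image center, which is what you would expect for a point above the look-at target.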

One caveat: I'm not 100% sure that $z'\times z$ is always the right order for this cross product. I picked it this way on this occasion because it gave the right orientation for $x'$ and $y'$ in the end. Hopefully that is all consistent, but maybe there is some sign ambiguity after all.

The second question is how to construct the "UP" vector.

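I don't understand what you are asking. If you mean the camera coordinates for the direction of the world $z$-axis, then that would just be $R(0,0,1)^\top$ (a direction is only rotated, not translated; the world point $(0,0,1)^\top$ itself maps to $R(0,0,1)^\top +t'$).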

Finally, I will have to rotate camera as well from "landscape" to "portrait" orientation.

I'm interpreting this to mean that you'd want to rotate the image plane so that the $y$-axis is horizontal, which could be done with a $\pi/2$ rotation in either direction around the camera $z$-axis.

This transformation should be entirely obvious:

$$U= \begin{bmatrix} 0&-1&0\\ 1&0&0\\ 0&0&1\end{bmatrix} $$

$U$ gives the rotation in the clockwise direction around the $z$ axis (which would look to be counterclockwise if you are looking up the $z$ axis into the picture) and $U^\top$ would give the rotation in the other direction.
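Since you want a roll of roughly 89-91 degrees rather than an exact quarter turn, here is a hedged sketch of the same rotation for an arbitrary angle, applied to the extrinsics of the example. The helper names and the 89.3 degree value are assumptions for illustration only, and whether you need this matrix or its transpose depends on which direction you want the camera to roll.

```python
import math

def roll_about_optical_axis(theta_deg):
    """3x3 rotation by theta_deg about the camera z axis (reduces to U at 90 degrees)."""
    th = math.radians(theta_deg)
    c, s = math.cos(th), math.sin(th)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, p):
    return [sum(A[i][j] * p[j] for j in range(3)) for i in range(3)]

# R and t' from the worked example (camera at (10,10,10) looking at the origin).
s6 = math.sqrt(6.0)
R = [[-math.sqrt(3.0) / s6, math.sqrt(3.0) / s6, 0.0],
     [1.0 / s6, 1.0 / s6, -2.0 / s6],
     [-math.sqrt(2.0) / s6, -math.sqrt(2.0) / s6, -math.sqrt(2.0) / s6]]
t_cam = [0.0, 0.0, 10.0 * math.sqrt(3.0)]

U_exact = roll_about_optical_axis(90.0)    # equals U up to floating-point rounding
U_sloppy = roll_about_optical_axis(89.3)   # hypothetical slightly-off "human" roll

# New camera coordinates are U (R p + t'), so both blocks get multiplied by U.
R_new = matmul(U_sloppy, R)
t_new = matvec(U_sloppy, t_cam)
```

Because $t'$ lies along the optical axis in this example, the roll leaves it unchanged, and the third row of $R$ (the viewing direction) is unchanged as well; only the image $x$ and $y$ axes turn.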
