[Math] Get the camera transformation matrix (Camera pose, not view matrix)

3d, linear algebra, linear-transformations, matrices, rotations

Let's say that I have an object and a camera (its representation) in a 3D world coordinate system. I have the camera pose used to view the object (a rotation matrix and a translation, i.e. the eye position). If I apply a transform matrix to the object (some scaling/translation/rotation), how do I find the transform matrix that moves my camera pose so that what the camera sees remains unchanged (as it was before the transform was applied to the object)? I can't just apply the same transform as the object's (that works for a translation, but a rotation makes the camera lose its target…). Also, one restriction is that I can't apply transforms directly to the world, only to the object itself and to the camera itself (rotation/scaling/translation about the origin of the world coordinate system).

Thanks to anyone who takes the time to help me!
(Sorry for grammar errors, I speak French)

Marc-Antoine

Best Answer

I'm on the right track now. I did what I suggested earlier:

If I apply the object's transform to the target point and to a given vector between the camera eye point and the target point, I'll get my new target and my new camera eye point (which is the end of my new vector). Will that be enough information to set up the camera pose properly?

but I've also applied the transform to the up-vector of the camera to get it right. Result: for any kind of translation/rotation applied to the object, the camera moves properly (Yeah!). The only problem that remains is when I scale the object. If I shrink it, it gets smaller in the camera view after the transform is applied (and vice versa).
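The approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the object transform is rigid (a rotation `rot` plus a translation `trans`, no scaling), and all helper names and example values are hypothetical. The eye and target are transformed as points, the up-vector as a direction (rotation only), and the check confirms that a point of the object keeps the same coordinates in the camera's look-at frame.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def transform_point(rot, trans, p):
    # Points get the rotation and the translation.
    return [dot(row, p) + t for row, t in zip(rot, trans)]

def transform_direction(rot, d):
    # Directions (such as the up-vector) get the rotation only.
    return [dot(row, d) for row in rot]

def camera_coords(eye, target, up, p):
    # Coordinates of world point p in the frame (right, up, forward).
    forward = normalize(sub(target, eye))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    d = sub(p, eye)
    return [dot(right, d), dot(true_up, d), dot(forward, d)]

# Hypothetical example: rotate 90 degrees about the world z-axis, then translate.
rot = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
trans = [1.0, 2.0, 3.0]
eye, target, up = [10.0, 10.0, 10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]
point = [1.0, 0.0, 0.5]  # some point of the object

before = camera_coords(eye, target, up, point)
after = camera_coords(
    transform_point(rot, trans, eye),
    transform_point(rot, trans, target),
    transform_direction(rot, up),
    transform_point(rot, trans, point),
)
print(before)
print(after)
```

The two printed triples agree up to rounding. The sketch deliberately leaves out scaling, which is the case reported above as still unresolved.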

Related Solutions

[Math] How to find extrinsic camera matrix

I can answer this:

What is the affine transformation converting world coordinates to camera coordinates? (camera world coordinates: $c=(c_x,c_y,c_z)^\top$, visual center world coordinates, $v=(v_x,v_y,v_z)^\top$)

I'm assuming the traditional camera image coordinates (before projection) having $z$ drilling "into" the image, $x$ pointing from left to right, and $y$ pointing downward.

Now let's track how the axes must be rotated without translation:

1. the new $z$ axis ($z'$) will point along $v-c$.
2. the new $x$ axis ($x'$) is perpendicular to $z$ and $z'$.
3. the new $y$ axis ($y'$) is perpendicular to $x'$ and $z'$.

You can find three vectors that point along the new axes in world coordinates, normalize them, then put them in the rows of a $3\times 3$ matrix $R$: this converts world coordinates to rotated camera orientation.

Finally, if you know the translation $t$ in world coordinates (here $t=-c=(-10,-10,-10)^\top$, which moves the camera's world position to the origin), then the translation in camera coordinates is $t'=Rt$.

Let's actually carry this out for your example. Let's work on a triad of orthogonal vectors:

$z'=(-1,-1,-1)$, pointing in the direction the camera must face.

$x'=z'\times z=(-1,1,0)^\top$

$y'=z'\times x'=(1,1,-2)^\top$

Normalizing these and using them as the rows of a matrix you get:

$$ R=\frac{1}{\sqrt{6}}\begin{bmatrix} -\sqrt{3}&\sqrt{3}&0\\ 1&1&-2\\ -\sqrt{2}&-\sqrt{2}&-\sqrt{2} \end{bmatrix} $$

Then $t'=Rt=(0,0,10\sqrt{3})$.

Notice that the angle of declination is an odd angle near $35^\circ$ rather than exactly $45^\circ$. (I had a hard time seeing this at first, but if you draw a cube and check the angle between $(1,1,0)$ and $(1,1,1)$ you'll see what I mean.)
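The angle can be worked out directly from the dot product of the two vectors mentioned above:

$$\cos\theta=\frac{(1,1,0)\cdot(1,1,1)}{\|(1,1,0)\|\,\|(1,1,1)\|}=\frac{2}{\sqrt{2}\,\sqrt{3}}=\sqrt{\tfrac{2}{3}},\qquad \theta=\arccos\sqrt{\tfrac{2}{3}}\approx 35.26^\circ.$$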

Now you've converted world coordinates to a rotated frame that is aligned with your camera's frame but differs from it by a translation. This gives you the resulting affine transformation $\begin{bmatrix}R&t'\\0_{1\times 3}&1\end{bmatrix}$, which carries world coordinates to camera coordinates.

As a sanity check, you can confirm that the world's origin maps to camera $(0,0,10\sqrt{3})^\top$ and that the world camera location $(10,10,10)$ now maps to the camera's origin. A third check of your choice should be sufficient to convince you this is the right $R$ and $t'$.
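The construction and the sanity check can be reproduced numerically. The following is a minimal sketch, assuming the world $z$-axis is used as the reference direction exactly as in the worked example; the function names `world_to_camera` and `to_camera` are hypothetical, and the construction breaks down if the camera looks straight along the world $z$-axis (the cross product is then zero).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def world_to_camera(c, v, world_z=(0.0, 0.0, 1.0)):
    """Return (R, t_cam) with camera coordinates = R @ p + t_cam."""
    z_axis = normalize([vi - ci for vi, ci in zip(v, c)])  # z' along v - c
    x_axis = normalize(cross(z_axis, world_z))             # x' = z' x z
    y_axis = cross(z_axis, x_axis)                         # y' = z' x x'
    R = [x_axis, y_axis, z_axis]                           # axes as rows
    t = [-ci for ci in c]                                  # t = -c
    t_cam = [dot(row, t) for row in R]                     # t' = R t
    return R, t_cam

def to_camera(R, t_cam, p):
    return [dot(row, p) + ti for row, ti in zip(R, t_cam)]

# Camera at (10, 10, 10) looking at the world origin, as in the example.
R, t_cam = world_to_camera([10.0, 10.0, 10.0], [0.0, 0.0, 0.0])
origin_in_cam = to_camera(R, t_cam, [0.0, 0.0, 0.0])
camera_in_cam = to_camera(R, t_cam, [10.0, 10.0, 10.0])
print(origin_in_cam)  # approximately (0, 0, 10*sqrt(3))
print(camera_in_cam)  # approximately (0, 0, 0)
```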

One caveat: I'm not 100% sure the cross product in the step $x'=z'\times z$ should always be taken in this order. I picked it this way on this occasion because it gave the right orientation for $x'$ and $y'$ in the end. Hopefully that is all consistent, but maybe there is some sign ambiguity after all.

The second question is how to construct the "UP" vector.

I don't understand what you are asking. If you mean the camera coordinates for the direction of the world $z$-axis, then that would just be $R(0,0,1)^\top +t'$.

Finally, I will have to rotate camera as well from "landscape" to "portrait" orientation .

I'm interpreting this to mean that you'd want to rotate the image plane so that the $y$-axis is horizontal, which could be done with a $\pi/2$ rotation in either direction around the camera $z$-axis.

This transformation should be entirely obvious:

$$U= \begin{bmatrix} 0&-1&0\\ 1&0&0\\ 0&0&1\end{bmatrix} $$

$U$ gives the rotation in the clockwise direction around the $z$ axis (which would look to be counterclockwise if you are looking up the $z$ axis into the picture) and $U^\top$ would give the rotation in the other direction.
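Applying $U$ to the camera basis vectors makes its action explicit:

$$U\begin{bmatrix}1\\0\\0\end{bmatrix}=\begin{bmatrix}0\\1\\0\end{bmatrix},\qquad U\begin{bmatrix}0\\1\\0\end{bmatrix}=\begin{bmatrix}-1\\0\\0\end{bmatrix},\qquad U\begin{bmatrix}0\\0\\1\end{bmatrix}=\begin{bmatrix}0\\0\\1\end{bmatrix},\qquad U^\top U=I.$$

So $U$ carries the $x$-axis onto the $y$-axis, leaves the $z$-axis fixed, and $U^\top=U^{-1}$ undoes the turn.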

[Math] Transform plane to another coordinate system

Working in homogeneous coordinates, the Cartesian equation $ax+by+cz+d=0$ can be expressed as $\mathbf\pi^T\mathbf X=0$, where the homogeneous vector $\mathbf\pi=[a:b:c:d]$. If $\mathtt M$ is a nonsingular transformation matrix, then $$\mathbf\pi^T\mathbf X=\mathbf\pi^T\mathtt M^{-1}\mathtt M\mathbf X=(\mathtt M^{-T}\mathbf\pi)^T(\mathtt M\mathbf X) = 0,$$ which shows that the vectors that represent planes are covariant: if points transform as $\mathbf X'=\mathtt M\mathbf X$, then planes transform as $\mathbf\pi'=\mathtt M^{-T}\mathbf\pi$.

In your case, the equation of the plane in camera coordinates is given by the point-normal form $\mathbf N\cdot(\mathbf X-\mathbf P)=0$, so we have $\mathbf\pi_C=[\mathbf N^T;-\mathbf N^T\mathbf P]^T$. We have for the world-to-camera mapping the matrix $\mathtt M = \left[\begin{array}{c|c}\mathtt R & \mathbf T\end{array}\right]$ and so camera-coordinate planes are transformed into world coordinates by $(\mathtt M^{-1})^{-T} = \mathtt M^T$, i.e., $$\mathbf\pi_W = \mathtt M^T\mathbf\pi_C = \left[\begin{array}{c|c} \mathtt R^T & \mathbf 0 \\ \hline \mathbf T^T & 1\end{array}\right]\begin{bmatrix} \mathbf N \\ -\mathbf N^T \mathbf P\end{bmatrix} = \begin{bmatrix} \mathtt R^T \mathbf N \\ \mathbf N^T\mathbf T-\mathbf N^T\mathbf P \end{bmatrix}.$$

For your example, $\mathbf\pi_C = [1,2,1,-9]^T$ and $$\mathbf\pi_W = \left[\begin{array}{rrrr}0&0&1&0\\-1&0&0&0\\0&-1&0&0\\3&3&9&1\end{array}\right]\left[\begin{array}{r}1\\2\\1\\-9\end{array}\right] = \left[\begin{array}{r}1\\-1\\-2\\9\end{array}\right]$$ and so the equation of the plane in world coordinates is $x-y-2z+9=0$.
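The same product can be checked in a few lines of code. This is a small sketch in which $\mathtt R$ and $\mathbf T$ are read off from the $\mathtt M^T$ displayed above (so `R` is the transpose of its upper-left block and `T` is its last row); the helper names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# World-to-camera rotation and translation inferred from the displayed M^T.
R = [[0, -1, 0],
     [0, 0, -1],
     [1, 0, 0]]
T = [3, 3, 9]

# Homogeneous 4x4 world-to-camera matrix M = [R | T; 0 0 0 1].
M = [R[i] + [T[i]] for i in range(3)] + [[0, 0, 0, 1]]

pi_c = [1, 2, 1, -9]                  # plane in camera coordinates
pi_w = mat_vec(transpose(M), pi_c)    # pi_W = M^T pi_C
print(pi_w)  # [1, -1, -2, 9]
```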

Using your approach, transform $\mathbf P_C$ to world coordinates: $$\mathbf P_W = \left[\begin{array}{c|c} \mathtt R^T & -\mathtt R^T\mathbf T \end{array}\right] \begin{bmatrix}\mathbf P_C\\1\end{bmatrix} = \left[\begin{array}{rrrr}0&0&1&-9\\-1&0&0&3\\0&-1&0&3\end{array}\right]\begin{bmatrix}1\\4\\0\\1\end{bmatrix} = \left[\begin{array}{r}-9\\2\\-1\end{array}\right].$$ Compared to what you described in your question, it looks like you only translated $\mathbf P_C$, but to convert to world coordinates it must be both translated and rotated. Normal vectors are covariant, so $$\mathbf N_W = (\mathtt R^{-1})^{-T}\mathbf N_C = \mathtt R^T\mathbf N_C = \left[\begin{array}{rrr}0&0&1\\-1&0&0\\0&-1&0\end{array}\right]\begin{bmatrix}1\\2\\1\end{bmatrix} = \left[\begin{array}{r}1\\-1\\-2\end{array}\right],$$ giving for the world-coordinate equation of the plane $$(1,-1,-2)\cdot(x+9,y-2,z+1)=x-y-2z+9=0$$ as above. Comparing this to your calculation, you transformed the normal vector incorrectly as well.

We can check distances, as you suggest: The distance of the plane from the camera (camera-coordinate origin) is $${[1,2,1]\cdot[1,4,0]\over\|[1,2,1]\|} = {9\over\sqrt6}.$$ The world coordinates of the camera are the last column of the camera-to-world matrix, and the world-coordinate distance of this point from the plane is $${|[1,-1,-2]\cdot[-9,3,3]+9|\over\|[1,-1,-2]\|} = {9\over\sqrt6}.$$ You can also check that this is indeed the correct plane by transforming a few points on it to world coordinates and then plugging those coordinates into its world-coordinate equation.
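Both distance checks, and the point-by-point check, can be scripted. The sketch below assumes the same $\mathtt R$ and $\mathbf T$ as in the example (world-to-camera, $\mathbf X_C=\mathtt R\mathbf X_W+\mathbf T$); the function `cam_to_world` and the two extra plane points are hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

# World-to-camera rotation and translation from the example.
R = [[0, -1, 0],
     [0, 0, -1],
     [1, 0, 0]]
T = [3, 3, 9]

def cam_to_world(p):
    # X_W = R^T (X_C - T)
    d = [pi - ti for pi, ti in zip(p, T)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]

n_c, p_c = [1, 2, 1], [1, 4, 0]   # plane normal and point, camera coordinates
pi_w = [1, -1, -2, 9]             # plane in world coordinates

# Distance from the camera (camera-coordinate origin) to the plane.
dist_cam = abs(dot(n_c, p_c)) / norm(n_c)

# Same distance measured in world coordinates.
cam_world = cam_to_world([0, 0, 0])
dist_world = abs(dot(pi_w[:3], cam_world) + pi_w[3]) / norm(pi_w[:3])

# Points satisfying x + 2y + z = 9 in camera coordinates, mapped to world
# coordinates and substituted into the world-coordinate plane equation.
plane_points_c = [[1, 4, 0], [9, 0, 0], [0, 0, 9]]
residuals = [dot(pi_w[:3], cam_to_world(p)) + pi_w[3] for p in plane_points_c]

print(cam_world, dist_cam, dist_world, residuals)
```

Both distances come out as $9/\sqrt6$ and every residual is zero, in line with the hand calculation.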