I can answer this:
What is the affine transformation converting world coordinates to camera coordinates? (camera position in world coordinates: $c=(c_x,c_y,c_z)^\top$; visual center in world coordinates: $v=(v_x,v_y,v_z)^\top$)
I'm assuming the traditional camera coordinates (before projection): $z$ drills "into" the image, $x$ points from left to right, and $y$ points downward.
Now let's track how the axes must be rotated without translation:
1. the new $z$ axis ($z'$) will point along $v-c$.
1. the new $x$ axis ($x'$) is perpendicular to the world $z$ axis and to $z'$.
1. the new $y$ axis ($y'$) is perpendicular to $x'$ and $z'$.
You can find three vectors that point along the new axes in world coordinates, normalize them, and then put them in the rows of a $3\times 3$ matrix $R$: this matrix converts world coordinates into the rotated camera orientation.
Finally, if you know the translation $t$ in world coordinates that moves the camera's position to the origin (in your example, $t=-c=(-10,-10,-10)^\top$), then the translation in camera coordinates is $t'=Rt$.
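Here is a minimal Python sketch of this recipe, assuming a right-handed world frame whose $z$ axis points up and the camera convention above ($x$ right, $y$ down, $z$ forward). The function name `world_to_camera` and its `up` parameter are my own, hypothetical choices. It returns the rows of $R$ and $t'=Rt$ with $t=-c$, so that a world point $p$ has camera coordinates $Rp+t'$.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def world_to_camera(c, v, up=(0.0, 0.0, 1.0)):
    """Return (R, t_cam) so that p_cam = R p_world + t_cam.

    Assumes a right-handed world frame whose z axis is "up"; raises
    ZeroDivisionError when the viewing direction v - c is parallel to `up`.
    """
    z_new = normalize(tuple(vi - ci for vi, ci in zip(v, c)))  # z' along v - c
    x_new = normalize(cross(z_new, up))                         # x' = z' x z
    y_new = cross(z_new, x_new)                                 # y' = z' x x' (already unit length)
    R = (x_new, y_new, z_new)                                   # the new axes are the rows of R
    t = tuple(-ci for ci in c)                                  # world translation t = -c
    t_cam = tuple(dot(row, t) for row in R)                     # t' = R t
    return R, t_cam

R, t_cam = world_to_camera((10, 10, 10), (0, 0, 0))
```

For the example that follows, `world_to_camera((10, 10, 10), (0, 0, 0))` should reproduce the hand-computed $R$ and $t'$.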
Let's actually carry this out for your example. Let's work on a triad of orthogonal vectors:
$z'=(-1,-1,-1)^\top$, a positive multiple of $v-c$, pointing in the direction the camera must face.
$x'=z'\times z=(-1,1,0)^\top$
$y'=z'\times x'=(1,1,-2)^\top$
Normalizing these and using them as the rows of a matrix you get:
$$
R=\frac{1}{\sqrt{6}}\begin{bmatrix}
-\sqrt{3}&\sqrt{3}&0\\
1&1&-2\\
-\sqrt{2}&-\sqrt{2}&-\sqrt{2}
\end{bmatrix}
$$
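To double-check the arithmetic, this short script (a sketch; the helper names are mine) rebuilds the triad with cross products, normalizes it, and compares the result with the closed-form $R$ above. It also confirms that $R$ is a proper rotation, i.e. $RR^\top=I$ and $\det R=+1$.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

z_world = (0, 0, 1)
z_p = (-1, -1, -1)            # z': from the camera at (10,10,10) toward the origin
x_p = cross(z_p, z_world)     # x' = z' x z
y_p = cross(z_p, x_p)         # y' = z' x x'

R = [normalize(x_p), normalize(y_p), normalize(z_p)]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
R_closed = [[-s3 / s6, s3 / s6, 0.0],
            [1 / s6, 1 / s6, -2 / s6],
            [-s2 / s6, -s2 / s6, -s2 / s6]]

# R R^T should be the identity; det R = (row0 x row1) . row2 should be +1
RRt = [[dot(r, s) for s in R] for r in R]
det_R = dot(cross(R[0], R[1]), R[2])
```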
Then, with $t=(-10,-10,-10)^\top$, $t'=Rt=(0,0,10\sqrt{3})^\top$.
Notice that the angle of declination is not exactly $45^\circ$ but $\arctan(1/\sqrt{2})\approx 35.26^\circ$. (I had a hard time seeing this at first, but if you draw a cube and check the angle between $(1,1,0)$ and $(1,1,1)$ you'll see what I mean.)
Now you've converted world coordinates to a rotated frame that is aligned with your camera's frame but differs from it by a translation.
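The declination can also be computed directly as the angle between the horizontal direction $(1,1,0)$ and the viewing direction $(1,1,1)$ (up to sign). A small sketch comparing it with the closed form $\arctan(1/\sqrt2)$:

```python
import math

def angle_deg(a, b):
    """Angle between vectors a and b, in degrees."""
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.degrees(math.acos(d / (na * nb)))

# Horizontal direction vs. the (reversed) viewing direction from the example
declination = angle_deg((1, 1, 0), (1, 1, 1))
closed_form = math.degrees(math.atan(1 / math.sqrt(2)))
```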
This gives you the resulting affine transformation $\begin{bmatrix}R&t'\\0_{1\times 3}&1\end{bmatrix}$
which carries world coordinates to camera coordinates.
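As a sanity check, you can confirm that the world's origin maps to camera $(0,0,10\sqrt{3})^\top$ and that the camera's world location $(10,10,10)^\top$ now maps to the camera's origin. A third check of your choice should be enough to convince you that this is the right $R$ and $t'$.
One caveat: the order of the cross product matters. I picked $z'\times z$ (rather than $z\times z'$) because it gave the right orientation for $x'$ and $y'$ here, and I believe it is consistent in general: with a right-handed world frame whose $z$ axis points up, $z'\times z$ is "forward times up", which points to the camera's right, and $z'\times x'$ then points down, as the camera's $y$ axis should. The construction does break down when the camera looks straight up or down, because then $z'\times z=0$ and you need some other reference direction.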
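These sanity checks can be scripted. The sketch below hard-codes $R$ and $t'$ from the worked example, assembles the $4\times4$ matrix $\begin{bmatrix}R&t'\\0&1\end{bmatrix}$, and applies it to homogeneous points. As the third check it uses $(5,5,5)^\top$, which lies on the camera's axis halfway between the camera and the origin, so it should land at $(0,0,5\sqrt3)^\top$.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
R = [[-s3 / s6, s3 / s6, 0.0],
     [1 / s6, 1 / s6, -2 / s6],
     [-s2 / s6, -s2 / s6, -s2 / s6]]
t_cam = [0.0, 0.0, 10 * s3]

# Homogeneous 4x4 affine matrix [[R, t'], [0, 1]]
T = [R[i] + [t_cam[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply(T, p):
    """Map a 3D world point p to camera coordinates via the homogeneous matrix T."""
    ph = list(p) + [1.0]
    out = [sum(T[i][j] * ph[j] for j in range(4)) for i in range(4)]
    return [x / out[3] for x in out[:3]]

origin_cam = apply(T, (0, 0, 0))     # should be (0, 0, 10*sqrt(3))
camera_cam = apply(T, (10, 10, 10))  # should be (0, 0, 0)
mid_cam = apply(T, (5, 5, 5))        # should be (0, 0, 5*sqrt(3))
```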
The second question is how to construct the "UP" vector.
I don't understand what you are asking. If you mean the camera coordinates for the direction of the world $z$-axis, then that would just be $R(0,0,1)^\top$, the third column of $R$: directions are only rotated, not translated. (The world *point* $(0,0,1)^\top$, by contrast, maps to $R(0,0,1)^\top+t'$.)
Finally, I will have to rotate the camera as well from "landscape" to "portrait" orientation.
I'm interpreting this to mean that you'd want to rotate the image plane so that the $y$-axis is horizontal, which could be done with a $\pi/2$ rotation in either direction around the camera $z$-axis.
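A sketch with the worked example's $R$, treating the world $z$-axis as a direction (rotated but not translated). The result is the third column of $R$, about $(0,-0.816,-0.577)^\top$. Its negative $y$ component is what you'd expect, since the camera's $y$ axis points down.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
R = [[-s3 / s6, s3 / s6, 0.0],
     [1 / s6, 1 / s6, -2 / s6],
     [-s2 / s6, -s2 / s6, -s2 / s6]]

def rotate(R, d):
    """Apply the rotation R to a direction vector d (no translation)."""
    return [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]

up_cam = rotate(R, (0, 0, 1))  # third column of R: (0, -2/sqrt(6), -1/sqrt(3))
```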
This transformation should be entirely obvious:
$$U=
\begin{bmatrix}
0&-1&0\\
1&0&0\\
0&0&1\end{bmatrix}
$$
$U$ gives the rotation in the clockwise direction around the $z$ axis (which would look to be counterclockwise if you are looking up the $z$ axis into the picture) and $U^\top$ would give the rotation in the other direction.
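A small check of $U$ as a sketch (plain Python lists, helper names mine): it sends the camera $x$ axis to the $y$ axis, $U^\top$ undoes it, and applying $U$ twice gives the half turn $\operatorname{diag}(-1,-1,1)$, so $U$ itself is a quarter turn.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

U = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]

ex = matvec(U, [1, 0, 0])       # camera x axis -> (0, 1, 0)
ey = matvec(U, [0, 1, 0])       # camera y axis -> (-1, 0, 0)
UtU = matmul(transpose(U), U)   # identity: U^T undoes U
UU = matmul(U, U)               # two quarter turns = half turn diag(-1, -1, 1)
```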
Instead of trying to debug your code and verify all of those back-mappings, I’m going to describe a way for you to check your own results objectively. If you don’t have a good idea of what the results should be, then I don’t really see how you can tell whether or not they’re “reasonable.”
Assuming that there’s no skew in the camera, the matrix $K$ has the form $$K=\begin{bmatrix}s_x&0&c_x\\0&s_y&c_y\\0&0&1\end{bmatrix}.$$ The diagonal entries $s_x$ and $s_y$ are the $x$- and $y$-scale factors, and $(c_x,c_y)$ are the image coordinates of the camera’s axis, which is assumed to be normal to the image plane ($z=1$ by convention). So, in this coordinate system, the direction vector for a point $(x,y)$ in the image is $(x-c_x,y-c_y,1)$, and to get the corresponding direction vector in the (external) camera coordinate system, divide by the respective scale factors: $((x-c_x)/s_x,(y-c_y)/s_y,1)$. This is exactly what you get by applying $K^{-1}$, which is easily found to be $$K^{-1}=\begin{bmatrix}1/s_x&0&-c_x/s_x\\0&1/s_y&-c_y/s_y\\0&0&1\end{bmatrix}$$ using your favorite method.

Finally, to transform this vector into world coordinates, apply $R^{-1}$, which is just $R$’s transpose since it’s a rotation. The resulting ray, of course, originates from the camera’s position in world coordinates. It should be a simple matter to code up this cascade explicitly, after which you can compare it to the results that you get by any other method that you’re experimenting with.
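The cascade might be coded up like this. It’s a minimal sketch: `backproject_ray` is a hypothetical name, $K$ is assumed skew-free and passed as its four parameters, and $R$ is assumed to be the world-to-camera rotation, so its transpose takes camera directions back to world directions.

```python
def backproject_ray(x, y, sx, sy, cx, cy, R, cam_pos):
    """Return (origin, direction) of the ray through pixel (x, y), in world coordinates.

    Assumes K = [[sx, 0, cx], [0, sy, cy], [0, 0, 1]] (no skew) and that R is
    the rotation taking world directions to camera directions.
    """
    # Apply K^{-1}: camera-frame direction with third component 1
    d_cam = [(x - cx) / sx, (y - cy) / sy, 1.0]
    # Apply R^{-1} = R^T: (R^T d)_i = sum_k R[k][i] * d[k]
    d_world = [sum(R[k][i] * d_cam[k] for k in range(3)) for i in range(3)]
    # The ray starts at the camera's position in world coordinates
    return list(cam_pos), d_world
```

With $R$ equal to the identity, the returned direction is just $K^{-1}(x,y,1)^T$.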
In this specific case, $R$ is just the identity matrix, so there’s nothing else to do once you’ve got the direction vector in camera coordinates. We have $$\begin{aligned}s_x&=282.363047\\ s_y&=280.10715905\\ c_x&=166.21515189\\ c_y&=108.05494375\end{aligned}$$ so the internal-to-external transformation is approximately $$\begin{aligned}x&\to x/282.363-0.589\\ y&\to y/280.107-0.386.\end{aligned}$$ Applying this to the point $(20,20)$ from your previous question gives $(-0.518,-0.314,1)$, which agrees with the direction vector computed there. Taking $(10,10)$ instead results in $(-0.553,-0.350,1)$, which you can then check against whatever your code produced, and so on.
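Plugging the quoted intrinsics into $K^{-1}$ reproduces these numbers. A sketch, with $R=I$ as stated, so the camera-frame direction is also the world direction:

```python
sx, sy = 282.363047, 280.10715905
cx, cy = 166.21515189, 108.05494375

# K^{-1} as an explicit 3x3 matrix (no skew)
K_inv = [[1 / sx, 0.0, -cx / sx],
         [0.0, 1 / sy, -cy / sy],
         [0.0, 0.0, 1.0]]

def direction(x, y):
    """Camera-frame direction K^{-1} (x, y, 1); equals the world direction since R = I."""
    p = (x, y, 1.0)
    return [sum(K_inv[i][j] * p[j] for j in range(3)) for i in range(3)]

d20 = [round(v, 3) for v in direction(20, 20)]   # about (-0.518, -0.314, 1)
d10 = [round(v, 3) for v in direction(10, 10)]   # about (-0.553, -0.350, 1)
```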
All that aside, there’s a gotcha when using the pseudoinverse method described by Zisserman. He gives the following equation for the back-mapped ray: $$\mathbf X(\lambda)=P^+\mathbf x+\lambda\mathbf C.$$ Note that the parameter is a coefficient of $\mathbf C$, the camera’s position in world coordinates, not of the result of back-mapping the image point $\mathbf x$. Converted into Cartesian coordinates, there’s a factor of $\lambda+k$ (for some constant $k$) in the denominator, so this isn’t a simple linear parameterization. To extract a direction vector from this, you’ll need to convert $P^+\mathbf x$ into Cartesian coordinates and then subtract $\mathbf C$.
To illustrate, applying $P^+$ to $(10,10,1)^T$ produces $(-0.553,-0.175,1.0,-0.175)^T$; here $\mathbf C=(0,-1,0,1)^T$, so the ray is $(-0.553,-0.175-\lambda,1.0,-0.175+\lambda)^T$. Converting $P^+\mathbf x$ to Cartesian coordinates gives the back-mapped point $(3.161,1.0,-5.713)$, and subtracting the camera’s position $(0,-1,0)$ gives $(3.161,2.0,-5.713)$. To compare this to the known result above, divide by the third coordinate: $(-0.553,-0.350,1.0)$, which agrees.
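The same conversion as a sketch, using the rounded values quoted above (so the intermediate point comes out as $(3.16,1.0,-5.714)$ rather than the more precise figures). The camera center $(0,-1,0)$ is read off from the $\lambda\mathbf C$ term, and the helper name `ray_direction_from_pinv` is hypothetical.

```python
def ray_direction_from_pinv(Xh, C):
    """Direction of the back-projected ray, scaled so its third component is 1.

    Xh: homogeneous point P^+ x (4 components); C: Cartesian camera center.
    """
    w = Xh[3]
    X = [Xh[i] / w for i in range(3)]          # homogeneous -> Cartesian
    d = [X[i] - C[i] for i in range(3)]        # subtract the camera center
    return [d[i] / d[2] for i in range(3)]     # divide by the third coordinate

d = ray_direction_from_pinv([-0.553, -0.175, 1.0, -0.175], [0.0, -1.0, 0.0])
```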
Update 2018.07.31: For finite cameras, which is what you’re dealing with, Zisserman suggests a more convenient back-projection in the very next paragraph in equation (6.14). The underlying idea is that you decompose the camera matrix as $P = \left[M\mid\mathbf p_4\right]$ so that the back-projection of an image point $\mathbf x$ intersects the plane at infinity at $\mathbf D = ((M^{-1}\mathbf x)^T,0)^T$. This gives you the direction vector of the back-projected ray in world coordinates, and, of course, the camera center is at $\tilde{\mathbf C}=-M^{-1}\mathbf p_4$, i.e., the back-projected ray is $$\tilde{\mathbf X}(\mu) = -M^{-1}\mathbf p_4+\mu M^{-1}\mathbf x = M^{-1}(\mu\mathbf x-\mathbf p_4).$$ This parameterization of the ray doesn’t suffer from the non-linearity mentioned above.
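Here’s a sketch of this back-projection for comparison. To keep it self-contained it builds a hypothetical finite camera $P=K\left[I\mid-\tilde{\mathbf C}\right]$ from the intrinsics quoted earlier and the center $(0,-1,0)$ from the previous example; that test camera is my assumption, not data from the question. `solve3` is a small Cramer’s-rule helper standing in for $M^{-1}\mathbf b$.

```python
def det3(A):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def solve3(A, b):
    """Hypothetical helper: solve A x = b (i.e. x = A^{-1} b) by Cramer's rule."""
    d = det3(A)
    x = []
    for col in range(3):
        # Replace column `col` of A with b
        A_col = [[b[r] if c == col else A[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(A_col) / d)
    return x

# Hypothetical test camera (an assumption): P = K [I | -C], so M = K and p4 = -K C
sx, sy = 282.363047, 280.10715905
cx, cy = 166.21515189, 108.05494375
M = [[sx, 0.0, cx],
     [0.0, sy, cy],
     [0.0, 0.0, 1.0]]
C_true = [0.0, -1.0, 0.0]
p4 = [-sum(M[i][j] * C_true[j] for j in range(3)) for i in range(3)]

center = [-v for v in solve3(M, p4)]      # camera center C~ = -M^{-1} p4
direction = solve3(M, [10.0, 10.0, 1.0])  # ray direction M^{-1} x for pixel (10, 10)

def ray_point(mu):
    """Point on the back-projected ray: C~ + mu * M^{-1} x (linear in mu)."""
    return [center[i] + mu * direction[i] for i in range(3)]
```

Because $\tilde{\mathbf X}(\mu)$ is linear in $\mu$, equal steps in $\mu$ give equal steps along the ray, and `direction` again comes out at about $(-0.553,-0.350,1)$.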
Best Answer
With cameras $C_1$ and $C_2$ with respective camera matrices $P_1^{\{W\}} = \begin{bmatrix} R_1 & t_1 \\ \mathbf{0} & 1 \end{bmatrix}$ and $P_2^{\{W\}} = \begin{bmatrix} R_2 & t_2 \\ \mathbf{0} & 1 \end{bmatrix}$, where $W$ denotes the world frame, we want to find the transformation matrix $P_1^{\{2\}}$ from $C_1$ to $C_2$. Because both matrices are given relative to the same (world) frame, $P_1^{\{W\}}$ and $P_2^{\{W\}}$ are all you need. The basic process is to transform from $C_1$ to $W$ and then from $W$ to $C_2$.
Step 1:
Given a point $q^{\{1\}}$ in $C_1$, the world coordinate is given by $q^{\{W\}} = t_1 + R_1 q^{\{1\}}$
Step 2:
Given a point $q^{\{W\}}$ in $W$, the $C_2$ coordinate is given by $q^{\{2\}} = R_2^{-1} (q^{\{W\}} - t_2)$
Step 3:
Combine steps 1 and 2. You have $$ q^{\{2\}} = R_2^{-1} (q^{\{W\}} - t_2) $$ $$ q^{\{2\}} = R_2^{-1} ((t_1 + R_1 q^{\{1\}}) - t_2) $$ $$ q^{\{2\}} = R_2^{-1} (R_1 q^{\{1\}} + t_1 - t_2) $$ $$ q^{\{2\}} = R_2^{-1} R_1 q^{\{1\}} + R_2^{-1} (t_1-t_2) $$ which you can write as $$ q^{\{2\}} = P_1^{\{2\}} q^{\{1\}} $$ (with $q$ in homogeneous coordinates), where $$ P_1^{\{2\}} = \begin{bmatrix} R_2^{-1} R_1 & R_2^{-1} (t_1 - t_2) \\ \mathbf{0} & 1\end{bmatrix} $$ To simplify the notation a bit, use the fact that $R_2$ is orthonormal, so $R_2^{-1} = R_2^T$, and write $$ P_1^{\{2\}} = \begin{bmatrix} R_2^{T} R_1 & t_{12} \\ \mathbf{0} & 1\end{bmatrix} $$ where $t_{12} = R_2^T(t_1 - t_2)$, i.e., the offset $t_1-t_2$ expressed in the axes of $C_2$.
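A quick numerical check of $P_1^{\{2\}}$, as a sketch: the two poses below (rotations about $z$ plus arbitrary translations) are made-up test values, and `relative_pose` is a hypothetical helper name. The script maps a point from $C_1$ to $C_2$ through the world (steps 1 and 2) and compares that with the combined formula.

```python
import math

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_pose(R1, t1, R2, t2):
    """Return (R_12, t_12) with q2 = R_12 q1 + t_12, i.e. R2^T R1 and R2^T (t1 - t2)."""
    R2t = transpose(R2)
    return matmul(R2t, R1), matvec(R2t, [a - b for a, b in zip(t1, t2)])

# Made-up test poses (camera-to-world: q_W = t_i + R_i q_i)
R1, t1 = rot_z(0.3), [1.0, 2.0, 3.0]
R2, t2 = rot_z(-1.1), [-4.0, 0.5, 2.0]
R12, t12 = relative_pose(R1, t1, R2, t2)

q1 = [0.7, -1.2, 5.0]
qW = [a + b for a, b in zip(t1, matvec(R1, q1))]                      # step 1
q2_via_world = matvec(transpose(R2), [a - b for a, b in zip(qW, t2)])  # step 2
q2_direct = [a + b for a, b in zip(matvec(R12, q1), t12)]              # step 3
```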