The details will vary according to what you mean exactly by yaw, pitch, roll. But here is a general way of transforming the axes. It may not be the best theoretical way to do it, but it works fine computationally.
Let rotations about the $X$-axis by an angle $\theta$ be denoted by $R_{X,\theta}$, and similarly for the other axes. A frame is then obtained by $$T_{\theta,\phi,\psi}:=R_{X,\phi}R_{Y,\theta}R_{Z,\psi}$$
If I understand your question correctly, you want the three angles that would give $T_1T_2^{-1}$ given two frames $T_1$, $T_2$.
- Calculate the matrix $S:=T_1T_2^{-1}$
- Calculate $\phi=\arctan_2(S_{3,3},-S_{2,3})$, $\theta=\arctan_2(\sqrt{S_{1,1}^2+S_{1,2}^2},S_{1,3})$, $\psi=\arctan_2(S_{1,1},-S_{1,2})$.
Then $\theta$, $\phi$, and $\psi$ are the required angles. Here $\arctan_2(x,y)$ is the modified arctan function that gives the angle in the correct quadrant.
Note that there may be different values of $\phi$, $\theta$, $\psi$ that give the same transformation $S$.
So, we're going to describe camera direction needed to direct the camera positioned at $(0,0,0)$ at an $(x,y,z)$ point using the angles from spherical coordinates. The two angles are $\theta$ and $\phi$.
$\theta$, the azimuthal angle, normally taken from $0˚$ to $360˚$, is the angle made in the $xy$ plane between the $x$-axis and the line connecting $(0,0,0)$ to $(x,y,0)$. Simply put, this gives us our "yaw".
$\phi$, the polar angle, normally taken from $0˚$ to $180˚$, is the angle made between the $z$-axis and the line connecting $(0,0,0)$ to $(x,y,z)$. This gives us something like the pitch. That is, $\phi=90˚$ means that you're looking horizontally, whereas $\phi=0˚$ means that you're looking vertically upward.
Now, for a given point $(x,y,z)$, the calculations are as follows:
$$
\theta = \arctan\left(\frac yx\right)\\
\phi = \arctan\left(\frac {\sqrt{x^2+y^2}}z\right)
$$
So for example: to point your camera at the point $(1,1,1)$, you would need the angles
$$
\theta = \arctan\left(\frac 11\right)=45˚\\
\phi = \arctan\left(\frac {\sqrt{2}}1\right) \approx 55˚
$$
Best Answer
From looking at your diagram, it appears to me that the transformation takes the unit vector $(i_x,i_y,i_z)$ and rotates it to point to $(f_x,f_y,f_z)$:
$$\begin{pmatrix}c_\Psi&s_\Psi&0\\-s_\Psi&c_\Psi&0\\0&0&1\end{pmatrix} \begin{pmatrix}1&0&0\\0&c_\theta&-s_\theta\\0&s_\theta&c_\theta\end{pmatrix} \begin{pmatrix}c_\phi&-s_\phi&0\\s_\phi&c_\phi&0\\0&0&1\end{pmatrix} \begin{pmatrix}i_x\\i_y\\i_z\end{pmatrix} = \begin{pmatrix}f_x\\f_y\\f_z\end{pmatrix} $$
where $c_\phi = \cos(\phi), s_\phi = \sin(\phi)$, etc.
It isn't hard to solve for $\phi,\theta$ and $\Psi$ if you assume that your camera begins by pointing in a convenient direction such as $(i_x,i_y,i_z) = (0,0,1)$.