[Math] Calculate the viewing-angle on a square (3d-calc)

3d, area, vector-analysis, vectors

I'm in big trouble: my program (Java) successfully recognises a square drawn on paper (by its 4 edges). Now I need to calculate the angle at which the webcam is facing this square. I get the 4 coordinates of the shape, and I already have an idea: look at the ratio of the area of this shape to the maximum area the square could have (the area you would see when looking directly at it from above). The first is easy (vectors, angles), but I have no clue how to calculate the maximum area of this quadrangle, that is, the area it would have as a square. Is it clear what I mean?

Or do you have any idea how to calculate this in another way?

I'm thankful for any help or ideas.

Best Answer

Really, you would need to consider the optics and physics of the situation to model the problem best. In doing so you would consider a light source and how the object reflects the light. Informally, you can consider only the light that is reflected towards the camera lens; the lens has a curved surface and focuses that light to a point. The focused light is intercepted by a flat surface, either film or a light-sensitive sensor, so the image is actually the result of a sequence of projections. First there is a projection $\Bbb R^3\to \Bbb R^2$; however, for most cameras the norm on the image of this map is not the usual Cartesian norm, because their lenses are curved. Then this spherical or parabolic surface is mapped to the usual Cartesian $\Bbb R^2$.

For the sake of simplicity we will just consider a viewer at a point who sees the 3D object through a perspective viewfinder. Because of the added complexity of tracing light sources, we observe that linear perspective roughly allows us to model the view without them. Later on we will discuss what is missed by not considering light sources.

Wikipedia gives a general description of the opposite task from the one you ask about$^1$. I will try to clarify it a little. I will also adapt the $\Bbb R^3$ transformations, such as rotation and perspective projection, to become linear transformations of projective space by considering the original maps in $\Bbb R^{n+1}$, here $\Bbb R^4$.$^2$

PRELIMINARY

Rotation matrices for right-handed axes.

$R_x(\alpha)=\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & \cos(\alpha) & -\sin(\alpha) & 0\\ 0& \sin(\alpha) & \cos(\alpha) & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$

$R_y(\beta)=\begin{bmatrix} \cos(\beta) & 0 & \sin(\beta) & 0\\ 0 & 1 & 0 &0\\ -\sin(\beta)& 0 & \cos(\beta) & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}$

$R_z(\gamma)=\begin{bmatrix} \cos(\gamma) & -\sin(\gamma) & 0 & 0\\ \sin(\gamma) & \cos(\gamma) & 0 & 0\\ 0& 0 & 1 & 0\\ 0 & 0 & 0 &1 \end{bmatrix}$
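As a minimal sketch, the three matrices can be written down directly as `double[][]` arrays. The class and method names (`Rotations`, `rotX`, `rotY`, `rotZ`, `apply`) are illustrative assumptions, not part of the answer; angles are assumed to be in radians.

```java
// Sketch only: class and method names are illustrative, not from the answer.
final class Rotations {
    // Rotation about the x-axis by angle a (radians), as a 4x4 homogeneous matrix.
    static double[][] rotX(double a) {
        double c = Math.cos(a), s = Math.sin(a);
        return new double[][] {
            {1, 0,  0, 0},
            {0, c, -s, 0},
            {0, s,  c, 0},
            {0, 0,  0, 1}
        };
    }

    // Rotation about the y-axis by angle b (radians).
    static double[][] rotY(double b) {
        double c = Math.cos(b), s = Math.sin(b);
        return new double[][] {
            { c, 0, s, 0},
            { 0, 1, 0, 0},
            {-s, 0, c, 0},
            { 0, 0, 0, 1}
        };
    }

    // Rotation about the z-axis by angle g (radians).
    static double[][] rotZ(double g) {
        double c = Math.cos(g), s = Math.sin(g);
        return new double[][] {
            {c, -s, 0, 0},
            {s,  c, 0, 0},
            {0,  0, 1, 0},
            {0,  0, 0, 1}
        };
    }

    // Product of a 4x4 matrix with a homogeneous 4-vector.
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rotating the x-axis by 90 degrees about z gives approximately (0, 1, 0).
        double[] v = apply(rotZ(Math.PI / 2), new double[] {1, 0, 0, 1});
        System.out.println(java.util.Arrays.toString(v));
    }
}
```

The fourth coordinate is carried along unchanged, which is what later allows translations to be expressed as matrices too.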

The trick here is that you need to know if the Euler angles or Tait-Bryan Angles are intrinsic or extrinsic. Denote intrinsic and extrinsic as $\alpha_{int},\beta_{int},\gamma_{int}$ and $\alpha_{ext},\beta_{ext},\gamma_{ext}$ respectively. Then it can be shown that $R_{i,j,k}(\alpha_{int},\beta_{int},\gamma_{int})=R_i(\alpha_{int})R_j(\beta_{int})R_k(\gamma_{int})=R_k(\gamma_{ext})R_j(\beta_{ext})R_i(\alpha_{ext})$

and

$\alpha_{int}=\gamma_{ext}$, $\beta_{int}=\beta_{ext}$, $\gamma_{int}=\alpha_{ext}$

or more simply

$R_{i,j,k}(\alpha,\beta,\gamma)=R_{k,j,i}(\gamma,\beta,\alpha)$.

Camera orientation

Let our camera be a single point $C$ with coordinates in the standard basis. We describe $C$'s orientation with intrinsic Tait-Bryan angles. Let the 2D viewing frame be the plane $z_c=a$, which is normal to the z-axis of $C$'s frame but offset from $C$ by some distance $a$; without the offset everything would be projected to a single point on the screen. We use the $z,y',x''$ convention: first a rotation about the standard basis' $z$ by $\gamma$, then about the resulting $y'$ by $\beta$, and finally about $x''$ by $\alpha$.

For intrinsic angles the orientation is described by

$R_x(\alpha)R_y(\beta)R_z(\gamma)=\begin{bmatrix} c_{\beta}c_{\gamma}&-c_{\beta}s_{\gamma} & s_{\beta} & 0\\ s_{\alpha}s_{\beta}c_{\gamma}+c_{\alpha}s_{\gamma} & -s_{\alpha}s_{\beta}s_{\gamma}+c_{\alpha}c_{\gamma} & -s_{\alpha}c_{\beta} & 0\\ -c_{\alpha}s_{\beta}c_{\gamma}+s_{\alpha}s_{\gamma} & c_{\alpha}s_{\beta}s_{\gamma}+s_{\alpha}c_{\gamma} & c_{\alpha}c_{\beta} & 0\\ 0 & 0 & 0 & 1\end{bmatrix}$, where $c_{\theta}=\cos(\theta)$ and $s_{\theta}=\sin(\theta)$.

This means we start by orienting the camera's axes the same as the world's axes, then rotate about the z-axis, then the new y-axis, and finally the new x-axis to bring the camera into the position from which it views the object. Then we add $C$, so that we are at the position of $C$ and looking in the chosen direction. This is represented by

$T_C(R_{z,y,x}(\gamma,\beta,\alpha)[v])$ and

$T_C=I_{\Bbb R^4}+C_{\Bbb R^4}=\begin{bmatrix} 1 & 0 & 0 & C_x \\ 0 & 1 & 0 & C_y \\ 0 & 0 & 1 & C_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$

or more simply as $T_C R_{xyz} v$. After doing the matrix multiplication we end up with the transformation from $C$'s coordinates to the standard basis coordinates $\varepsilon$, and $[A]^C_{\varepsilon}=[T_C][R_{xyz}]$ can be viewed as a familiar change of basis.

$\begin{bmatrix}1\\0\\0\\ 0\end{bmatrix} \to \begin{bmatrix}c_{\beta}c_{\gamma}\\ s_{\alpha}s_{\beta}c_{\gamma}+c_{\alpha}s_{\gamma}\\ -c_{\alpha}s_{\beta}c_{\gamma}+s_{\alpha}s_{\gamma} \\ 0 \end{bmatrix}$, $\begin{bmatrix}0\\1\\0\\ 0\end{bmatrix} \to \begin{bmatrix}-c_{\beta}s_{\gamma}\\ -s_{\alpha}s_{\beta}s_{\gamma}+c_{\alpha}c_{\gamma}\\ c_{\alpha}s_{\beta}s_{\gamma}+s_{\alpha}c_{\gamma} \\ 0\end{bmatrix}$, $\begin{bmatrix}0\\0\\1\\ 0\end{bmatrix}\to \begin{bmatrix}s_{\beta}\\ -s_{\alpha}c_{\beta}\\ c_{\alpha}c_{\beta} \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1\end{bmatrix} \to \begin{bmatrix} C_x \\ C_y \\ C_z \\ 1\end{bmatrix}$

Now we can do a change of basis to work in $C$'s reference frame or the world frame. A vector $[v]_C$ can be represented in $\varepsilon$'s frame by change of basis $[A]_{\varepsilon}^C$ so that $[A]_{\varepsilon}^C[v]_C=[v]_{\varepsilon}$

$[A]_{\varepsilon}^C=\begin{bmatrix} c_{\beta}c_{\gamma} & -c_{\beta}s_{\gamma} & s_{\beta} & C_x\\ s_{\alpha}s_{\beta}c_{\gamma}+c_{\alpha}s_{\gamma} & -s_{\alpha}s_{\beta}s_{\gamma}+c_{\alpha}c_{\gamma} & -s_{\alpha}c_{\beta} & C_y\\ -c_{\alpha}s_{\beta}c_{\gamma}+s_{\alpha}s_{\gamma} & c_{\alpha}s_{\beta}s_{\gamma}+s_{\alpha}c_{\gamma} &c_{\alpha}c_{\beta} & C_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$
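Rather than typing the closed-form entries by hand, the same matrix can be obtained by multiplying its factors. The following is a sketch under the assumption that the pose is composed as $T_C\,R_x(\alpha)R_y(\beta)R_z(\gamma)$; the names `CameraPose`, `mul`, `translation` and `cameraToWorld` are hypothetical.

```java
// Sketch only: names are illustrative; the composition order is an assumption
// matching T_C * R_x(alpha) * R_y(beta) * R_z(gamma).
final class CameraPose {
    // Product of two 4x4 matrices.
    static double[][] mul(double[][] a, double[][] b) {
        double[][] out = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    out[i][j] += a[i][k] * b[k][j];
        return out;
    }

    static double[][] rotX(double a) {
        double c = Math.cos(a), s = Math.sin(a);
        return new double[][] {{1, 0, 0, 0}, {0, c, -s, 0}, {0, s, c, 0}, {0, 0, 0, 1}};
    }

    static double[][] rotY(double b) {
        double c = Math.cos(b), s = Math.sin(b);
        return new double[][] {{c, 0, s, 0}, {0, 1, 0, 0}, {-s, 0, c, 0}, {0, 0, 0, 1}};
    }

    static double[][] rotZ(double g) {
        double c = Math.cos(g), s = Math.sin(g);
        return new double[][] {{c, -s, 0, 0}, {s, c, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}};
    }

    // Homogeneous translation by (x, y, z).
    static double[][] translation(double x, double y, double z) {
        return new double[][] {{1, 0, 0, x}, {0, 1, 0, y}, {0, 0, 1, z}, {0, 0, 0, 1}};
    }

    // Change of basis from the camera frame C to the world frame.
    static double[][] cameraToWorld(double alpha, double beta, double gamma,
                                    double cx, double cy, double cz) {
        double[][] r = mul(rotX(alpha), mul(rotY(beta), rotZ(gamma)));
        return mul(translation(cx, cy, cz), r);
    }

    public static void main(String[] args) {
        double[][] a = cameraToWorld(0.3, 0.5, 0.7, 1, 2, 3);
        // The last column holds the camera position C.
        System.out.println(a[0][3] + " " + a[1][3] + " " + a[2][3]);
    }
}
```

Comparing a few entries of the product against the trigonometric expressions is a cheap way to catch transcription slips in the closed form.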

Let the entry in the $i$th row and $j$th column of $[A]_C$ (the $3 \times 3$ rotation part) be denoted by $A_{ij}$.

$[A]_{\varepsilon}=[A]_C^{-1}=\frac{1}{\det([A]_C)}\begin{bmatrix} (A_{22}A_{33}-A_{23}A_{32}) & -(A_{12}A_{33}-A_{13}A_{32}) & (A_{12}A_{23}-A_{13}A_{22})\\ -(A_{21}A_{33}-A_{23}A_{31}) & (A_{11}A_{33}-A_{13}A_{31}) & -(A_{11}A_{23}-A_{13}A_{21})\\ (A_{21}A_{32}-A_{22}A_{31}) & -(A_{11}A_{32}-A_{12}A_{31}) & (A_{11}A_{22}-A_{12}A_{21})\end{bmatrix}$

Alternatively we could determine ${[A]_C}^{-1}$ by considering the reverse rotations and translations. (This gives interesting, if unrelated, insight into the relationship between matrices and linear algebra on the one hand and trigonometry and geometry on the other.)
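For a matrix made only of a rotation $R$ followed by a translation by $C$, "reversing the rotations and translations" has a closed form: the inverse rotation is the transpose, and the inverse translation is $-R^{T}C$, so the inverse is $\begin{bmatrix} R^{T} & -R^{T}C\\ 0 & 1\end{bmatrix}$. A minimal sketch, assuming the input really is such a rigid transform (the class name `RigidInverse` is illustrative):

```java
// Sketch only: assumes a is a 4x4 rigid transform [R | C; 0 0 0 1]
// with R a pure rotation. The class name is illustrative.
final class RigidInverse {
    static double[][] invert(double[][] a) {
        double[][] inv = new double[4][4];
        // Inverse of a rotation is its transpose.
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                inv[i][j] = a[j][i];
            }
        }
        // Inverse translation is -R^T * C, with C the last column of a.
        for (int i = 0; i < 3; i++) {
            double s = 0;
            for (int j = 0; j < 3; j++) {
                s += inv[i][j] * a[j][3];
            }
            inv[i][3] = -s;
        }
        inv[3][3] = 1;
        return inv;
    }

    public static void main(String[] args) {
        // 90 degree rotation about z, then translation by (1, 2, 3).
        double[][] a = {{0, -1, 0, 1}, {1, 0, 0, 2}, {0, 0, 1, 3}, {0, 0, 0, 1}};
        System.out.println(java.util.Arrays.deepToString(invert(a)));
    }
}
```

This avoids the determinant and cofactor arithmetic entirely, but it is only valid when no scaling or shear is present.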

It seems like a lot, but programming it helps. For instance, we can make a float[][] variable whose entries are the ones above, and assign to the individual entries $A_{ij}$ the terms involving sines and cosines of the respective Euler angles. It's a bit of coding, but it is powerful: it computes the distances of a single object as viewed from 2 different locations and with different orientations of the axes. This arises all the time in physics and often makes calculations easier, either by exploiting the underlying symmetry or by considering physical interactions in some world frame (perhaps rigidly attached to one of the bodies) and then describing the interaction from various vantage points. Your question also requires reference frames with orientations different from that of the camera; otherwise the angle of the camera would simply be zero. It has to look at the object, and so it must be aligned with it!

Here is a simple Java class that implements matrices, which really are just arrays.

```java
public class Matrix
{
    public int m;                 // number of rows
    public int n;                 // number of columns
    public double Elements[][];   // entries, indexed [row][column]

    Matrix(int tempm, int tempn)
    {
        m = tempm;
        n = tempn;
        Elements = new double[m][n];
    }

    // Entry-wise sum A + B.
    // Returns an empty 0x0 matrix if the dimensions do not match.
    public static Matrix Add(Matrix A, Matrix B)
    {
        if (A.n == B.n && A.m == B.m)
        {
            Matrix TempMatrix = new Matrix(A.m, A.n);
            for (int i = 0; i < A.m; i++)
            {
                for (int j = 0; j < A.n; j++)
                {
                    TempMatrix.Elements[i][j] = A.Elements[i][j] + B.Elements[i][j];
                }
            }
            return TempMatrix;
        }
        else
        {
            return new Matrix(0, 0);
        }
    }

    // Matrix product A * B, defined when A has as many columns as B has rows.
    // Returns an empty 0x0 matrix otherwise.
    public static Matrix MatrixProduct(Matrix A, Matrix B)
    {
        if (A.n == B.m)
        {
            Matrix TempMatrix = new Matrix(A.m, B.n);
            for (int i = 0; i < A.m; i++)
            {
                for (int j = 0; j < B.n; j++)
                {
                    double TempElement = 0;
                    for (int k = 0; k < B.m; k++)
                    {
                        TempElement = TempElement + A.Elements[i][k] * B.Elements[k][j];
                    }
                    TempMatrix.Elements[i][j] = TempElement;
                }
            }
            return TempMatrix;
        }
        else
        {
            return new Matrix(0, 0);
        }
    }
}
```

This covers basic matrix addition and the matrix product; you should add scalar multiplication as well as the inverse. For graphical purposes $\Bbb R^4$ should do, but even then the explicit inverse is very long. While you only have to program it once to have the use of it, I recommend writing a method that calculates the inverse by breaking the matrix down into 2 by 2 blocks. Then you can use it for higher dimensions as well.
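The two missing operations can be sketched as follows. Note the swap: instead of the 2 by 2 block decomposition recommended above, this sketch uses Gauss-Jordan elimination with partial pivoting, which also works for any size. It operates on plain `double[][]` arrays so that it stands alone; the names `MatrixExtras`, `scale` and `inverse`, and the singularity threshold `1e-12`, are assumptions made for this example.

```java
// Sketch only: Gauss-Jordan elimination is a substitute for the 2x2 block
// approach mentioned in the text. Names and the 1e-12 threshold are assumptions.
final class MatrixExtras {
    // Scalar multiplication k * A.
    static double[][] scale(double k, double[][] a) {
        double[][] out = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            out[i] = new double[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                out[i][j] = k * a[i][j];
            }
        }
        return out;
    }

    // Inverse of a square matrix; throws ArithmeticException if it is singular.
    static double[][] inverse(double[][] a) {
        int n = a.length;
        // Work on the augmented matrix [A | I].
        double[][] w = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, w[i], 0, n);
            w[i][n + i] = 1;
        }
        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest entry in this column.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(w[r][col]) > Math.abs(w[pivot][col])) pivot = r;
            }
            if (Math.abs(w[pivot][col]) < 1e-12) {
                throw new ArithmeticException("matrix is singular");
            }
            double[] tmp = w[col]; w[col] = w[pivot]; w[pivot] = tmp;
            // Normalise the pivot row, then clear the column in all other rows.
            double p = w[col][col];
            for (int j = 0; j < 2 * n; j++) w[col][j] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = w[r][col];
                for (int j = 0; j < 2 * n; j++) w[r][j] -= f * w[col][j];
            }
        }
        // The right half now holds the inverse.
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(w[i], n, inv[i], 0, n);
        }
        return inv;
    }

    public static void main(String[] args) {
        double[][] inv = inverse(new double[][] {{1, 2}, {3, 4}});
        System.out.println(java.util.Arrays.deepToString(inv));
    }
}
```

Throwing on a singular matrix, rather than returning an empty matrix, is a design choice of this sketch; either works as long as callers check for it.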

THE PROJECTION

For the perspective projection, we define it by considering the rays from the various points of the object being viewed to the camera position $C$, intersecting the plane $z=a_C$, all with respect to a reference frame that has $C$ at the origin and is oriented so that the 2D plane is normal to the $z$-axis.

Vector analysis allows us to analyze this projection. First we need to consider the line joining a point $P_1$ of the object being viewed to the point $C$. This line is described by the map $\vec v_{P_1,C}(t):\Bbb R \to \Bbb R^3$.

$\vec v_{P_1,C}(t)=\langle C_x,C_y,C_z\rangle + \langle a,b,c\rangle t$

$C=\vec 0$ because it is the origin of the reference frame.

$\vec v_{p_1,C}(t)=\langle a,b,c \rangle t$ then

$P_{1_x}=at$, $P_{1_y}=bt$, and $P_{1_z}=ct$.

It follows that $\langle a,b,c \rangle =\langle P_{1_x},P_{1_y},P_{1_z}\rangle$.

$\vec v _{P_1,C}=t\langle P_{1_x}, P_{1_y}, P_{1_z}\rangle$

Now we need to consider the intersection of the sets $\{X \in \Bbb R^3 :z=a_C\}\cap \{Y \in \Bbb R^3 : Y=t \langle P_{1_x},P_{1_y}, P_{1_z}\rangle \text{ for some } t \in \Bbb R\}$. After solving these equations simultaneously, with $a=a_C$, the common element is

$x = a(\frac{P_{1_x}}{P_{1_z}})$, $y= a (\frac{P_{1_y}}{P_{1_z}})$, and $z=a$.
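In code this intersection is a single division per coordinate. A minimal sketch, with the camera at the origin of its own frame; the names `Perspective` and `project` are illustrative, and rejecting $P_{1_z}=0$ is an assumption about how to handle the case where the ray never meets the plane.

```java
// Sketch only: names are illustrative. The camera C sits at the origin and the
// image plane is z = a, as in the derivation above.
final class Perspective {
    // Intersects the line through the origin and p with the plane z = a.
    static double[] project(double[] p, double a) {
        if (p[2] == 0) {
            // The line is parallel to the image plane and never meets it.
            throw new IllegalArgumentException("point has z = 0 in the camera frame");
        }
        return new double[] {a * p[0] / p[2], a * p[1] / p[2], a};
    }

    public static void main(String[] args) {
        double[] q = project(new double[] {2, 4, 2}, 1.0);
        System.out.println(java.util.Arrays.toString(q));
    }
}
```

Any point on the same line through the camera, for example $(4,8,4)$ instead of $(2,4,2)$, lands on the same image point.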

Now we go back to linear algebra techniques in order to describe this projection as a linear transformation, as above, from $\Bbb R^3 \to \Bbb R^2$ with matrix $[T_{proj}]$. We can do this by describing how $C$'s basis transforms, assuming $[T_{proj}]$ is a linear map. However, there is a problem: we cannot use the basis vectors $i,j$ because the map is singular there. We can remedy this by using our tools above and shifting the whole coordinate system so that $C$, which was at the origin, is now at $(0,0,-1)$. However, because translation is not a linear map in $\Bbb R^3$, we will again work in $\Bbb R^4$, where it is. (We could have factored this in earlier and just defined $C$'s relation to the world reference frame differently, but I'm doing it this way to keep $C$ at the origin, perhaps for other reasons such as modelling the camera itself and its lens.)

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 1 \end{bmatrix} $

takes $(0,0,0,1)$, the origin in $C$, to $(0,0,-1,1)$ in our new frame; call it $C'$. We must change our projection map now, because $C$ was not changed by this passive transformation and so is no longer at the origin. After refactoring,

$\vec v_{P_1,C}=\langle P_{1_x}t,P_{1_y}t,P_{1_z}t+t-1\rangle$

after the intersection

$x = P_{1_x}(\frac{a+1}{P_{1_z}+1})$, $y = P_{1_y}(\frac{a+1}{P_{1_z}+1})$, and $z=a$.

After working out what the standard basis vectors map to in our new frame we finally can describe the transformation $[T_{proj}]$ as a matrix (for inter-operability I will consider it as a projective or affine transformation in $\Bbb R^4$ again.)

$[T_{proj}]=\begin{bmatrix} a+1 & 0 & 0 &0\\ 0 & a+1 & 0 & 0 \\ a & a & a & 0 \\ 0 & 0 & 0 & 1\end{bmatrix}$

Now, is this a linear transformation? As it turns out, this matrix, while it is a linear transformation, does not reflect the projection we were trying to achieve. Consider a vector $\lambda P_{1}$, where $\lambda$ is a scalar. It is an elementary property of linear transformations that $\lambda f(\vec v)=f(\lambda \vec v)$. This is not so for this projection.
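As a quick numerical illustration (the values are chosen only for this example), take $a=1$ and $P_1=(1,1,1)$ in the shifted frame, and write $f$ for the projection $x = P_{1_x}(\frac{a+1}{P_{1_z}+1})$, $y = P_{1_y}(\frac{a+1}{P_{1_z}+1})$, $z=a$:

$f(P_1)=\left(1\cdot\frac{1+1}{1+1},\ 1\cdot\frac{1+1}{1+1},\ 1\right)=(1,1,1)$

With $\lambda=2$:

$f(2P_1)=\left(2\cdot\frac{1+1}{2+1},\ 2\cdot\frac{1+1}{2+1},\ 1\right)=\left(\frac{4}{3},\frac{4}{3},1\right)\neq 2f(P_1)=(2,2,2)$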

PROJECTIVE SPACE/GEOMETRY

Consider the relation $\sim _h$ such that $\vec x \sim _h \vec y$ if and only if $\vec x = \lambda \vec y$ for some nonzero scalar $\lambda$; that is, the relation between all vectors or coordinates that are scalar multiples of each other. It turns out that this relation is an equivalence relation, and its equivalence classes partition Euclidean space $\Bbb R^n$. We can also see that $(x,y,z)\sim_h (\frac{a_Cx}{z},\frac{a_Cy}{z},a_C)$ for every $\vec x \in \Bbb R^3$ with $z \neq 0$. Indeed, a more clever person may have realized this from the very beginning and avoided this entire analysis. But we do come out with the advantage that we can do nearly everything with linear algebra. If $[T_{rot}]^{\varepsilon}_C$ is the $4 \times 4$ rotation matrix and $[A_T]$ the translation matrix, then we can describe an object in any position and orientation in our world frame and then change it to $C$'s frame with the transformation $[A_T][T_{rot}]^{\varepsilon}_C$. Then we apply the map

$\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$

This allows us to get rid of the fourth coordinate, which is necessary before factoring a constant multiple out of the result. Then finally we have

$\left(\begin{bmatrix}0 & 0 & 1\end{bmatrix}[I_{3 \times 4}][A_{tran}][T_{rot}]^{\varepsilon}_C\vec v_{\varepsilon}\right)^{-1}[I_{3 \times 4}][A_{tran}][T_{rot}]^{\varepsilon}_C\vec v_{\varepsilon}$

whose linear part can be reduced by matrix multiplication to a single matrix applied to the vector, $[M_{2d proj}]^{\varepsilon}_{C} \vec v_{\varepsilon}$, followed by the division by the third coordinate.

Scaled by $a$, the result is the coordinates in the form $(\frac{ax}{z},\frac{ay}{z},a)$. Because the z coordinate is redundant, you can further reduce it to a 2D coordinate, as we did in reducing 4D to 3D.
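Put together, the whole chain from world coordinates to a 2D image point is one matrix-vector product followed by one division. A sketch under stated assumptions: `worldToCamera` is the $4 \times 4$ matrix taking world coordinates to the camera frame, the camera looks along its positive z-axis, and all names (`Pipeline`, `projectWorldPoint`) are hypothetical.

```java
// Sketch only: names are illustrative. Assumes the camera looks along +z in its
// own frame and that worldToCamera maps world coordinates into that frame.
final class Pipeline {
    // Product of a 4x4 matrix with a homogeneous 4-vector.
    static double[] apply(double[][] m, double[] v) {
        double[] out = new double[4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    // Maps a 3D world point to 2D image coordinates on the plane z = a.
    static double[] projectWorldPoint(double[][] worldToCamera, double[] world, double a) {
        double[] h = {world[0], world[1], world[2], 1.0};   // homogeneous coordinates
        double[] c = apply(worldToCamera, h);                // point in the camera frame
        if (c[2] <= 0) {
            throw new IllegalArgumentException("point is not in front of the camera");
        }
        // Drop the 4th coordinate and divide by depth.
        return new double[] {a * c[0] / c[2], a * c[1] / c[2]};
    }

    public static void main(String[] args) {
        // Camera at (0, 0, -5) in the world, not rotated: world-to-camera shifts z by +5.
        double[][] worldToCamera = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 5}, {0, 0, 0, 1}};
        double[] img = projectWorldPoint(worldToCamera, new double[] {1, 1, 0}, 1.0);
        System.out.println(img[0] + ", " + img[1]);
    }
}
```

With this camera, the world points $(1,1,0)$ and $(2,2,5)$ produce the same image point, which is the loss of depth in concrete form.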

The ANSWER to your question is NO: this transformation is not invertible. This agrees with intuition, as an object's 2D points lose their depth and their size. Using projective geometry you could say a 2D coordinate corresponds to an entire class of 3D points. A point $(x,y)$ corresponds to the points $(x,y,z)t$ (or simply a single point in projective space), that is, the line from the origin through the 3D point.

What you may have been looking for

If you knew the z coordinate then you could work out more. Likewise, if you knew the distance between two points of a rigid body in 3D, you could use that, or any other method, to get the 3D coordinates in the camera's reference frame. Finally you need to attach a reference frame to the object in a precise way, and it is then that it may be possible to work out the camera orientation. You would use the origin to determine the $4 \times 4$ translation. For the rotation, you can consider rotations as I described above: the product of 3 rotations as a $3 \times 3$ matrix whose elements are various products of sines and cosines of the respective Euler angles. You then have 3 basis vectors, each representing 3 equations (really 9 equations in 3 unknowns), but this only makes sense after you have fully specified how you are describing the orientation, i.e. Euler angles or Tait-Bryan angles, intrinsic or extrinsic, and the order of the rotations about the axes, such as $z-y'-x''$, the intrinsic Tait-Bryan convention I adopted earlier.

In the end this answer is long, but the question requires you to adopt strict notation, which you have not. You haven't even said how you would like to describe the camera's orientation. I chose to represent everything as linear transformations where possible. The reason is the Java implementation: done this way, you only need to write a linear algebra class that can represent matrices up to $4 \times 4$, handle matrix multiplication, and perhaps compute the inverse. Once that is done, this sort of thing is short and easy to implement. It is also modular: you could use the same class to work out physics or something else, making it more worth your time than some perhaps easier but ad-hoc approaches.

[Image: 100x100x100 pixel square]

[Image: Another angle is easily displayed by changing Euler angles]

As you can see, it may not be immediately obvious how accurate these projections are. There appears to be some distortion towards the edges of the images. If the camera lens were accurately modeled and linear perspective studied$^1$, knowing various dimensions of the cube would permit the projection to be reversed; the rest is all linear maps, which have inverses. Given the rotational symmetry there may still be some issues, but in a controlled environment it is possible to determine the 3D object's dimensions completely. As far as Java goes, this would also require edge detection, which is a whole other subject in itself.
