Is there a closed form solution for the camera position seeing points

computer visionprojectionrotations

I have a simple pinhole camera with 3×3 intrinsics matrix K, external orientation 3×3 matrix R and 3×1 position p.

K = [
  [f, 0, cx],
  [0, f, cy],
  [0, 0, 1 ],
]

p is a function of real value t such that:

p(t) = a + t*b, with a and b 3x1 vector 

So the camera's orientation is fixed by its position is a function of t that slides is along a vector.
I would like to find the minimum value of t such that a collection of 3D point is visible.

Said another way I'm trying to move the camera (in a limited way) such that a scene is entirely visible.
Here is an illustration of a bunch of points * and two potential camera positions p1 and p2 where p1 sees all the points and p2 does not see two of them:

enter image description here

I can project a 3D point into the camera using:

[x, y, 1] = K * R.T * point + (-R.T) * t

And determine if its visible if 0 < x < cx and 0 < y < cy

I would like to find the camera position that can see all the points. One solution is to iterate through increasing values of t until all points are visible. But is there a closed form solution for t?

Best Answer

starfield zoom

Yes. If your motion involves zooming along the camera axis, you can compute exactly where the camera should go, by iterating over each of the points and computing five crossing times (one for each edge of the film, plus the camera plane):

$$t_\ell = \frac{f_x(X^\prime_i +a_x)}{b(x_\ell - x_0)}-\frac{1}{b}(Z^\prime_i + a_z)$$ $$t_r = \frac{f_x(X^\prime_i +a_x)}{b(x_r - x_0)}-\frac{1}{b}(Z^\prime_i + a_z)$$ $$t_u = \frac{f_y(Y^\prime_i +a_y)}{b(y_u - y_0)}-\frac{1}{b}(Z^\prime_i + a_z)$$ $$t_d = \frac{f_y(Y^\prime_i +a_y)}{b(y_d - y_0)}-\frac{1}{b}(Z^\prime_i + a_z)$$ $$t_z = -\frac{(Z^\prime_i+a_z) }{b}$$

Assuming forward motion, the smallest number (out of all five numbers and all objects) is the final moment when all objects are in view. At that time and every earlier time, the objects are all in view.


  • As seen in the picture above, the key idea is that as you track the camera along the path $p(t)$, each of your points will trace out a straight line path as seen on the 2D camera film. The orientation and speed of a point's trajectory depends on where that point is in space. In particular, points move faster the closer they get to the camera.

  • For each point, you can figure out when that linear path will go out of bounds, for each of the four bounding regions (too far left, up, down, right to be visible). The formula to compute it is just the intersection of the 2D line with one of the edges of the film. You solve for the 2D point of intersection, then back-solve for the time $t$ that your point will actually reach it.

  • There's a fifth crossing time that matters, which is the time the object passes through the plane of the camera. We calculate that crossing time for each object as well.

  • If the camera is moving forward, then all objects are initially in view (when the camera is "very far away") but eventually get closer and closer to the camera until they cross one of the edges of the film. Therefore, the best viewing time is $T_{min}$, the earliest time any object crosses any edge of the film. At that time (and at any earlier time), all of the objects will be precisely in view. Find it by computing the minimum among all the crossing times you computed— among all objects and all five crossing zones.

Quick recipe

  1. Compute $[0,0,b] = \mathbf{R}\mathbf{b}$. Because the camera motion is along the camera axis, the camera velocity is solely in the camera's $z$ direction. Its magnitude is $b$.

  2. For each object at $[X, Y, Z, 1]$, compute the position of the object in camera coordinates at $t=0$. In other words, compute: $$\begin{bmatrix}X^\prime + a_x\\Y^\prime + a_y \\ Z^\prime + a_z\end{bmatrix} \equiv \begin{bmatrix}\mathbf{R} &|-\mathbf{R}\mathbf{a}\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}$$

  3. There are four edges of the film. We must compute the time where the image of the object crosses over each edge.

    For example, consider the left edge of the film; call its position $x_\ell$ (in image coordinates) and let $x_0$ be the image center (what you call $c_x$ in the question). Then the object crosses over the left edge at time: $$t = \frac{f_x(X^\prime +a_x)}{b(x_\ell - x_0)}-\frac{1}{b}(Z^\prime + a_z)$$

    Repeat this process for all four edges, comparing $x$ coordinates for the left and right edges, and $y$ coordinates for the top and bottom edges.

  4. There is a "fifth" edge, which is when the object passes through the camera plane. This occurs when $t= -\frac{(Z^\prime+a_z) }{b}$.

  5. When $b$ is positive (the usual case), the camera is moving forward, so objects that cross an edge are moving out of film.

    To compute a time when all of the objects are on film, take the minimum of all the crossing times for all the objects $T_{min}$. Any time before $T_{min}$, all of the objects are in view. The camera gets closer and closer until, at $T_{min}$, one of the objects passes out of view. Hence the instant $T_{min}$ (or slightly before) is an ideal time to view all the objects. When $b$ is negative, the camera is moving backward and objects are springing into view, so take the maximum of all the crossing times to find the time the final object springs into view. At $T_{max}$ and every time after, all of the objects are in view.

Mathematical details

  1. Your camera transformation matrix is $$\begin{bmatrix}x\\y\\1\end{bmatrix} = \mathbf{K} \cdot \mathbf{R} \cdot \begin{bmatrix}\mathbf{I}& |-\mathbf{p}\end{bmatrix} \cdot \begin{bmatrix}X\\Y\\Z \\ 1\end{bmatrix}$$ From right to left, the matrix translates and rotates the position of an object from world coordinates into camera coordinates; then $\mathbf{K}$ transforms camera coordinates into 2D image coordinates on film.

  2. If the path of the camera is specified in world coordinates, then the translation vector is $\mathbf{p}(t) \equiv \mathbf{a}+\mathbf{b}t$. $$\begin{bmatrix}x\\y\\1\end{bmatrix} = \mathbf{K} \cdot \begin{bmatrix}\mathbf{R}& |-\mathbf{R}(\mathbf{a}+\mathbf{b}t)\end{bmatrix} \cdot \begin{bmatrix}X\\Y\\Z \\ 1\end{bmatrix}$$

  3. Now we introduce the simplifying assumption from the diagram: the camera motion is parallel to the camera axis so we're panning the camera forwards/backwards. In this case, the camera velocity vector $\mathbf{b}$ is purely in the camera's z-direction. $\mathbf{R}\mathbf{b} = [0, 0, b]$. Let's write

    $$-\mathbf{R}(\mathbf{a}+\mathbf{b}t)\equiv \begin{bmatrix}a_x \\ a_y \\ a_z + bt\end{bmatrix}$$ and $$\mathbf{R}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix} = \begin{bmatrix}X^\prime \\ Y^\prime \\ Z^\prime\end{bmatrix}$$

    These are the positions of the camera $\mathbf{p}$ and the object $[X,Y,Z,1]$ as expressed in camera-coordinates.

  4. Applying the internal matrix $\mathbf{K}=\begin{bmatrix}f_x & 0 & x_0\\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$, we get, finally, $$\begin{bmatrix}x\\y\\1\end{bmatrix} = \begin{bmatrix}\frac{f_x(X^\prime +a_x)}{(Z^\prime + a_z) + bt}\\\frac{f_y(Y^\prime +a_y)}{(Z^\prime + a_z) + bt} \\ 1\end{bmatrix}$$ or

    $$x(t) = \frac{f_x(X^\prime +a_x)}{(Z^\prime + a_z) + bt} + x_0$$ $$y(t) = \frac{f_y(Y^\prime +a_y)}{(Z^\prime + a_z) + bt} + y_0$$

    Behind the notational mess, these parametric equations are pretty simple: they describe motion in a straight line. The speed of the motion changes over time, but its direction is always the same (as you can prove by taking derivatives):

    $$\begin{bmatrix}f_x(X^\prime+a_x), \;f_y(Y^\prime+a_y) \end{bmatrix}$$ Intuitively, this is what the zooming-starfield image was showing us: the motion is along straight lines pointing toward the vanishing point.

  5. You can now use these parametric equations to find when the projection of the object enters the film. For example, suppose the left edge of the film is $x_\ell$ (in image coordinates). Find the time $t$ which solves:

$$\begin{align*} x(t) & = x_\ell \\ \frac{f_x(X^\prime +a_x)}{(Z^\prime + a_z) + bt} + x_0 & = x_\ell \\ f_x(X^\prime +a_x) & = (x_\ell - x_0)[(Z^\prime + a_z) + bt] \\ \frac{f_x(X^\prime +a_x)}{(x_\ell - x_0)}-(Z^\prime + a_z) & = bt \\ \frac{f_x(X^\prime +a_x)}{b(x_\ell - x_0)}-\frac{1}{b}(Z^\prime + a_z) & = t \\ \end{align*}$$

This is the time when the image of the object crosses over the left edge of the film. Note that we have to check the direction of motion to see whether it's coming into view or passing out of view.

  1. There are four edges to the film to find crossing times for. All of these have the general form we found for the left side edge. Check when $x(t)$ is equal to the left or right edge of the film, and check when $y(t)$ is equal to the top or bottom edge of the film. Note that because the object is fixed in place and the camera never turns back, an object that crosses in/out of view stays in/out of view for all time afterward. Hence we have found infinite intervals of time in which the object is within view of each of the edges of the film, e.g. $(t_\ell, \infty)$.

There is actually a "fifth edge" to check, which is when the object passes through the camera plane. This occurs when $$(Z^\prime + a_z) + bt = 0$$

It is important to know when the object is in front of the camera, because our crossing-time formulas will produce spurious answers whenever the object is behind the camera.

  1. For all objects $[X_i,Y_i,Z_i,1]$, compute all five timing intervals $[t_{i,1}, t_{i,2}]$. Remember each of these intervals is infinite in one direction.

Compute the intersection of all of the intervals. Remember that the intersection of two intervals $[a_1, b_1]$ and $[a_2, b_2]$ is $[ \max(a_1,a_2), \min(b_1,b_2)]$.

  1. Consider the resulting intersection $[a,b]$. This is the time interval in which all of the objects are in view. If $a>b$, there is no such time. Otherwise, $a$ is the earliest time that all objects are in view, and $b$ is the latest time.