Calculate angle of view based on object sizes in image

geometry, trigonometry, vector-spaces

I'm trying to find a way to automatically calculate the angle of view of an image by comparing the size and position of two identical objects in the image.

For example, the two bottles below have identical dimensions in reality, but depending on the camera's lens, the blue bottle may appear to shrink at a different rate in the image as its distance from the camera increases.

I'm wondering if it is possible to calculate the angle of view of the below image, given the following:

The two objects have the same dimensions in reality (both humans are equally tall and wide). We can assume that these are 180cm tall.
The real-world distances to the objects are not known.
The image dimensions are known.
Both bounding box dimensions are known.
Both bounding box positions relative to the image top-left origin are known.
Both objects are standing on flat ground (their bottom points intersect a common plane).

Question: Is it possible to calculate the angle of view of the image based entirely on how two (equally-sized) objects are represented in the image as they are moved closer to or further away from the camera?

EDIT: See image with bbox coordinates, here: https://i.sstatic.net/tMJKD.png

Best Answer

For the benefit of anyone coming across this question who is not familiar with the term angle of view, it is described here. It is a geometric angle that can be computed for a lens of some known focal length set to focus at a known distance and projecting onto a sensor of known dimensions.

For simplicity, since we're dealing with the vertical heights of objects in the frame, let's just consider the vertical angle of view.

Let's try running the problem in reverse.

Suppose we are given a camera with a known angle of view and a pair of identical objects of known dimensions which we are required to place on a flat, level surface. Suppose we are required to point the camera in a horizontal direction while doing this (as was apparently done in the street-scene photograph in the question).

To simplify the problem a little (temporarily), suppose the objects we are photographing are rectangular pieces of cardboard braced behind by some objects so that we can stand the rectangles upright on the surface, and assume we will place the rectangles so they are parallel to the image sensor. This means that the ratio of height to width of the bounding box of a rectangle in the image will be the same as the ratio of height to width of the actual rectangle.

Given the known height of a rectangle and the known vertical angle of view, you can compute the percentage of the height of the picture frame that the rectangle's image will occupy at any given distance. Conversely, you can compute the distance at which to place the rectangle so that it will occupy a given percentage of the height of the picture frame. You can achieve any percentage from nearly zero to more than $100\%$ by moving the rectangle away from the camera or toward it.
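The two computations just described can be sketched in a few lines of Python. This is a minimal sketch that assumes an ideal pinhole camera pointed horizontally at a rectangle parallel to the sensor; the function names `frame_fraction` and `distance_for_fraction` are made up for this illustration.

```python
import math

def frame_fraction(obj_height, distance, vfov_deg):
    """Fraction of the frame height occupied by an upright object.

    Assumes an ideal pinhole camera: at a given distance the frame covers
    a vertical extent of 2 * distance * tan(vfov / 2).
    """
    visible_height = 2.0 * distance * math.tan(math.radians(vfov_deg) / 2.0)
    return obj_height / visible_height

def distance_for_fraction(obj_height, fraction, vfov_deg):
    """Distance at which the object occupies the given fraction of the frame height."""
    return obj_height / (2.0 * fraction * math.tan(math.radians(vfov_deg) / 2.0))

# A 1.8 m object seen with a 40 degree vertical angle of view from 5 m away.
fraction = frame_fraction(1.8, 5.0, 40.0)
print(fraction, distance_for_fraction(1.8, fraction, 40.0))
```

The fraction exceeds 1 when the object is close enough that it no longer fits in the frame, which is the "more than 100 percent" case mentioned above.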

So if we are given two bounding box heights, we can place the first rectangle at a distance such that its bounding box's height is the first given height and place the second rectangle at a distance such that its bounding box's height is the second given height.

Now suppose we are told to put the upper left corner of the first bounding box at some particular coordinates in the frame. We can achieve the desired horizontal coordinate just by moving the rectangle left or right. Together with the height of the bounding box, the vertical coordinate of the corner determines a horizontal line on the rectangle (or in the rectangle's plane) that must be imaged exactly halfway between the top and bottom of the picture frame. We raise or lower the camera to the height of that line, and now the vertical coordinate is correct.

This does not work in practice if the surface is opaque and our chosen coordinates put the bottom of the bounding box above the center of the picture frame, because this would require us to take the photograph from below the surface. But let's assume we never require the bottom of the box to be above the center of the frame.

We can achieve any desired horizontal coordinate for the second bounding box, but the constraints of the setup determine the vertical coordinate: each bounding box must have exactly the same percentage of its height above the center of the frame, just as both rectangles must have the same percentage of their heights above a line at the height of the camera.
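As a worked check of this constraint, assume an ideal pinhole camera pointed horizontally, with vertical angle of view $\theta$, camera height $c$ above the ground, object height $H$, and object distance $d$ (these symbols are introduced here only for illustration). Measured as fractions of the frame height, the height $h$ of the bounding box and the part $a$ of the box that lies above the center of the frame are

$$h = \frac{H}{2d\tan(\theta/2)}, \qquad a = \frac{H - c}{2d\tan(\theta/2)}, \qquad \frac{a}{h} = \frac{H - c}{H}.$$

The ratio $a/h$ depends on neither $d$ nor $\theta$, so under these assumptions it is the same for both rectangles, whatever their distances.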

We therefore have five degrees of freedom in specifying how we would like the bounding boxes to appear: the height of the first box (which determines the width), the height of the second box (which determines the width), the horizontal and vertical coordinates of the first box's upper left corner (which together with the previous data determine the vertical coordinate of the second box's upper left corner) and the horizontal coordinate of the second box's upper left corner.

So let's say we set up this photograph for a particular set of dimensions and coordinates of the bounding boxes using a lens with a certain angle of view. Now we change to a lens with a different angle of view. We will have to move the rectangles forward or backward and left or right so that they again show up with the same dimensions and coordinates of bounding boxes, but we can do so.

Since two camera setups with different angles of view took identical pictures (as far as you can tell from the bounding boxes of the two known objects), there is no way you can deduce the angle of view from the sizes and positions of the bounding boxes.
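The argument can be checked numerically. The following is a rough sketch under the same idealized assumptions (pinhole camera, horizontal optical axis, rectangles parallel to the sensor); the function names, the 0.5 m object width, and the 3:2 frame aspect ratio are arbitrary choices for this illustration. Box coordinates are in units of the frame height, with the origin at the top-left corner of the frame.

```python
import math

ASPECT = 1.5  # assumed frame width / frame height

def focal(vfov_deg):
    """Focal length in units of the frame height for a pinhole camera."""
    return 1.0 / (2.0 * math.tan(math.radians(vfov_deg) / 2.0))

def project_box(vfov_deg, cam_height, obj_h, obj_w, distance, x_center):
    """Bounding box (left, top, width, height) of an upright rectangle."""
    scale = focal(vfov_deg) / distance
    top = 0.5 - (obj_h - cam_height) * scale          # image y grows downward
    left = ASPECT / 2.0 + (x_center - obj_w / 2.0) * scale
    return (left, top, obj_w * scale, obj_h * scale)

def place_for_box(vfov_deg, obj_h, obj_w, left, top, height):
    """Distance, camera height and lateral offset that produce a given box."""
    distance = obj_h * focal(vfov_deg) / height
    scale = height / obj_h
    cam_height = obj_h - (0.5 - top) / scale
    x_center = (left - ASPECT / 2.0) / scale + obj_w / 2.0
    return distance, cam_height, x_center

def render_scene(vfov_deg, box1, box2_left, box2_height, obj_h=1.8, obj_w=0.5):
    """Place both rectangles for the requested boxes, then project them."""
    left1, top1, height1 = box1
    d1, cam_h, x1 = place_for_box(vfov_deg, obj_h, obj_w, left1, top1, height1)
    # The camera height is already fixed by the first box, so the top of the
    # second box is not a free parameter; the 0.0 passed here is ignored.
    d2, _, x2 = place_for_box(vfov_deg, obj_h, obj_w, box2_left, 0.0, box2_height)
    boxes = (project_box(vfov_deg, cam_h, obj_h, obj_w, d1, x1),
             project_box(vfov_deg, cam_h, obj_h, obj_w, d2, x2))
    return (d1, d2), boxes

for vfov in (30.0, 60.0):
    distances, boxes = render_scene(vfov, (0.2, 0.3, 0.5), 0.9, 0.25)
    print(vfov, distances, boxes)
```

Both angles of view print the same pair of bounding boxes while the object distances differ, which is the ambiguity described above. In this idealized model only the distances need to change between the two setups.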

If you have some way of measuring the distance between the objects in the two bounding boxes, however, you can determine the distance to the camera. Using that distance, the portion of the image frame occupied by the bounding box, and the known dimension of the object, you can estimate the angle of view. In the photograph of the two men in the question, if the paving stones are all the same size and you know what size they are, you might be able to estimate the distance reasonably well.
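As an illustration of that estimate, here is a minimal sketch assuming an ideal pinhole camera and assuming the measured separation is the difference in depth along the viewing direction (not a diagonal distance between the objects). The function name `vfov_from_depth_separation` and the helper `frame_fraction` are hypothetical names chosen for this example.

```python
import math

def frame_fraction(obj_height, distance, vfov_deg):
    """Fraction of the frame height occupied by an object (pinhole model)."""
    return obj_height / (2.0 * distance * math.tan(math.radians(vfov_deg) / 2.0))

def vfov_from_depth_separation(obj_height, frac_near, frac_far, depth_separation):
    """Estimate the vertical angle of view in degrees.

    frac_near and frac_far are the bounding box heights as fractions of the
    frame height. With focal length f in units of the frame height, each
    distance is obj_height * f / fraction, so the known depth separation
    fixes f.
    """
    if not 0.0 < frac_far < frac_near:
        raise ValueError("expected 0 < frac_far < frac_near")
    f = depth_separation / (obj_height * (1.0 / frac_far - 1.0 / frac_near))
    return math.degrees(2.0 * math.atan(1.0 / (2.0 * f)))

# Round trip: synthesize two boxes with a 40 degree lens, objects at 4 m and 9 m.
near = frame_fraction(1.8, 4.0, 40.0)
far = frame_fraction(1.8, 9.0, 40.0)
print(vfov_from_depth_separation(1.8, near, far, 5.0))
```

With real bounding boxes the result is only as good as the separation measurement and the box heights, so it should be treated as an estimate.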

Things get a little more complicated when dealing with three-dimensional objects in the real world, since the points that determine the width of the bounding box may be at a different distance from the camera than the points that determine the height of the bounding box. The width of the bounding box also is not necessarily determined by the points at the maximum width of the object, but rather by the points that subtend the maximum horizontal angle at the camera. Because of all these complications, the ratio of height to width might vary slightly as the object is moved closer to or further away from the camera. If the object holds its shape rigidly enough, if you can measure the bounding box accurately enough, and if you know enough about the shape of the object, you can (in principle) tell how far away the object is merely from the ratio of height to width in the picture. But I think those conditions would be hard to find in practice.
