[Math] Finding the area of intersection and area of union between two sets of overlapping rectangles

calculusgeometrylinear algebraoptimizationrectangles

Suppose I have $N_r$ number of red rectangle boxes and $N_b$ rectangle boxes in an image of some arbitrary size. Each box is parameterized as $(x_{\min}, y_{\min}, x_{\max}, y_{\max})$. I would like to find the area of intersection $A_I$ and the union $A_U$ between two sets of overlapping rectangles. This is because I would like to calculate the intersection over union between the two sets. That is, I would like to find the pink area in the two examples shown in the image below:

enter image description here

I came up with a solution to find the intersection between two rectangles:

$$ A_I = [\min(x_{\max,r},x_{\max,b}) – \max(x_{\min,r},x_{\min,b}) + 1] \cdot [\min(y_{\max,r},y_{\max,b}) – \max(y_{\min,r},y_{\min,b}) + 1] $$

$$ A_U = (x_{\max,r}-x_{\min,r})\cdot(y_{\max,r}-y_{\min,r}) + (x_{\max,b}-x_{\min,b})\cdot(y_{\max,b}-y_{\min,b}) – A_I$$

However, I am having trouble coming up with a solution for a problem with multiple boxes in each of the red and blue rectangle sets. It should be noted that the solution needs to be differentiable. This is because I am using it to calculate the error between the red and blue rectangles and trying to optimize the parameters in a neural network that would produce a set of the rectangles. So you can think of the red rectangles as the prediction and the blue as the ground truth.

Best Answer

One possible approach is to decompose each set into a non-overlapping set of axis-aligned rectangles (henceforth boxes), where each edge is shared by at most two rectangles.

Let's say we have a set of two partially overlapping boxes. You split them into non-overlapping sub-boxes (cells), using a grid where each cell is rectangular, but each column of cells has their own width, and each row of cells has their own height: Decomposed overlapping boxes Here, $$\begin{array}{lll} L_1 = x_{min,a} & C_1 = x_{min,b} & R_1 = x_{max,a} \\ L_2 = x_{min,a} & C_2 = x_{min,b} & R_2 = x_{max,b} \\ L_3 = x_{min,b} & C_3 = x_{max,a} & R_3 = x_{max,b} \\ \; & C_4 = x_{max_a} & \; \end{array}$$ $$\begin{array}{lll} T_1 = y_{min,a} & M_1 = y_{min,b} & B_1 = y_{max,a} \\ T_2 = y_{min,a} & M_2 = y_{min,b} & B_2 = y_{max,b} \\ T_3 = y_{min,b} & M_3 = y_{max,a} & B_3 = y_{max,b} \\ \; & M_4 = y_{max,a} & \; \end{array}$$

Each cell in the grid has width, height, and a color: unused, set 1, set 2, or intersection.

Each vertical cell edge has three pieces of information: its $x$ coordinate, its $height$, and the name of the variable its $x$ coordinate depends on (so that edges defined by a specific variable can be found). Similarly, each horizontal cell edge has an $y$ coordinate, $width$, and the name of the variable its $y$ coordinate depends on.

With the above information, it is trivial to compute the areas. Implement a procedure that sums (width×height) of each cell based on color. Then, the union is the sum of set 1, set 2, and intersect areas; and of course the area of the intersection is just the intersect sum.

More importantly, you can just as easily compute the gradient (the partial derivatives of the area function with respect to each original variable), split by color pairs:

For each horizontal edge of width $w$, examine the color of the cell above, and the color of the cell below. This edge affects the derivative of union with respect to the variable $z$ related to the edge by $dU/dz$, and/or of intersection by $dO/dz$:

  • unused, set1: $dU/dz = -w$
  • unused, set2: $dU/dz = -w$
  • unused, intersection: $dU/dz = -w, \; dO/dz = -w$
  • set1, intersection: $dO/dz = -w$
  • set2, intersection: $dO/dz = -w$
  • intersection, set1: $dO/dz = +w$
  • intersection, set2: $dO/dz = +w$
  • intersection, unused: $dU/dz = +w, \; dO/dz = +w$
  • set1, unused: $dU/dz = +w$
  • set2, unused: $dU/dz = +w$

Similarly for the vertical edges.

Some background math to explain this might be in order.

The area of the union $U(x_1, y_1, ..., x_N, y_N)$ is piecewise linear with respect to each coordinate. That is, when one single coordinate (edge of an axis-aligned rectangle) changes by a differential amount $dx_i$, the area of the union changes by amount $(h_{R,i} - h_{L,i})dx_i$, where $h_{R,i}$ is the total length of outside edges defined by this variable on the right side, and $h_{L,i}$ the total length of outside edges defined by this variable on the left side. (This is because increment of the variable on the right edge increments the area, but on the left edge decrements the area.)

This is easiest to understand by looking at a single axis-aligned rectangle, defined by $x_L \le x_R$ and $y_B \le y_T$, where $(x_L , y_B) - (x_R , y_T)$ and $(x_L , y_T) - (x_R , y_B)$ are the two diagonals of the rectangle. Then, $$A(x_L , y_B , x_R , y_T ) = ( x_R - x_L ) ( y_T - y_B )$$ and the partial derivatives (defining $\nabla A$) are $$\frac{d\,A}{d\,x_L} = - ( y_T - y_B )$$ $$\frac{d\,A}{d\,y_B} = - ( x_R - x_L )$$ $$\frac{d\,A}{d\,x_R} = ( y_T - y_B )$$ $$\frac{d\,A}{d\,y_T} = ( x_R - x_L )$$ When we decompose the set of overlapping boxes to non-overlapping boxes in a grid, the above applies to each box, if and only if the other box sharing the same edge is of a different type: if it were of the same type, moving the single edge between the two boxes by an infinitesimal amount would not have any effect on their total area at all.

This is not a particularly hard programming problem, because there are many different ways to implement the data structures needed to solve it. (Although it does mean that finding a near-optimal solution is hard, simply because there are so many different ways of implementing this, and only actual practical testing would show which ones are most effective.)

I would suggest implementing it and testing it separately first, perhaps having the test program output an SVG image of the resulting decomposed non-overlapping set, with outside horizontal edge widths and vertical edge heights and the variable names their coordinates depend on written on the outside of the image, for area and gradient verification by hard.

If this approach feels intractable, it should be noted that for $N$ rectangles ($4N$ independent variables), you could simply calculate the derivatives using $$\frac{d\,A(c_1, c_2, ..., c_{4N})}{d\,c_i} = \frac{A(..., c_i + \Delta_i, ...) - A(..., c_i - \Delta_i, ...)}{\Delta_i}$$ (involving $8N$ area calculations, so linear complexity with respect to $N$), where $\Delta_i$ is a small perturbation in the coordinate $c_i$. In a very real sense, if $\Delta_i$ approximates the typical change in one iteration step (if done along variable $c_i$), this should give more realistic feedback! You see, since the area functions are piecewise linear, and there are up to $2N-1$ pieces along each axis, the derivative, or infinitesimal or instantaneous change along that axis, may not be truly informative!

For example, consider a case where the two sets have a long thin hole somewhere. Because the hole has a long edge, the derivative with respect to the variables defining the long edges of the holes are large, and that makes those variables "more important" when looking at the derivatives only. In reality, the area of the hole may be minuscule compared to the totality, making those variables not at all important in reality. If $\Delta_i$ for the variables is large enough to span the hole, the estimate obtained with it better reflect the actual change in area if the hole were covered.