Efficient way to find all polygons of the same shape within a set, regardless of position, scale, or rotation

algorithmscombinatoricsgeometrypolygons

I've got a big set of 2D polygons described as a set of points. I would like to take this set of polygons and find any that are the same shape, regardless of rotation, translation, or scale.

Each polygon has 3 or more points, and in some cases may have hundreds or thousands of points. There are hundreds of polygons in total. The polygons are optimised to remove runs of points that would form a straight line, e.g. the sequence $[(0,0), (10,10), (20,20)]$ would not be allowed. The polygons also aren't allowed to intersect themselves.

The context for this problem is geometry instancing in 3D rendering, specifically rendering polygon regions and fills from PCB designs. Polygon regions are frequently repeated or copy-pasted around boards, so I would like to reduce the number of draw calls I need per frame by finding any polygon that I can instance using a transformation matrix.

My first idea was to normalise the scale of all polygons, then compare the line lengths and angles between line vectors. Two polygons with the same shape should be composed from the same line lengths and same angles in the same order.

I started implementing this, using the following procedure:

  1. Group all polygons by point count.
  2. Discard any groups that contain only one polygon, since they are guaranteed to be unique in shape.
  3. Center each polygon to $(0,0)$ and normalise its scale to a 1×1 bounding box.
  4. Calculate a "hash" for each polygon based on the sum of the magnitudes of all points, i.e. $\sum \sqrt{{x_i}^2 + {y_i}^2}$, which for any two polygons sharing the same shape should be equal.
  5. Further group the polygons by this hash, and discard any groups that have only one member. This acts as a fast filtering step to quickly reject any polygons which are obviously not the same.
  6. Turn each polygon's point list into a circular list of angle-distance pairs, $(\theta,d)$, where $\theta$ is the angle formed between the vectors meeting at the point, and $d$ is the length of the next clockwise vector.
  7. Within each group, compare all of the lists to each other to find matches.

This works in theory, but I ran into two issues. First, "clockwise" isn't well-defined for arbitrary polygons that may have both concave and convex regions. This can be solved by traversing the lists in both directions during the final step, which has the added bonus of finding mirror-image copies of the same polygon, which could be described as a negative scale in one or both of the axes, and which I can handle just fine in the instancing code. Second, and more importantly, I couldn't come up with an efficient way to compare the two circular lists of points without knowing the correct starting offsets.

It occurs to me that there might be some efficient mathematical way of identifying whether two circular sequences of number pairs contain the same ordered sequence of values regardless of whether the sequences are offset from each other.

For example, $[1,2,3,5,2]$ and $[3,5,2,1,2]$ should be considered equal because they are the same set of numbers in the same order, just with a different starting offset (the latter is the former shifted two places left). However, $[1,2,3,5,2]$ and $[2,5,3,2,1]$ should not be considered equal, because the ordering is incorrect despite the same numbers appearing in the set. In practice the elements in each list would actually be a pair of numbers rather than a single number, but that's an implementation detail.

Is there a clever way to solve this?

If not, is there an alternative approach to the polygon shape comparison problem?

Best Answer

Luckily I hang out with a bunch of people on Twitter and Mastodon who are way smarter than me, and they came up with some great ideas.

The core solution here is to concatenate one of the lists being compared to itself. For two equal-length sets $A$ and $B$, if $A \subset B^\frown B$ then $A$ must be the same sequence of $B$ with some rotation. This reduces the problem to a string search.

To demonstrate:

$$A = [1,2,3,5,2]$$ $$B = [3,5,2,1,2]$$ $$B' = B^\frown B = [3,5,2,\mathbf{1,2,3,5,2},1,2]$$ $$A \subset B'$$

For those not familiar with the notation (and for me when I inevitably come back to this answer in a year and I've forgotten set notation) the approach is essentially:

let a = "12352";
let b = "35212";
let b2 = b + b;
return b2.includes(a);

This can be optimised in a few ways.

First, the concatenation does not have to use $O(2n)$ memory, since the search can be performed on the original set by computing the search pointer (or array index) modulo the set length.

A KMP string search has a worst-case performance of $O(m)$ pre-processing and $O(n)$ matching, where $m$ is length of the string we're searching for and $n$ is the length of the string we're searching in. Since all polygons in each hash group need to be compared to each other, the pre-processing can be reused for repeated comparisons.

The KMP approach can then be modified a little to find a lexicographically minimal string rotation during the pre-processing step. This improves the average-case search performance a lot, because it is far more likely that $A$ and $B$ will have matching rotations. It's also a free optimisation, because it has the same $O(m)$ time complexity as the regular KMP pre-processing step.

Credit to @goopypanther for first suggesting the subset approach, and Jannis Harder (@jix) for suggesting the lexicographical rotation trick.