[Math] What would be the most efficient way to detect all closed paths in a collection of segments and connectors

algorithmsgraph theoryplanar-graphs

We have a data set which is comprised of Connectors and Segments. Each segment has exactly two connectors, but each connector can belong to zero or more segments (i.e. connector 'A' in the left image below has no segments, while connector 'M' has three, M-R, M-L and M-N.)

It is understood that wherever any lines meet or intersect, there will be a connector so we don't have to worry about even/odd rules, overlapping or partially-enclosed polygons, etc. as they don't apply.

In short, we're trying to identify all of the created polygons (the colored shapes in the right image.) I believe this can be completed in two steps.

Polygons

Part 1: Removing superfluous items

Stand-alone connectors (connector 'A' here) can simply be removed since they can't be part of a shape's outline.

Floating end-points referencing a single segment (connectors 'B' and 'E') can also be removed as they too can't be part of a shape's outline. This will also remove their referenced segments (B-C and E-D).

Performing the above recursively will next identify 'C' as an endpoint (since 'B' and B-C were already removed) so it and it's remaining segment C-D can also be removed. On the next recursive pass, connector 'D' and segment D-F will also be removed, etc.

However, I haven't found a good way to identify segment G-H. That said, I think that can be achieved during polygon detection since such segments would only be the result of compound paths and would be traced in both directions during one shape's detection. (More on that below.)

Step 2: Polygon Detection

Each segment can be traced in two directions. For instance, the segment connecting 'O' and 'P' can be either O-P or P-O. Picking a trace-direction of clockwise, O-P would belong to the polygon O-P-Q-N whereas P-O would belong to the polygon P-O-I.

The following logic assumes a trace-direction of clockwise.

Starting from any segment, when tracing around, if you get back to your starting point, you have identified a potential polygon. By keeping a running delta of your heading's angle as you trace around (this is how much your heading turns and is not to be confused with simply adding the angles between segments), when done, if that angle is positive, you've detected a valid polygon. If it's negative, you've detected a 'containing' polygon, meaning one that contains one or more 'valid' polygons. The outer perimeter of the entire shape (or shapes) are all containing polygons.

Consider the case of a square, diagonally divided into two triangles. Tracing each segment twice–once in each direction–you will end up with three potentially-valid polygons: a square and two triangles. The triangles will have a positive angle delta telling you they're valid, but the square's angle delta will be negative telling you that's the containing polygon.

Note: A containing polygon can be equal to a valid polygon too. It will just be 'wound' in the opposite direction.

Consider a simple triangle. The clockwise trace will yield the valid polygon. The second attempt to trace clockwise will actually yield a counter-clockwise trace which will give you a negative angle delta, telling you that's actually the outline of the shape.

Note: You also have to test for other polygons encountered along the way by also testing each point for any previously-encountered point during that shape detection. If you find you've revisited the same point, save off the polygon created since the first encounter of that point, check it's angle. If it's positive, it's a valid polygon (and you're actually currently tracing a containing polygon.) If it's negative, you've detected a containing polygon (in which case you're currently tracing a valid polygon.) Finally, remove all segments on your accumulation stack back to the first instance that point was last encountered and continue on with your detection.

For instance, if you started at 'J' and traced around counter-clockwise, you would go through 'I', 'H', then 'G', then 'F' then you'd be back at 'H'. You just found a polygon H-G-F which has a negative angle so you know it's a containing polygon. Remove those three segments from your stack and continue on. Now you'll again hit 'I'. In this case, you already visited that same segment during this pass, but in the other direction, so simply remove that segment completely from your stack and continue on, next to 'O' then 'N', etc. You'll eventually be back at 'J'.

When a segment has been traced in both directions, it can be considered 'used' and no further processing of that segment is needed. Continue processing all non-used segments. Once all segments have been traced in both directions, you can be sure all polygons–valid and containing–have been found.

Finally, check each containing polygon to see if it falls within any valid polygon. If so, exclude it from that valid polygon creating a compound path. In the example here, containing polygon H-G-F is contained by the valid cyan polygon so it should be excluded. Note there is also a valid H-F-G polygon which is marked in red here.

Anyway, that's what I've come up with, but I'm wondering if there's a better/simpler way. Thoughts?

Best Answer

First, note that the common mathematical terms for the objects you are dealing with are: (plane, undirected) graph for your data set, vertex for a connector, and edge for a segment. Also, if a vertex $w$ is joined to a vertex $v$ by an edge, we say that $v$ is a neighbour of $w$.

Furthermore, for our purposes a path is a sequence $(v_1, \dotsc, v_n)$ of vertices such that $v_i, v_{i+1}$ are connected by an edge for every $1 \leq i < n$, and a cycle is a path where $v_1 = v_n$ and no other vertex appears twice.

Now, from what you say you aren't assuming that your graph is connected, i.e. that every two vertices can be the endpoints of a path. Thus it isn't clear to me how you are detecting when a polygon is contained in another one to which it isn't connected.


Anyway, I'm going to propose a simpler recursive algorithm to detect every cycle in your graph. Those are precisely the boundaries of all the polygons contained in the graph. Then you can determine the "valid" polygons in various ways. For example, a somewhat inefficient way would be:

  • For every polygon A:
    • For every polygon B:
      1. If $A \subset B$: continue
      2. Subtract $A \cap B$ from $A$ and add back $\partial B \cap A$

Now, to detect the cycles you can do as follows:

  1. Set every vertex as not visited
  2. Create an empty path $P$
  3. Pick a random not visited vertex $w$
  4. Set $w$ as visited and append it to $P$
  5. For every neighbour $v$ of $w$:
    • Create a copy $P'$ of $P$
    • Set $v$ as visited and append it to $P'$
    • If $v$ was already in $P'$, return the sub-path that starts and ends with $v$
    • Repeat from 5 with $w := v$ and $P := P'$
  6. If there are some not visited vertices left repeat from 2, otherwise we're done
Related Question