PostGIS – How to Unsplitting/Merging Lines with Coincident Endpoints

groupingpostgisst-intersectsst-union

I have a large dataset of lines/linestrings (ie. a road network), which is unregularly segmentised/ split and the lines needs to be merged at coincident endpoints (if only two lines are touching). I'd like to achieve this with PostGIS.

The image below shows the current network data (1), the current result if start and end nodes are extracted (2), and the desired output (3). The points are only for the purpose of visualising start and end points of lines.

enter image description here

I am aware of eg. the Unsplit Line (Data Management) function in ArcGIS Pro (see here), which can achieve the desired output. I am trying to replicate this functionality.

I am also aware of the ST_Union and ST_UnaryUnion function which can union/ unsplit two input geometries (both are aggregate function so they can be used with a GROUP BY SQL command), as well as the ST_Touches and ST_Intersectsfunction which can evaluate whether two geometries intersect in PostGIS.

How can I efficiently group those lines together which touch at their end points, using intersections of more than 2 lines as group separations?

Here are some trials:

Merge lines based on a group id (nb I don't know how this group id could be generated):

SELECT max(id) as id, ST_LineMerge(ST_Union(geom)) as geom
FROM line_data
GROUP BY id;

I would need to check if lines are touching at their start and end points, here is an example on how to get those points:

SELECT DISTINCT (ST_DumpPoints(ST_Union(ST_StartPoint(a.geom),ST_EndPoint(a.geom)))).geom
FROM line_data as a;

Here is a GeoJSON with some example data (using a UK projection):

{
"type": "FeatureCollection",
"name": "line_data",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:EPSG::27700" } },
"features": [
{ "type": "Feature", "properties": { "id": 1 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404105.557298680825625, 441484.677229618187994 ], [ 404091.557298680825625, 441490.677229618187994 ] ] ] } },
{ "type": "Feature", "properties": { "id": 2 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404091.557298680825625, 441490.677229618187994 ], [ 404088.934298680804204, 441491.851229618187062 ] ] ] } },
{ "type": "Feature", "properties": { "id": 3 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404088.934298680804204, 441491.851229618187062 ], [ 404087.530958589049987, 441492.488016092858743 ] ] ] } },
{ "type": "Feature", "properties": { "id": 4 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404085.202298680844251, 441493.522229618160054 ], [ 404069.072298680839594, 441500.744896284828428 ] ] ] } },
{ "type": "Feature", "properties": { "id": 5 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404092.23729868081864, 441499.302229618187994 ], [ 404091.817298680834938, 441498.327229618211277 ] ] ] } },
{ "type": "Feature", "properties": { "id": 6 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404091.817298680834938, 441498.327229618211277 ], [ 404091.436298680840991, 441497.438229618186597 ] ] ] } },
{ "type": "Feature", "properties": { "id": 7 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404082.918298680800945, 441488.421229618194047 ], [ 404085.202298680844251, 441493.522229618160054 ] ] ] } },
{ "type": "Feature", "properties": { "id": 8 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404091.436298680840991, 441497.438229618186597 ], [ 404088.934298680804204, 441491.851229618187062 ] ] ] } },
{ "type": "Feature", "properties": { "id": 9 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404079.932298680825625, 441481.647229618160054 ], [ 404081.162298680806998, 441484.392229618213605 ] ] ] } },
{ "type": "Feature", "properties": { "id": 10 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404081.162298680806998, 441484.392229618213605 ], [ 404082.067298680834938, 441486.467229618167039 ] ] ] } },
{ "type": "Feature", "properties": { "id": 11 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404082.067298680834938, 441486.467229618167039 ], [ 404082.918298680800945, 441488.421229618194047 ] ] ] } },
{ "type": "Feature", "properties": { "id": 12 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404087.530958589049987, 441492.488016092858743 ], [ 404087.521519799542148, 441492.492299084202386 ] ] ] } },
{ "type": "Feature", "properties": { "id": 13 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404087.521519799542148, 441492.492299084202386 ], [ 404086.918862026010174, 441492.765764027833939 ], [ 404086.1045236493228, 441493.116771948814858 ], [ 404085.705842245894019, 441493.295938385475893 ] ] ] } },
{ "type": "Feature", "properties": { "id": 14 }, "geometry": { "type": "MultiLineString", "coordinates": [ [ [ 404085.705842245894019, 441493.295938385475893 ], [ 404085.202298680844251, 441493.522229618160054 ] ] ] } }
]
}

The following code solves the issue only partly, but introduces additional problems.
We first union all linestrings into one big multilinestring via ST_Union, this single multilinestring is then fed into ST_LineMerge which joins lines according to:

Lines are joined at their endpoints at 2-way intersections. Lines are
not joined across intersections of 3-way or greater degree.

With the (ST_Dump()).geom we can then extract the individually merged linesstrings out from the single multilinestring.

SELECT max(a.id), (ST_Dump(ST_LineMerge(ST_Union(a.geom)))).geom as geom
FROM lines_data AS a;

However, this process splits also lines that do not share a common node, ie. are simply crossing, but with no coincidental points. This is quite problematic for the type of network. We also lose the individual line ids–though this is a minor problem.

Best Answer

The graph-theoretical answer for simplifying edge topologies and decreasing the size of a graph would be contraction - implemented as stable function as of pgRouting 3.0.

To perform a linear graph contraction on your topology, run

SELECT *
FROM   pgr_contraction(
         $$
           SELECT <edge_id>, <source>, <target>, <cost>, <reverse_cost>
           FROM   <edge_table>
         $$,
         ARRAY[2],
         directed => <true/false>
       )
;

The returned set of rows will hold information on what vertices have been removed from the primary topology, and the full topology information of the contracted edges.

SOLUTION, ADDED BY OP:

Given that the example data has been prepared as pg_routing topology (incl. source & target columns), a grouping gid and subsequent line merge can be achieved as follows:

The contracted_vertices column in the resulting pg_contraction table can subsequently be unnested and joined to the lines_data table based on matches in the sourceand target columns respectively. Using this newly named gid column as GROUP BY lines geometries can be merged via ST_Union. The GROUP BY gid, CASE WHEN gid IS NULL THEN id ELSE 0 END line groups by ´gid´ only if gid is not null, otherwise the id column is used as grouping to generate a row for each line segment that has no contracted vertex.

SELECT max(a.id) as id, ST_Union(a.geom) as geom
FROM ( 
    SELECT DISTINCT a.*, b.id as gid
    FROM lines AS a
    LEFT JOIN
    (SELECT 
        id, unnest(contracted_vertices) AS vertex
    FROM   (SELECT id, contracted_vertices 
            FROM   pgr_contraction(
                $$
                  SELECT id, source, target, ST_Length(geom) as cost, ST_Length(geom) as reverse_cost
                  FROM lines
                $$,
                ARRAY[2],
                 directed => false
            )) as contraction_table
    ) AS b ON a.source = b.vertex OR a.target = b.vertex
) as a
GROUP BY gid,
CASE WHEN gid IS NULL THEN id ELSE 0 END;