[GIS] Shortest path between cities without duplicate calculation (many-to-many)

dijkstragraphpgroutingpostgisshortest path

I have three tables ==> ways, nodes and cities (OpenStreetMap)
ways : Edges with required attributes for routing (source, target, cost, etc.)
nodes : Each and every nodes linking the edges
cities : The actual nodes I'll use to compute shortest ways

I need to have the shortest way between each adjacent cities, which means I don't want to calculate the shortest way between each and every cities (this will take too much time and needs to be usable on a large dataset).
Below is a picture to have a better understanding of the problem. Keep in mind those numbers are arbitrary and I'm looking for a method that would be working on a large set of data (1000+ cities).

Shortest way needed

What I want is to find a way to compute the shortest way between [2, 1], [2, 3] but NOT [1, 3] as it will just be the addition of the two others.

For now, I've tried pgr_dijkstra MANY-TO-MANY with pgRouting, the problem is that when I supply the same array as source and target, it takes too much time and can't even display the results because of some memory issue. [id_node from the table cities has been extracted from the source/target of the road sections]

SELECT * FROM pgr_dijkstra('SELECT id, source, target, cost, reverse_cost FROM ways',
array(SELECT id_node FROM cities), array(SELECT id_node FROM cities), TRUE);

The main goal is to build a graph with cities as nodes, and ways (weighted by the time needed) as edges.

Is it possible to realize that using pgRouting or do I need to make a python script ? And in that case, are they some resources I can use ?

Edit 1: Network simplification

Following @ThingumaBob advice, I drastically simplified the network by removing unecessary nodes. This was actually harder as I thought because of underpasses/overpasses. I first tried to treat them as well but I just end up inserting them without simplification (I could have done it with a recursive function or else, but it was too much time for a small improvement).
I guess my code will speak of itself :

CREATE TABLE network (
  id serial NOT NULL,
  source integer,
  target integer,
  cost double precision,
  geom geometry(LineString,4326),
  CONSTRAINT network_pkey PRIMARY KEY (id))
  WITH (OIDS=FALSE);
ALTER TABLE network
  OWNER TO "Me";

--Inserting reconstructed network
INSERT INTO network (
--Merging the network and returning simple geometry without bridges (ST_Crosses)
WITH nobridges AS (SELECT (ST_Dump(ST_LineMerge(ST_Union(lines.geom)))).geom
          FROM ways AS lines
          WHERE NOT EXISTS (SELECT 1 FROM ways w
              WHERE ST_Crosses(w.geom, lines.geom))),
--Retrieval of bridges
    bridges AS (SELECT DISTINCT w1.geom
          FROM ways w1, ways w2
          WHERE ST_Crosses(w1.geom, w2.geom)),
--Union of the two geometries and adding an ID
    merged AS (SELECT (ROW_NUMBER() OVER ())::int AS id, tb.geom
          FROM (SELECT nb.geom FROM nobridges nb
                UNION
                SELECT b.geom FROM bridges b) AS tb),
--Re-computation of costs for each edges
     costs AS (SELECT m.id, sum(ST_Length(w.geom::geography)/1000)/
               avg(w.kmh::double precision) AS cost, m.geom
          FROM ways w, merged m
          WHERE ST_Contains(m.geom, w.geom)
          GROUP BY m.id, m.geom),
--Creation of nodes and adding an ID
     nodes AS (SELECT (ROW_NUMBER() OVER ())::int as id, nd.geom
          FROM (SELECT DISTINCT geom
                FROM (SELECT ST_StartPoint(m.geom) as geom
                      FROM merged m
                           UNION
                           (SELECT ST_EndPoint(m.geom) as geom
                           FROM merged m)) AS node_all) AS nd)
 --Creation of the final table by assigning a source and a target to each edges, thus, having a functional network system
SELECT c.id, n1.id as source, n2.id as target, c.cost, c.geom
    FROM costs c
    LEFT JOIN nodes n1
        ON ST_Intersects(n1.geom, ST_StartPoint (c.geom))
    LEFT JOIN nodes n2
        ON ST_Intersects(n2.geom, ST_EndPoint (c.geom))
);

In the end, I found myself reconstructing the graph from scratch but I guess this was a better solution as it was quite fast. I don't if it's properly optimized but it works…

It gives me this :
enter image description here

As it doesn't fix my problem, it still reduce potential shortest way calculation. I'm still open for any suggestions on how to calculate only directly-connected cities

Best Answer

Since no one joined in, and if you don´t mind, I´ll share some thoughts without this being a satisfying answer (I share a work-related intereset in this).

I highly doubt you have much of an alternative to get a generalized network like the one you are looking for; this is due to the fact that the adjacency matrix of neighbouring cities you imply (to be able to, sort of, avoid unnecessary calculations) is based on a shortest path calculation as well, and for it to be the shortest path connecting each pair, well, by definition you will each have to traverse the whole graph to find it (or use a heuristic algorithm like A* maybe with its pros and cons).

In your example, as Evan said, there is no way to determine that an existing path {1, 2} is part of the shortest path of [1, 3] before walking all (dijkstra) or lots of (A*) other paths. Also, you shouldn't simply assume that if you reach a city with an existing path to the target of the running search, this is the shortest path and skip the search (imagine a triangle 'network': if you already have the path {B, C} and reach B in your search for [A, C], you shouldn`t skip and take {A, B, C} due to your assumption...).

With this said, I think you could at least try and speed up the 'all-to-all' search in some ways.

While the pgr_dijkstra(many-to-many) function is certainly a lot more sophisticated than what I could propose, writing your own function to do an 'all-to-all' search with pairwise one-to-one-dijkstra could give you the benefit of control:

  • For example, if you subsequentially merge all found paths to single linestrings (with accumulated cost and updated source/target of respective nodes) and recursively route on the updated network, you´ll get your generalized network in one turn without post-processing and the algorithm might get a speed boost with decreasing node/edge count.
  • (You´d have to clean your base network accordingly (i.e. get rid of dangling nodes/edges if there is no city assigned [you can use pgr_analyzeGraph to find those]) and at least keep track of visited cities to avoid reverse search.)
  • Additionally, you can try and carefully get only subsets of your data for each search, based on the bounding box of the city pair in question (fast) or by radius search (medium fast) or something similar. That could reduce the individual runtime a lot (or fetching those subsets for each iteration might defeat that purpose...not 100% sure, but I think it does help, with indexes and cleaned tables), but could also lead to unexpected errors (i.e. no path found due to cut-off network).

This should 'easily' be doable in PL/PgSQL (if you are familiar), maybe even by adjusting the pgr_dijkstra(many-to-many) function itself to your needs (although unlikely, due to external C dependencies or general inaccessibility...I haven´t checked).
You could as well switch to Python scripting with one of the many dijkstra implementations (I googled it and came up with heaps...I hope linking them here/now is not necessary), but you´d at least lose the benefits of spatial indexes (the whole control advantage is even more present then, though, since you can easily implement some break conditions in the main algorithm to skip the search whenever you want...if you really want).

As I said, I´m on it, for my own project. If I find something, I´ll update my answer. Maybe someone joins in with an ingenius (or obvious...) solution...

Related Question