I have three tables ==> ways, nodes and cities (OpenStreetMap)
ways : Edges with required attributes for routing (source, target, cost, etc.)
nodes : Each and every nodes linking the edges
cities : The actual nodes I'll use to compute shortest ways
I need to have the shortest way between each adjacent cities, which means I don't want to calculate the shortest way between each and every cities (this will take too much time and needs to be usable on a large dataset).
Below is a picture to have a better understanding of the problem. Keep in mind those numbers are arbitrary and I'm looking for a method that would be working on a large set of data (1000+ cities).
What I want is to find a way to compute the shortest way between [2, 1], [2, 3] but NOT [1, 3] as it will just be the addition of the two others.
For now, I've tried pgr_dijkstra MANY-TO-MANY with pgRouting, the problem is that when I supply the same array as source and target, it takes too much time and can't even display the results because of some memory issue. [id_node from the table cities has been extracted from the source/target of the road sections]
SELECT * FROM pgr_dijkstra('SELECT id, source, target, cost, reverse_cost FROM ways',
array(SELECT id_node FROM cities), array(SELECT id_node FROM cities), TRUE);
The main goal is to build a graph with cities as nodes, and ways (weighted by the time needed) as edges.
Is it possible to realize that using pgRouting or do I need to make a python script ? And in that case, are they some resources I can use ?
Edit 1: Network simplification
Following @ThingumaBob advice, I drastically simplified the network by removing unecessary nodes. This was actually harder as I thought because of underpasses/overpasses. I first tried to treat them as well but I just end up inserting them without simplification (I could have done it with a recursive function or else, but it was too much time for a small improvement).
I guess my code will speak of itself :
CREATE TABLE network (
id serial NOT NULL,
source integer,
target integer,
cost double precision,
geom geometry(LineString,4326),
CONSTRAINT network_pkey PRIMARY KEY (id))
WITH (OIDS=FALSE);
ALTER TABLE network
OWNER TO "Me";
--Inserting reconstructed network
INSERT INTO network (
--Merging the network and returning simple geometry without bridges (ST_Crosses)
WITH nobridges AS (SELECT (ST_Dump(ST_LineMerge(ST_Union(lines.geom)))).geom
FROM ways AS lines
WHERE NOT EXISTS (SELECT 1 FROM ways w
WHERE ST_Crosses(w.geom, lines.geom))),
--Retrieval of bridges
bridges AS (SELECT DISTINCT w1.geom
FROM ways w1, ways w2
WHERE ST_Crosses(w1.geom, w2.geom)),
--Union of the two geometries and adding an ID
merged AS (SELECT (ROW_NUMBER() OVER ())::int AS id, tb.geom
FROM (SELECT nb.geom FROM nobridges nb
UNION
SELECT b.geom FROM bridges b) AS tb),
--Re-computation of costs for each edges
costs AS (SELECT m.id, sum(ST_Length(w.geom::geography)/1000)/
avg(w.kmh::double precision) AS cost, m.geom
FROM ways w, merged m
WHERE ST_Contains(m.geom, w.geom)
GROUP BY m.id, m.geom),
--Creation of nodes and adding an ID
nodes AS (SELECT (ROW_NUMBER() OVER ())::int as id, nd.geom
FROM (SELECT DISTINCT geom
FROM (SELECT ST_StartPoint(m.geom) as geom
FROM merged m
UNION
(SELECT ST_EndPoint(m.geom) as geom
FROM merged m)) AS node_all) AS nd)
--Creation of the final table by assigning a source and a target to each edges, thus, having a functional network system
SELECT c.id, n1.id as source, n2.id as target, c.cost, c.geom
FROM costs c
LEFT JOIN nodes n1
ON ST_Intersects(n1.geom, ST_StartPoint (c.geom))
LEFT JOIN nodes n2
ON ST_Intersects(n2.geom, ST_EndPoint (c.geom))
);
In the end, I found myself reconstructing the graph from scratch but I guess this was a better solution as it was quite fast. I don't if it's properly optimized but it works…
It gives me this :
As it doesn't fix my problem, it still reduce potential shortest way calculation. I'm still open for any suggestions on how to calculate only directly-connected cities
Best Answer
Since no one joined in, and if you don´t mind, I´ll share some thoughts without this being a satisfying answer (I share a work-related intereset in this).
I highly doubt you have much of an alternative to get a generalized network like the one you are looking for; this is due to the fact that the adjacency matrix of neighbouring cities you imply (to be able to, sort of, avoid unnecessary calculations) is based on a shortest path calculation as well, and for it to be the shortest path connecting each pair, well, by definition you will each have to traverse the whole graph to find it (or use a heuristic algorithm like A* maybe with its pros and cons).
In your example, as Evan said, there is no way to determine that an existing path {1, 2} is part of the shortest path of [1, 3] before walking all (dijkstra) or lots of (A*) other paths. Also, you shouldn't simply assume that if you reach a city with an existing path to the target of the running search, this is the shortest path and skip the search (imagine a triangle 'network': if you already have the path {B, C} and reach B in your search for [A, C], you shouldn`t skip and take {A, B, C} due to your assumption...).
With this said, I think you could at least try and speed up the 'all-to-all' search in some ways.
While the pgr_dijkstra(many-to-many) function is certainly a lot more sophisticated than what I could propose, writing your own function to do an 'all-to-all' search with pairwise one-to-one-dijkstra could give you the benefit of control:
This should 'easily' be doable in PL/PgSQL (if you are familiar), maybe even by adjusting the pgr_dijkstra(many-to-many) function itself to your needs (although unlikely, due to external C dependencies or general inaccessibility...I haven´t checked).
You could as well switch to Python scripting with one of the many dijkstra implementations (I googled it and came up with heaps...I hope linking them here/now is not necessary), but you´d at least lose the benefits of spatial indexes (the whole control advantage is even more present then, though, since you can easily implement some break conditions in the main algorithm to skip the search whenever you want...if you really want).
As I said, I´m on it, for my own project. If I find something, I´ll update my answer. Maybe someone joins in with an ingenius (or obvious...) solution...