[GIS] Nearest facility with pgRouting

attribute-joinspgroutingpostgispostgresql

I have a PostgreSQL table of students with lat, lon and geom, and a table of schools with the same, and a table of roads that has been converted to nodes.

I'm trying to find the driving distance to the nearest school for each student. But I'm new to PostgreSQL and pgRouting, but I'm not seeing a great way to structure the multiple joins to work with the pgr_drivingDistance function.

Ideally I would end up with a new table that would have the student id, the id of the closest school via driving distance, and the distance.

Any thoughts on how to approach this would be great.

Update addressing Jendrusk's questions:

  1. What do you mean by converting roads to nodes? Maybe you meant network graph?

select pgr_createTopology('roads', 0.0001, 'geom', 'id')

  1. Are you able to find at least one way and the trouble is with many ways?

The challenge is finding the shortest way for multiple students. I could take the lat/lon for one student, find the nearest node and then calculate distances that way, but I need to do it for hundreds.

  1. Nearest school euclidean or by the roads?

Driving distance.

  1. What was the work-flow of this conversion form roads to nodes?

See above.

  1. What are the names of tables and column names in this tables?

Three tables: Kids, roads, schools. Kids and schools have a GID, lat, lon, geom, and a bunch of other non-spatial info. Roads has GID, geom, source, target and length.

  1. How big are your tables with students and schools?

100k students, several dozen schools.

  1. Is it OK to select nearest school euclidean

No, interested in driving distance.

Also function st_DrivingDistance is not finding route, but it's creating set of edges reachable in specified time (e.g. how far you can go in 15 minutes).

Also 2 you need distance from student to nearest school but which is nearest if you don't know the distance? Do you have some kind of relation between schools and students? If not you have to count distance from one student to every school to know which is nearest. In most cases nearest euclidean (in straight line) will be also nearest by the roads, but if you have some special cases in your graph such as river with one bridge you can't be sure.<<<

Again, I do not need to find the route. I need to identify the closest school based on driving distance. My impression is that using pgr_drivingDistance it is possible to create a table containing driving distance length from a specified node to all other nodes in a network. You can scope this operation to nodes within a particular distance of the index node, and obviously, in my case, only consider nodes that are the closest node to a school.

I'm basing this info on the following post: http://anitagraser.com/2011/05/13/catchment-areas-with-pgrouting-driving_distance/

I believe pgr_drivingDistance can limit the query based on a number of different costs, including driving time and distance, depending on how the cost is specified both in the query and in the data.

Best Answer

Firstly few questions:
1. what do you mean by converting roads to nodes? Maybe you meant network graph?
2. Are you able to find at least one way and the trouble is with many ways?
3. Nearest school euclidean or by the roads?
4. What was the work-flow of this conversion form roads to nodes?
5. What are the names of tables and column names in this tables?
6. How big are your tables with students and schools?
7. Is it OK to select nearest school euclidean

Please add information to your question so maybe there will be simplest to help you.

But back to the point:

PG_Routing is not a resolution out of the box since it provides a kind of 'low level' stored procedures. You can't write query like 'select pg_CountMeTheWay(student, scholl), but it can find shortest way if you'll give it source edge id, target edge id and set of edges as graph with cost.

Also function st_DrivingDistance is not finding route, but it's creating set of edges reachable in specified time (e.g. how far you can go in 15 minutes).

Also 2 you need distance from student to nearest school but which is nearest if you don't know the distance? Do you have some kind of relation between schools and students? If not you have to count distance from one student to every school to know which is nearest. In most cases nearest euclidean (in straight line) will be also nearest by the roads, but if you have some special cases in your graph such as river with one bridge you cant be sure.

There are many resolutions of your problem even without this questions to your question, so please try to be more specific.

EDIT

Ok, so hope you have a lot of time :)

Firstly - to know the driving distance you have to (one way or another) find shortest path - there is no smoke without fire...
1. Driving distance is not doing anything different then this, but under another conditions. It's like space-time in military meaning - you have to dig ditch from here to everywhere till 4:00pm.
2. Also as far as I know driving_distance is working like window functions (e.g. row_number() over()) so it's designating cost during calculations, so you can't make him to do calculations for all edges, but designate cost only to few. Of course you can do select from select but this means that distance to every edge will be calculated anyway.
3. pgr_Driving_distance is much more slower then pgr_dijkstra - finding one route between points spaced about 15km is taking on my dataset (all Poland) about 1s (0,5-2,5), driving distance with the same condition will be about 15s

Basing upon this times counting:
1. Shortest paths between all students and scholls (100000x50x2) will last 115 days
2. Shortest paths between all students and scholls with minimal time will last about 46 days
3. Mixed version - Shortest paths between all students and 10 euclidean nearest to them scholls will last about 10 days
4. driving_distance for all students till 15km will last 17 days
5. Short version - Shortest paths between all students and 1 euclidean nearest to them scholl will last about 1 day.
6. Performance depends on numer of rows in edges table (network graph) so less records less time you'll need.

Possible resolution (1):

  1. Write stored procedure with your pgr_djikstra() with id_source, id_target as input and cost as output (there are many exapmples on network)
  2. create table with id_student, id_source, id_school, id_target, cost
  3. Fill it with Id's and nearest edges of network graph
  4. update it with cost of shortest path using your function
  5. select for every student record with the lowest cost

Possible resolution (2):

  1. Write stored procedure with driving_distance with source_id as input and retorn output of driving_distance
  2. Write stored procedure with geometry as input and nearest edge id as output.
  3. create tables with students and their nearest edges using above procedure, the same for schools
  4. create table a as select *, your_proc(nearest_edge_id) as dist from students_nearest.
  5. create another table b as select * from a where exists id of nearest student and school id
  6. Select from this table records with student_id, school_id, min(cost) group by student_id, school_id

IMO the best resolution as compromise of time and quality is to find 3 euclidean nearest schools for every student and count for them shortest paths, then choose one - the shortest. With some tuning and only absolutely necessary edges in network graph you can feet in few hours.

Related Question