[GIS] Overpass / Overpy: Getting Way IDs from Nodes

openstreetmap overpass-api python

I don't grok Overpass as well as I'd like, so I hope someone can help.

I have a route which I'm importing from a GPX file. This has been 'snapped' to OSM data, so I have the lat/lon of each point on the route. Each of these coincides with an OSM node, but I don't have the OSM node ID in my raw lat/lon data.
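Because the route is an ordered list of coordinates while an Overpass result is an unordered collection of nodes, each route point still has to be paired with a node ID at some point. A minimal sketch, assuming the nodes have already been pulled out of a result as `(id, lat, lon)` tuples (e.g. `[(n.id, float(n.lat), float(n.lon)) for n in result.nodes]`); `match_points_to_nodes` is a hypothetical helper, not part of Overpy:

```python
import math
from typing import Iterable, List, Optional, Tuple


def match_points_to_nodes(
    points: Iterable[Tuple[float, float]],
    nodes: Iterable[Tuple[int, float, float]],
    tol: float = 0.00001,
) -> List[Optional[int]]:
    """For each (lat, lon) route point, return the ID of the nearest node
    lying within `tol` degrees on both axes, or None if there is none.

    Hypothetical helper: brute force, fine for a few thousand points."""
    candidates = list(nodes)
    matched: List[Optional[int]] = []
    for lat, lon in points:
        best_id, best_dist = None, math.inf
        for node_id, n_lat, n_lon in candidates:
            d_lat, d_lon = abs(n_lat - lat), abs(n_lon - lon)
            if d_lat <= tol and d_lon <= tol:
                # Crude planar distance in degrees; good enough at this scale
                dist = math.hypot(d_lat, d_lon)
                if dist < best_dist:
                    best_id, best_dist = node_id, dist
        matched.append(best_id)
    return matched
```

The result keeps the route order, so it can be used directly as the ordered list of route node IDs.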

I also want to find which of these points are what I'm terming 'intersections', i.e. where the route passes from one OSM way to another. To do this, my plan was to go from nodes to (highway) ways and work out where the route changes way, perhaps refined to, e.g., take the way's name into account.
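The plan above can be sketched as a small function, assuming the route is already available as an ordered list of node IDs and that `node_ways` maps each node ID to the set of way IDs containing it (`find_way_changes` is a hypothetical name). The way carrying each step of the route is taken to be whichever way both endpoints share, and a node is flagged when the step before it and the step after it have no way in common:

```python
from typing import Dict, List, Set


def find_way_changes(route: List[int], node_ways: Dict[int, Set[int]]) -> List[int]:
    """Return the route node IDs at which the route leaves one way for another."""
    # segment_ways[i] = ways shared by route[i] and route[i + 1],
    # i.e. the way(s) that carry that step of the route
    segment_ways = [
        node_ways.get(a, set()) & node_ways.get(b, set())
        for a, b in zip(route, route[1:])
    ]
    changes = []
    for i in range(1, len(segment_ways)):
        before, after = segment_ways[i - 1], segment_ways[i]
        # route[i] joins the two steps; flag it if they share no way.
        # Steps with no shared way at all (missing data) are skipped.
        if before and after and not (before & after):
            changes.append(route[i])
    return changes
```

This is only a heuristic: two consecutive route nodes can share a way without being adjacent on it, and the name-based refinement would need a further mapping from way ID to its `name` tag.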

So, using Overpy, I have found that the fastest way to go from the route points to OSM nodes is to use a union of multiple bounding-box queries:

import overpy

TOL = 0.00001  # half-width of each bounding box, in degrees

# Build a union of one tiny bbox-query per snapped route point
query = """
<osm-script>
  <union>
"""
for row in df_snapped.itertuples():
    query += '    <bbox-query s="{}" w="{}" n="{}" e="{}"/>\n'.format(
        round(row.lat - TOL, 5),
        round(row.lon - TOL, 5),
        round(row.lat + TOL, 5),
        round(row.lon + TOL, 5),
    )

# Keep those nodes, add the ways that contain them, and print the lot
query += """
  </union>
  <union into="_">
    <item from="_" into="_"/>
    <recurse from="_" into="_" type="node-way"/>
  </union>
  <print e="" from="_" geometry="skeleton" ids="yes" limit="" mode="body" n="" order="id" s="" w=""/>
</osm-script>
"""

result = overpy.Overpass().query(query)

This results in a query similar to the following:

<osm-script>
  <union>
    <bbox-query s="51.48825" w="-2.62352" n="51.48827" e="-2.6235"/>
    <bbox-query s="51.48801" w="-2.62364" n="51.48803" e="-2.62362"/>
    <bbox-query s="51.4878" w="-2.62373" n="51.48782" e="-2.62371"/>
    <bbox-query s="51.48697" w="-2.62406" n="51.48699" e="-2.62404"/>
    <bbox-query s="51.48682" w="-2.62414" n="51.48684" e="-2.62412"/>
    <bbox-query s="51.4868" w="-2.62416" n="51.48682" e="-2.62414"/>
    <bbox-query s="51.48665" w="-2.62431" n="51.48667" e="-2.62429"/>
    <bbox-query s="51.48654" w="-2.62442" n="51.48656" e="-2.6244"/>
    <bbox-query s="51.48633" w="-2.62463" n="51.48635" e="-2.62461"/>

    ...

  </union>
  <union into="_">
    <item from="_" into="_"/>
    <recurse from="_" into="_" type="node-way"/>
  </union>
  <print e="" from="_" geometry="skeleton" ids="yes" limit="" mode="body" n="" order="id" s="" w=""/>
</osm-script>

The second <union> clause keeps the matched nodes (<item/>) and adds every way that contains one of them (<recurse type="node-way"/>). So far, so good.
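For comparison, the same query can probably be written more compactly in Overpass QL, where `node(s,w,n,e)` is a bounding-box filter and `way(bn)` selects the ways containing the nodes in the input set. A minimal sketch of a hypothetical builder, assuming the route points are `(lat, lon)` pairs; it has not been checked against the live API:

```python
from typing import Iterable, Tuple


def build_ql_query(points: Iterable[Tuple[float, float]], tol: float = 0.00001) -> str:
    """Build an Overpass QL query intended to mirror the XML query above."""
    # One bbox filter per route point, in (south, west, north, east) order
    bboxes = "\n".join(
        f"  node({lat - tol:.5f},{lon - tol:.5f},{lat + tol:.5f},{lon + tol:.5f});"
        for lat, lon in points
    )
    # (._; way(bn);) keeps the nodes and adds the ways that contain them,
    # like <item/> plus <recurse type="node-way"/>; "out body;" prints both.
    return f"(\n{bboxes}\n);\n(._; way(bn););\nout body;"
```

The resulting string could be passed to `overpy.Overpass().query(...)` in place of the XML version.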

My plan is then to create a Python dictionary with the node IDs as its keys and sets of way IDs as its values:

from collections import defaultdict

# node ID -> set of IDs of the ways that contain that node
node_ways = defaultdict(set)

for way in result.ways:
    # resolve_missing=True makes Overpy fetch any node missing from result
    nodes = way.get_nodes(resolve_missing=True)
    for node in nodes:
        node_ways[node.id].add(way.id)

This works fine, but has two problems:

  1. Using resolve_missing=True takes around 3.5 minutes to fetch all the missing nodes of the ways, and I'd very much like to reduce this. I have tried dropping resolve_missing=True and wrapping get_nodes in an exception handler, but then the resulting set of nodes is incomplete.
  2. With resolve_missing=True, I get every node on every way that touches the original set of nodes, which is more nodes than I need. I could filter them out in code, but ideally I'd rather avoid resolve_missing=True altogether.

So, my question is: can I either adapt the original query to give me the output I want directly, or can I get what I need and avoid using resolve_missing=True by some other method?

Best Answer

A little more digging turned up the answer. Executing, for example,

vars(result.get_way(4755884))

gives the internal structure of the way object:

{'_attribute_modifiers': {'changeset': int,
  'timestamp': <function overpy.Element.__init__.<locals>.<lambda>>,
  'uid': int,
  'version': int,
  'visible': <function overpy.Element.__init__.<locals>.<lambda>>},
 '_node_ids': [26229733,
  291529159,
  246189513,
  2682629060,
  291529223,
  3723657411,
  3723657424,
  2018716449,
  291530424,
  2018716450,
  291530803,
  26229737,
  1741942073,
  4100724928,
  4100724934],
 '_result': <overpy.Result at 0x221fc5b17f0>,
 'attributes': {},
 'center_lat': None,
 'center_lon': None,
 'id': 4755884,
 'tags': {'bicycle': 'yes',
  'highway': 'residential',
  'maxspeed': '20 mph',
  'name': 'Reedley Road',
  'postal_code': 'BS9',
  'sidewalk': 'both'}}

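I can therefore access the list of raw node IDs using e.g. result.get_way(4755884)._node_ids. This list comes from the way's node references in the query output itself, so it is available without resolve_missing=True and without any extra API requests. Note that _node_ids is a private attribute, so it may change between Overpy versions.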
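Putting this together, the dictionary from the question can be built without resolving any nodes, restricted to the route's own nodes (which also addresses the second problem). A minimal sketch; `build_node_ways` is a hypothetical helper that takes plain `(way_id, node_ids)` pairs so it does not depend on Overpy internals:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_node_ways(
    ways: Iterable[Tuple[int, Iterable[int]]],
    route_node_ids: Set[int],
) -> Dict[int, Set[int]]:
    """Map each route node ID to the IDs of the ways that contain it.

    Node IDs that are not on the route are skipped, so no node objects
    ever need to be fetched."""
    node_ways: Dict[int, Set[int]] = defaultdict(set)
    for way_id, node_ids in ways:
        for node_id in node_ids:
            if node_id in route_node_ids:
                node_ways[node_id].add(way_id)
    return dict(node_ways)
```

With Overpy this might be called as `build_node_ways(((w.id, w._node_ids) for w in result.ways), {n.id for n in result.nodes})`, relying on `result.nodes` holding only the nodes found by the bbox queries (any other node that happens to fall inside a box would be included too).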