[GIS] How to geocode to a shape instead of a coordinate

administrative-boundaries geocoding nominatim openstreetmap overpass-api

Geocoding to a shape instead of a point

The project I'm currently working on is Find-A-Record. We are geocoding genealogical record collections and storing them in a spatial index (browse our blog if you want to know more). Searches are based on a shape. We return collections which intersect or are contained within the search area.

During the early stages of development, we used GeoNames to geocode collections to a point. This works well for collections associated with lower administrative levels such as cities, towns, and villages. However, it really breaks down when you get to the county, state, and country level.

For example, the 1940 US Census is associated with the United States as a whole, so it would be assigned a single point in northern Kansas. Any query within the US that isn't near that point won't return the 1940 US Census.

To solve this we need to geocode collections with a shape instead of a point.
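To make the failure concrete, here is a minimal sketch in plain Python. The coordinates are rough, illustrative values (not taken from GeoNames or OSM), and bounding boxes stand in for real polygons, so treat it as an illustration of the idea rather than our actual index.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in lon/lat degrees (a stand-in for a real polygon)."""
    west: float
    south: float
    east: float
    north: float


def contains_point(box: BBox, lon: float, lat: float) -> bool:
    """True if the point lies inside the box (edges included)."""
    return box.west <= lon <= box.east and box.south <= lat <= box.north


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Illustrative, approximate values only.
search_area = BBox(-114.1, 37.0, -109.0, 42.0)   # roughly Utah
us_point = (-98.5, 39.76)                        # a point in northern Kansas
us_shape = BBox(-125.0, 24.0, -66.0, 50.0)       # rough box around the contiguous US

point_hit = contains_point(search_area, *us_point)
shape_hit = intersects(search_area, us_shape)

print(point_hit)  # False: the collection is missed when geocoded to a point
print(shape_hit)  # True: the collection is found when geocoded to a shape
```

A search over Utah misses the census when it is stored as a point, but finds it when it is stored as a shape covering the country.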

OSM

OpenStreetMap has the data we need, but it's extremely difficult to extract. The administrative hierarchy is not explicitly stored. Nominatim is used to solve this problem for OSM, but a Nominatim search only returns features. So a query for Knighton on Teme returns two bus stops but not the administrative boundary relation.

The Overpass API looked promising, but it can't do fuzzy string matches; it can only do exact or regex matches. We could use Overpass if there were an easy way to standardize place names. In other words, if OSM provided a way for us to standardize "Knighton on Teme, Worcestershire, England" to "Knighton on Teme CP, Malvern Hills, Worcestershire, West Midlands, England, United Kingdom" according to the OSM hierarchy, then fuzzy string matching wouldn't be necessary.
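For reference, this is roughly what a regex-based Overpass lookup looks like. It is a sketch that only builds the Overpass QL text; the helper name `boundary_query` is made up for illustration, and the query assumes the boundary is tagged `boundary=administrative` with a matching `name`. The caller still has to supply a pattern that matches OSM's spelling, which is exactly the limitation described above.

```python
def boundary_query(name_pattern: str, timeout: int = 25) -> str:
    """Build an Overpass QL query for administrative boundary relations
    whose name matches a regular expression (case-insensitive).

    This is regex matching, not fuzzy matching: a misspelled or
    differently formatted name will simply return nothing.
    """
    # Escape backslashes and double quotes so the pattern stays inside
    # the quoted Overpass QL string.
    escaped = name_pattern.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'relation["boundary"="administrative"]["name"~"{escaped}",i];\n'
        "out geom;"
    )


query = boundary_query("^Knighton on Teme")
print(query)
# The text would be sent as the `data` parameter of a request to an
# Overpass endpoint (for example https://overpass-api.de/api/interpreter).
```

Anchoring the pattern with `^` lets it match "Knighton on Teme CP" without listing the suffix, but it does nothing for typos or alternate spellings.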

Summary

What we need is a service which allows us to perform fuzzy string searches for a place (or administrative level) and retrieve its boundaries.

We recognize that it will be difficult to obtain boundary data for the entire world. Thankfully we probably won't need to anytime soon. We only need data for areas of the world where genealogical records exist and genealogists do research.

It's looking like we will need to build our own service which indexes OSM in a way that lets us query for administrative boundaries, but we would really prefer not to. Is there any other way we can retrieve this data with existing services?

Best Answer

Cool project! You might take a look at MapIt: Global:

MapIt is a service that maps geographical points to administrative areas. This edition is based on source data from the totally amazing OpenStreetMap project, so add your boundaries there if they’re missing. If you’re in the UK our MapIt UK with open Ordnance Survey data will probably be more useful.

MapIt is useful for anyone who has the co-ordinates of a point on Earth, and who needs to find out what country, region, city, constituency, or state it lies within. It’s also great for looking up the shapes of all those boundaries.

Charitable, low volume use of this service is free – read more.

You can download the source on Github.

Need a licence? Read more or get in touch (commercial@mysociety.org).
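As a starting point, here is a small sketch of how the lookups might be wired up. It only builds URLs and makes no network calls. The URL patterns (`/areas/<name prefix>`, `/area/<id>.geojson`, `/point/4326/<lon>,<lat>`) and the base URL are my assumptions about the MapIt API, so check them against the MapIt documentation before relying on them; the area id and coordinates below are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumed base URL of the global MapIt instance.
MAPIT_BASE = "https://global.mapit.mysociety.org"


def areas_by_name_url(prefix: str) -> str:
    """URL listing areas whose name starts with the given text (assumed endpoint)."""
    return f"{MAPIT_BASE}/areas/{quote(prefix)}"


def area_geojson_url(area_id: int) -> str:
    """URL for the boundary of one area as GeoJSON (assumed endpoint)."""
    return f"{MAPIT_BASE}/area/{area_id}.geojson"


def point_url(lon: float, lat: float) -> str:
    """URL listing areas covering a WGS84 point; note the lon,lat order (assumed endpoint)."""
    return f"{MAPIT_BASE}/point/4326/{lon},{lat}"


print(areas_by_name_url("Knighton on Teme"))
print(area_geojson_url(12345))   # 12345 is a hypothetical area id
print(point_url(-2.5, 52.3))     # hypothetical coordinates
```

If the name lookup works as assumed, it is a prefix match rather than a fuzzy match, so some name normalization on your side would probably still be needed.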
