Python Geocoding – How to Disambiguate Messy Place Names Locally

geocoding python

I have a list of several million place names that come from Flickr profiles. Users provided these place names as free text, so they look like this:

Roma, Italy
Kennesaw, USA
Saginaw, MI
Rucker, Missouri, USA
Melbourne, Australia
Madrid, Spain
live in Sarnia / work in London, Canada
Valladolid, España
Italia
West Hollywood, United States

I want to disambiguate these place names. I am aware that in some cases there is no straightforward solution to this, but I am willing to live with some false disambiguations and with "no answer" for some of the places. If a place name matches multiple cities, I want to assign it to the largest of those cities.
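To make the tie-breaking rule concrete, here is a minimal sketch of what I mean by "assign to the largest city". The gazetteer rows, the population figures (rough placeholders, not real statistics), and the helper names `build_index` and `largest_match` are all made up for illustration; a real run would load the rows from a downloaded gazetteer instead.

```python
from collections import defaultdict

# Hypothetical gazetteer rows: (name, country code, population).
# Population values are rough placeholders for illustration only.
GAZETTEER = [
    ("London", "GB", 9_000_000),
    ("London", "CA", 400_000),
    ("Melbourne", "AU", 5_000_000),
    ("Melbourne", "US", 80_000),
]


def build_index(rows):
    """Group gazetteer rows by case-folded name for fast lookup."""
    index = defaultdict(list)
    for name, country, population in rows:
        index[name.casefold()].append((name, country, population))
    return index


def largest_match(index, query):
    """Return the most populous row matching the query, or None."""
    candidates = index.get(query.strip().casefold())
    if not candidates:
        return None  # the "no answer" case
    return max(candidates, key=lambda row: row[2])


INDEX = build_index(GAZETTEER)
print(largest_match(INDEX, "London"))
print(largest_match(INDEX, "Atlantis"))
```

With several million inputs, building the index once and doing dictionary lookups keeps each query cheap.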

Yahoo's PlaceFinder API would be a good solution to this problem, but I would need to make too many API calls to get through my list, so I'd like a local solution (i.e., one that does not depend on a remote API). Does anyone know of any Python libraries that do this kind of thing, or any other local solutions?

(I've also asked this question on Stack Overflow.)

Best Answer

You could try the Python library geodict. It comes with datasets that you can download and import into a database, so you can check the lists to see whether they work well with your data. It works in two steps:

  1. Extracting names
  2. Matching names to a location in the lists
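The two steps can be sketched roughly as follows. This is not geodict's actual API, just a minimal illustration of the extract-then-match idea. The `CITIES` and `COUNTRIES` tables, their placeholder population values, and the function names `extract_names` and `match_names` are all assumptions made for the example.

```python
import re

# Hypothetical lookup tables standing in for a real gazetteer database.
# Population values are rough placeholders, not real statistics.
CITIES = {
    "sarnia": [("Sarnia", "CA", 70_000)],
    "london": [("London", "GB", 9_000_000), ("London", "CA", 400_000)],
    "roma": [("Roma", "IT", 2_800_000)],
}
COUNTRIES = {"canada": "CA", "italy": "IT", "italia": "IT"}


def extract_names(text):
    """Step 1: split free text into candidate place-name tokens."""
    names = []
    for part in re.split(r"[,/]", text):
        # Drop leading filler such as "live in" or "work in".
        cleaned = re.sub(
            r"^\s*(?:live|work)s?\s+in\s+", "", part.strip(), flags=re.IGNORECASE
        ).strip()
        if cleaned:
            names.append(cleaned)
    return names


def match_names(names):
    """Step 2: match tokens to cities, using country tokens as hints."""
    keys = [name.casefold() for name in names]
    country_hints = {COUNTRIES[key] for key in keys if key in COUNTRIES}
    results = []
    for key in keys:
        candidates = CITIES.get(key, [])
        if country_hints:
            # Keep only cities in a country mentioned in the same string.
            candidates = [c for c in candidates if c[1] in country_hints]
        if candidates:
            # Remaining ties go to the most populous candidate.
            results.append(max(candidates, key=lambda c: c[2]))
    return results


print(match_names(extract_names("live in Sarnia / work in London, Canada")))
```

In this sketch a country token only narrows the candidates; a string with no recognizable city, such as "Italia", yields an empty list, which corresponds to the "no answer" case in the question.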

More details (and another online option in the comments) here.
