[GIS] How to return individual address components (city, state, etc.) from GeoPy geocoder

Tags: address-parsing, geocoding, google, python

I'm using GeoPy to geocode addresses to lat,lng. I would also like to extract the itemized address components (street, city, state, zip) for each address.

GeoPy returns the address as a single string, but I can't find a reliable way to split it into its components. What I'd like is something like this:

{street: '123 Main Street', city: 'Los Angeles', state: 'CA', zip: 90034, country: 'USA'}

The Google Geocoding API does return these individual components, so is there a way to get them from GeoPy (or from a different geocoding tool)?
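If GeoPy is talking to Google (its `GoogleV3` geocoder), the components Google returns may still be reachable: geopy's `Location` objects keep the geocoder's unparsed result in `location.raw`, and for Google that dict should contain Google's `address_components` list, where each entry has `long_name`, `short_name` and `types`. The sketch below assumes that shape. The function name `google_components` and the sample dict are hypothetical; the sample is hand-written to mimic Google's format, not captured from a real response.

```python
def google_components(raw):
    """Pull street/city/state/zip/country out of a Google-style geocoding result dict."""
    # Index each component under every type Google attaches to it
    by_type = {}
    for comp in raw.get("address_components", []):
        for comp_type in comp["types"]:
            by_type.setdefault(comp_type, comp)

    def name(comp_type, field="long_name"):
        comp = by_type.get(comp_type)
        return comp[field] if comp else None

    # Street = house number + route, skipping whichever is missing
    street = " ".join(p for p in (name("street_number"), name("route")) if p)
    return {
        "street": street or None,
        "city": name("locality"),
        "state": name("administrative_area_level_1", "short_name"),
        "zip": name("postal_code"),
        "country": name("country", "short_name"),
    }


# Hypothetical, hand-written sample in Google's address_components format
sample_raw = {
    "address_components": [
        {"long_name": "123", "short_name": "123", "types": ["street_number"]},
        {"long_name": "Main Street", "short_name": "Main St", "types": ["route"]},
        {"long_name": "Los Angeles", "short_name": "Los Angeles", "types": ["locality", "political"]},
        {"long_name": "California", "short_name": "CA", "types": ["administrative_area_level_1", "political"]},
        {"long_name": "United States", "short_name": "US", "types": ["country", "political"]},
        {"long_name": "90034", "short_name": "90034", "types": ["postal_code"]},
    ]
}
print(google_components(sample_raw))
```

With a live geocoder this would be called as `google_components(location.raw)` on the `Location` returned by `GoogleV3(api_key=...).geocode(...)`. Other geocoders return differently shaped `raw` dicts, so this only applies to Google.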

Best Answer

Lubar, I saw your post at Stack Overflow but am going to post a similar answer here for consistency. It's a good question. I work in the address verification industry and have tackled your kind of problem before.

I linked to that Stack Overflow question in a comment, and it's worth stressing that there is no guaranteed format for complete freeform street addresses. As mentioned in the linked post, complete addresses can look like any of these:

1) 102 main street Anytown, state

2) 400n 600e #2, 52173

3) p.o. #104 60203

4) 1234 LKSDFJlkjsdflkjsdljf #asdf 12345

5) 205 1105 14 90210

(The reasons are explained in the linked post.) GeoPy does return addresses in a particular format, but that format depends on which geocoder is used, and it is out of GeoPy's control. Addresses can also vary within a single component (for example, by containing commas), and it's important to know that standardized addresses don't contain commas at all (according to USPS Publication 28).
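To see why splitting on a fixed delimiter can't work, here is a minimal sketch of the naive approach applied to the five example addresses above. `naive_split` is a hypothetical helper, not part of GeoPy; it assumes a "street, city, state zip" layout, which would always yield three comma-separated parts.

```python
# The five example addresses listed above
EXAMPLES = [
    "102 main street Anytown, state",
    "400n 600e #2, 52173",
    "p.o. #104 60203",
    "1234 LKSDFJlkjsdflkjsdljf #asdf 12345",
    "205 1105 14 90210",
]


def naive_split(address):
    """Hypothetical naive parser: split on commas and trim whitespace."""
    return [part.strip() for part in address.split(",")]


for address in EXAMPLES:
    parts = naive_split(address)
    # A "street, city, state zip" layout would always give 3 parts here
    print(len(parts), parts)
```

None of the examples produces three parts, and even when a split happens, nothing tells you whether the second piece is a city, a state or a ZIP code.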

I recently helped build an API from SmartyStreets called the US Street Address API; it was just upgraded to support geocoding and single-line address parsing.

GeoPy is designed to geocode, not to parse addresses into components (parsing is genuinely difficult, for reasons I won't go into here). The US Street Address API, however, will split the address into components and return coordinates and other information about it, but only if the address is real: there are no "guessed" results.

To parse a single-line address into components using Python, simply put the entire address into the "street" field:

import json
import pprint
import urllib.parse
import urllib.request

LOCATION = 'https://api.smartystreets.com/street-address/'
# The entire query string must be URL-encoded
QUERY_STRING = urllib.parse.urlencode({
    'auth-token': 'YOUR_API_KEY_HERE',
    'street': '1 infinite loop cupertino ca 95014',
})
URL = LOCATION + '?' + QUERY_STRING

# Send the request and decode the JSON body
with urllib.request.urlopen(URL) as response:
    structure = json.loads(response.read().decode('utf-8'))
pprint.pprint(structure)

The resulting JSON is a list of candidate matches, and each candidate contains a components object that looks something like this:

"components": {
        "primary_number": "1",
        "street_name": "Infinite",
        "street_suffix": "Loop",
        "city_name": "Cupertino",
        "state_abbreviation": "CA",
        "zipcode": "95014",
        "plus4_code": "2083",
        "delivery_point": "01",
        "delivery_point_check_digit": "7"
}

The response also includes the assembled delivery_line_1 and last_line fields (and delivery_line_2 when the address has one), so you don't have to concatenate the components manually if you need those lines.
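To get from that response to the flat street/city/state/zip dict asked for in the question, something like the following sketch could work. It assumes the response is a list of candidates, each with a `components` object like the sample above, and that an unmatched address comes back as an empty list (consistent with the API not returning guessed results). `to_simple_address` is a hypothetical helper, and it only uses the component fields shown in the sample.

```python
from typing import Optional


def to_simple_address(candidates: list) -> Optional[dict]:
    """Map the first candidate's components onto street/city/state/zip keys."""
    if not candidates:
        # Empty list: no valid match was found for the input
        return None
    comp = candidates[0]["components"]
    # Only the fields from the sample are used; directionals, secondary
    # (apartment) numbers, etc. are ignored in this sketch
    street = " ".join(
        comp[key]
        for key in ("primary_number", "street_name", "street_suffix")
        if comp.get(key)
    )
    zip_code = comp.get("zipcode", "")
    if comp.get("plus4_code"):
        zip_code += "-" + comp["plus4_code"]
    return {
        "street": street,
        "city": comp.get("city_name"),
        "state": comp.get("state_abbreviation"),
        "zip": zip_code,
    }


# Hand-copied from the sample components object above
sample = [{"components": {
    "primary_number": "1",
    "street_name": "Infinite",
    "street_suffix": "Loop",
    "city_name": "Cupertino",
    "state_abbreviation": "CA",
    "zipcode": "95014",
    "plus4_code": "2083",
}}]
print(to_simple_address(sample))
```

With a live response you would call `to_simple_address(structure)` on the parsed JSON from the request code.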
