[GIS] The best way to parse ESRI geocoding results

arcgis-online geocoding

My project uses one of ESRI's ArcGIS Online geocoding services to provide address search.

Specifically, I use this locator (only through the premium subscription model):
http://tasks.arcgisonline.com/ArcGIS/rest/services/Locators/TA_Address_NA_10/GeocodeServer

Now, what I want to do is take the address candidates from the result set (when there is more than one, which is often), choose the closest, best-matching candidate for the user, and present that to them rather than having them choose from a list of candidates. Take this simple example:

Address:

50 22nd St, National City, CA

It returns the following results, with the address, score, and locator name for each:

Address                                    Score     Locator
--------------------------------------------------------------------
50 E 22nd St, National City, CA, 91950     90.97     US_Streets
50 W 22nd St, National City, CA, 91950     90.97     US_Streets
National City, CA                          100.0     US_CityState
City Terrace, CA                           94.35     US_CityState

For this example, the best thing to do is present the user with a choice between 50 E and 50 W 22nd St, right? I say that because choosing National City, CA would be inaccurate, even though it has a score of 100.

Another example:

1700 Alondra Blvd, Compton, CA

It returns these results:

Address                                    Score     Locator
---------------------------------------------------------------------
1700 W Alondra Blvd, Compton, CA 90220     90.97     US_RoofTop
1700 E Alondra Blvd, Compton, CA 90221     90.97     US_Streets
1700 W Alondra Blvd, Compton, CA 90220     90.97     US_Streets
Alondra Blvd, Compton, CA, 90746           100       US_StreetName
Alondra Blvd, Compton, CA 90220            100       US_StreetName
Alondra Blvd, Compton, CA, 90221           100       US_StreetName
E Alondra Blvd, Compton, CA, 90746         88.71     US_StreetName
W Alondra Blvd, Compton, CA, 90220         88.71     US_StreetName
W Alondra Blvd, Compton, CA, 90220         88.71     US_StreetName
W Alondra Blvd, Compton, CA, 90746         88.71     US_StreetName
E Alondra Blvd, Compton, CA, 90221         88.71     US_StreetName
E Alondra Blvd, Compton, CA, 90220         88.71     US_StreetName
Compton, CA                                100.0     US_CityState
East Compton, CA                           95.48     US_CityState
West Compton, CA                           95.48     US_CityState

Do you return a choice between E and W, as in the previous example, or do you try to make an educated guess and return Alondra Blvd, Compton, CA, because US_StreetName is fairly reliable and shouldn't be ignored? My algorithm below returns Alondra Blvd, Compton, CA, but perhaps it would be more consistent to offer E/W as a choice. You could also argue that you should simply choose for the user and return 1700 W Alondra Blvd, as Google does.


Here is the algorithm I use to determine the best address. It works most of the time.

The Algorithm

  1. Ahead of time, make a list of your preferred locators and set a minimum match score for each. Results from these locators are preferred over any others. Also set an all-time minimum match score to filter all results by as your fallback option, so that you avoid really poorly geocoded results.
  2. Iterate over your preferred locators one at a time. During each iteration, look only at the candidates whose locator is the one you're interested in.
  3. For each candidate, compare its score with the current locator's minimum match score to see whether it meets or exceeds it.
  4. If the candidate's score meets or exceeds that locator's minimum match score, take that one candidate as your best match and return it to the user.
  5. If the candidate's score does not meet the minimum match score, keep looking through the candidates.
  6. If no candidate meets the minimum match score for the current locator, move on to the next preferred locator.

    // Usually a candidate is found in these first steps, but if not…

  7. Take your all-time minimum match score and iterate over your candidates, removing any whose score is below it.

  8. Sort the remaining candidates by score, highest first, and return them all to the user to choose from.
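As a rough illustration, the eight steps above might look like this in Python. This is a minimal sketch, not the asker's actual code: it assumes each candidate has already been reduced to an address, a score, and a locator name (the `Candidate` class and `choose_candidates` function are hypothetical names), and the thresholds are whatever values you chose in step 1.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One geocoding candidate, reduced to the fields shown in the result tables."""
    address: str
    score: float
    locator: str


def choose_candidates(
    candidates: list[Candidate],
    preferred_locators: list[tuple[str, float]],
    all_time_min_score: float,
) -> list[Candidate]:
    """Return [best_match] if a preferred locator clears its threshold,
    otherwise every candidate above the all-time minimum, best score first.

    preferred_locators holds (locator_name, min_score) pairs, most preferred first.
    """
    # Steps 2-6: try each preferred locator in order of preference.
    for locator_name, min_score in preferred_locators:
        for cand in candidates:
            if cand.locator != locator_name:
                continue  # only look at candidates from the current locator
            if cand.score >= min_score:
                return [cand]  # step 4: first qualifying candidate wins
    # Steps 7-8: fall back to a filtered, score-sorted list for the user.
    remaining = [c for c in candidates if c.score >= all_time_min_score]
    return sorted(remaining, key=lambda c: c.score, reverse=True)
```

With hypothetical thresholds of 95 for US_RoofTop, US_Streets, and US_StreetName, the Compton results yield the 100-point `Alondra Blvd, Compton, CA, 90746` candidate, matching what the question says the algorithm returns. Because step 4 takes the first qualifying candidate, a lower US_Streets threshold (90, say) would silently pick whichever of 50 E and 50 W 22nd St is listed first.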

What is the best way to do this?
How can I improve my algorithm?

Thanks!

Best Answer

Consider the input address here:

1700 Alondra Blvd, Compton, CA

Let's look at the address components that were entered. (In this simple case, components are separated by spaces or commas; in general, city and street names can span multiple words.)

primary_number: 1700
street_predirection: none
street_name: Alondra
street_suffix: Blvd
street_postdirection: none
secondary_number: none
secondary_designator: none
city_name: Compton
state_abbreviation: CA
zipcode: none
plus4_code: none
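To make the breakdown above concrete, here is a deliberately naive Python sketch that splits a one-line address on commas and spaces, as described for this simple case. The `parse_components` function, the tiny `DIRECTIONS` and `SUFFIXES` sets, and the parsing rules are all assumptions for illustration; it ignores secondary (unit) designators and knows only a handful of suffixes, so real-world input needs a proper address parser.

```python
import re

# Hypothetical lookup sets for this sketch; real parsers use much fuller lists.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"ST", "AVE", "BLVD", "RD", "DR", "LN", "CT", "WAY"}
ZIP_RE = re.compile(r"^(\d{5})(?:-(\d{4}))?$")

FIELDS = (
    "primary_number", "street_predirection", "street_name", "street_suffix",
    "street_postdirection", "secondary_number", "secondary_designator",
    "city_name", "state_abbreviation", "zipcode", "plus4_code",
)


def parse_components(address):
    """Naively split a one-line US address into components (None = not present)."""
    comp = dict.fromkeys(FIELDS)
    parts = [p.strip() for p in address.split(",") if p.strip()]
    if not parts:
        return comp

    # A trailing ZIP (or ZIP+4) may follow the state with or without a comma.
    last = parts[-1].split()
    match = ZIP_RE.match(last[-1])
    if match:
        comp["zipcode"], comp["plus4_code"] = match.group(1), match.group(2)
        last.pop()
        if last:
            parts[-1] = " ".join(last)
        else:
            parts.pop()

    # Working backwards: state, then city, then the street line (parts[0]).
    if parts:
        comp["state_abbreviation"] = parts.pop()
    if parts:
        comp["city_name"] = parts.pop()
    if parts:
        tokens = parts[0].split()
        if tokens and tokens[0].isdigit():
            comp["primary_number"] = tokens.pop(0)
        if len(tokens) > 1 and tokens[0].upper() in DIRECTIONS:
            comp["street_predirection"] = tokens.pop(0)
        if len(tokens) > 1 and tokens[-1].upper() in DIRECTIONS:
            comp["street_postdirection"] = tokens.pop()
        if len(tokens) > 1 and tokens[-1].upper() in SUFFIXES:
            comp["street_suffix"] = tokens.pop()
        comp["street_name"] = " ".join(tokens) or None
    return comp
```

Counting the non-`None` values gives 5 for `1700 Alondra Blvd, Compton, CA` and 7 for `1700 W Alondra Blvd, Compton, CA 90220`.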

You definitely don't want to return an address that has fewer address components than the input address.

With that in mind, I recommend considering both the US_RoofTop response and the US_Streets responses. In this case, the US_Streets locator returns two comparable responses, one East and one West, and there is no way for you to guess which one is intended. The US_RoofTop response duplicates one of the US_Streets responses (based on the output address string), so it can be removed from what you present to the user.
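Keeping only the street-level responses and dropping duplicates by output address string could be sketched like this. Candidates are assumed to be dicts with hypothetical `address` and `locator` keys (however you extract them from the service's response), and two addresses count as duplicates when they match after upper-casing and collapsing whitespace.

```python
# Hypothetical: the locators treated as street-level results worth showing.
STREET_LEVEL_LOCATORS = {"US_RoofTop", "US_Streets"}


def street_level_choices(candidates):
    """Keep US_RoofTop/US_Streets candidates, dropping repeats of an output address.

    The first candidate seen for each address wins, so list order matters.
    """
    seen = set()
    choices = []
    for cand in candidates:
        if cand["locator"] not in STREET_LEVEL_LOCATORS:
            continue
        key = " ".join(cand["address"].upper().split())
        if key in seen:
            continue  # same output address as an earlier candidate
        seen.add(key)
        choices.append(cand)
    return choices
```

For the Compton results this leaves two choices: the West address (from US_RoofTop, which is listed first) and the East address.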

No ZIP Code was entered, which means the user is relying on your service to determine it. This matters because if the input had included a ZIP Code, either 90220 or 90221, you would have been able to narrow the response down to just one address.
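When the user does supply a ZIP Code, narrowing the candidates is a simple filter. This sketch assumes the ZIP Code, if present, is the last thing in each address string (optionally as ZIP+4); the `trailing_zip` and `narrow_by_zip` helpers are hypothetical.

```python
import re

# Assumption: a ZIP (optionally ZIP+4) sits at the very end of the string.
ZIP_AT_END = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")


def trailing_zip(address):
    """Return the 5-digit ZIP Code that ends the address, or None."""
    match = ZIP_AT_END.search(address)
    return match.group(1) if match else None


def narrow_by_zip(input_address, candidate_addresses):
    """Keep only candidates whose ZIP matches the one the user typed, if any."""
    wanted = trailing_zip(input_address)
    if wanted is None:
        return list(candidate_addresses)  # no ZIP given: nothing to narrow on
    return [a for a in candidate_addresses if trailing_zip(a) == wanted]
```

Typing `1700 Alondra Blvd, Compton, CA 90221` would leave only the East candidate, while the original input without a ZIP leaves both.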

So, in summary: take the response(s) with the greatest number of address components, as they are most likely to be accurate, consolidate them down to unique responses, and present those back to the user. That way you have been as smart as you possibly can while still allowing your user to clarify when needed.
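That summary can be expressed as a short filter. This is a sketch under the assumption that each candidate's components have already been extracted into a dict by whatever address parser you use (the function names and dict layout here are hypothetical). It keeps candidates with at least as many components as the input, takes those with the most, and removes duplicate address strings.

```python
def count_components(components):
    """Number of address components that are actually present (non-empty)."""
    return sum(1 for value in components.values() if value)


def best_component_matches(input_components, candidates):
    """Return the unique addresses with the most components, never fewer than the input.

    candidates is a list of (address, components) pairs, where components is a
    dict such as {"primary_number": "1700", "street_name": "Alondra", ...}.
    """
    n_input = count_components(input_components)
    eligible = [(addr, comp) for addr, comp in candidates
                if count_components(comp) >= n_input]
    if not eligible:
        return []  # nothing as specific as the input; caller decides what to show
    most = max(count_components(comp) for _, comp in eligible)
    top = [addr for addr, comp in eligible if count_components(comp) == most]
    return list(dict.fromkeys(top))  # de-duplicate while preserving order
```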

expertise: I work with addresses all day long as a street genius at SmartyStreets.
