ArcGIS – How Are Geocoding Scores Calculated in ArcGIS?

arcgis-desktop geocoding

After a table of addresses is geocoded, ArcGIS reports information about each geocoded address, including the "match score" of the candidate to which the address was matched, which ranges from 0 to 100. According to the documentation, "The match score is based on how well the locations found in the reference data match with the address data being searched."

It seems intuitive that 100 means an address with the exact name was found in the address locator and 0 means no such address was found. However, I could not find any information about how exactly this score is calculated, particularly for values between the extremes. Is this known?

I found the pointer to this white paper in the answer to this question, but I could not find any information in that paper that would answer the question.

Best Answer

The scores are based on a weighted numbering system: the number of matching characters in each of the prioritized/configured address elements. The more characters that match, the higher the score is likely to be.

When using ranged-address data such as street centerlines, the address range and parity also figure into the process. So if you have a range of 3000-6000 (even) and the address is 2998 but the rest of the street name matches, ArcGIS will make this a candidate but lower the score, since the number falls outside the expected range.
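To make the range and parity idea concrete, here is a minimal sketch of the two checks described above. It is not Esri's code: the function name `range_check` and its arguments are made up for illustration, and it deliberately does not try to compute the score penalty, since that is not documented.

```python
def range_check(house_number, range_from, range_to, parity):
    """Return (in_range, parity_ok) for a house number on a street segment.

    parity is "even", "odd", or "both". Hypothetical helper, not an ArcGIS API.
    """
    low, high = sorted((range_from, range_to))
    in_range = low <= house_number <= high
    if parity == "both":
        parity_ok = True
    else:
        # True when the number's evenness agrees with the segment's parity
        parity_ok = (house_number % 2 == 0) == (parity == "even")
    return in_range, parity_ok


# The example from the answer: 2998 against an even 3000-6000 range
print(range_check(2998, 3000, 6000, "even"))  # (False, True)
# In range, but on the wrong side of the street
print(range_check(3001, 3000, 6000, "even"))  # (True, False)
```

A candidate that fails either check can still be returned, just with a reduced score, which is the behavior the answer describes.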

D.E.Wright

See Bruce Harold's response at Re: Geocoding Score Documentation: How is the score value determined?:

"Bruce Harold (Employee), Apr 10, 2015 2:25 PM (in response to Nathan Lowry)

Hello

Score calculation is not documented in detail, but I can give you a thumbnail.

If you open USAddress.lot.xml in Firefox from its installed location at file:///C:/Program Files (x86)/ArcGIS/Desktop10./Locators you will see a navigable tree.

In Top Level Elements navigate to FullNormalAddress; the superscript numbers for NormalAddress (70) and Zone (30) are the relative weights for score contributions from those elements. Coincidentally they sum to 100 but only the relative weight is relevant.

Navigating further from NormalAddress you will see 70/100 of the score is contributed 15/75 and 60/75 by House and FullStreetName respectively, where 75 is the sum of the weights, and further down you can see the elements prefix (5/92), pretype (6/92), StName (70/92), suftype (6/92) and suffix (5/92) weights where 92 is the sum of those weights. An individual score for any lowest level element (like how to calculate a score contribution from an imperfect street name) may be determined by the Spelling/Scoring section of the XML file if an anticipated spelling correction is required to match the reference data, or by a proprietary algorithm for unanticipated spelling errors or noise or repeated characters, as when you have keybounce.

Scores are weight summed, with percentage normalization, from the bottom up. Missing elements do not penalize a score, they simply do not contribute."
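The bottom-up weighting in the quoted reply can be sketched roughly as follows. The weights are the ones quoted from USAddress.lot.xml; everything else is an assumption for illustration: the simplified tree, the function name `combine`, the leaf scores (the per-element scoring is proprietary), and the reading of "do not contribute" as dropping a missing element from both the weighted sum and the weight total.

```python
# Relative weights as quoted in the reply; the tree layout is simplified.
WEIGHTS = {
    "FullNormalAddress": {"NormalAddress": 70, "Zone": 30},
    "NormalAddress": {"House": 15, "FullStreetName": 60},
    "FullStreetName": {"prefix": 5, "pretype": 6, "StName": 70,
                       "suftype": 6, "suffix": 5},
}


def combine(element, leaf_scores):
    """Return a 0-100 score for element, or None if nothing under it is present.

    leaf_scores maps lowest-level element names to 0-100 scores.
    Missing elements are skipped (assumption: they are left out of the
    normalization rather than counted as zero).
    """
    if element not in WEIGHTS:
        return leaf_scores.get(element)
    total = 0.0
    weight_sum = 0.0
    for child, weight in WEIGHTS[element].items():
        score = combine(child, leaf_scores)
        if score is None:
            continue  # missing element: no contribution, no penalty
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Hypothetical leaf scores: a slightly misspelled street name, everything
# else that was supplied matches exactly, no prefix/pretype/suffix given.
leaves = {"House": 100, "StName": 80, "suftype": 100, "Zone": 100}
print(round(combine("FullNormalAddress", leaves), 1))
```

Under these assumptions, an imperfect street name drags the total down far more than an imperfect suffix type would, simply because StName carries 70 of the 92 weight units in its group.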

Related Solutions

[GIS] the best way to parse ESRI geocoding results

Consider the input address here:

1700 Alondra Blvd, Compton, CA

Let's take a look at the address components that were entered. (In this simple case, each address component is delimited by spaces or a comma. In general, city names and street names can span multiple words.):

primary_number: 1700
street_predirection: none
street_name: Alondra
street_suffix: Blvd
street_postdirection: none
secondary_number: none
secondary_designator: none
city_name: Compton
state_abbreviation: CA
zipcode: none
plus4_code: none

You definitely don't want to return an address that has fewer address components than the input address.

With that in mind, I would recommend considering both the US_RoofTop response and the US_Streets response. In this case, the US_Streets response has two comparable results, one East and one West. There is no way for you to guess which one is preferred. The US_RoofTop response is a duplicate of one of the US_Streets results (based on the output address string), so it can be removed from what you present to the user.

No ZIP Code was input, which means the user is relying on your service to determine it. This is important because if the input had included a ZIP Code, either 90220 or 90221, you would have been able to narrow the response down to just one address.

So, in summary: take the response(s) that have the greatest number of address components, as they are most likely to be more accurate; consolidate down to just unique responses; and present those back to the user. You have then been as smart as you possibly can while still allowing your user to clarify when needed.
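As a rough sketch of that summary in code: keep the candidates with the most populated address components, then drop duplicates by output address string. The dictionary layout (`locator`, `address`, `components`) is invented for illustration and is not the actual Esri response schema, and the sample candidates are made up to mirror the East/West situation described above.

```python
def filled_count(candidate):
    """Number of address components that actually have a value."""
    return sum(1 for value in candidate["components"].values() if value)


def consolidate(candidates):
    """Keep the most complete candidates, de-duplicated by address string."""
    if not candidates:
        return []
    best = max(filled_count(c) for c in candidates)
    seen = set()
    unique = []
    for c in candidates:
        # Normalize case and whitespace so trivially different strings collapse
        key = " ".join(c["address"].upper().split())
        if filled_count(c) == best and key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


def make(locator, address, number, predir):
    # Helper for the made-up sample data below
    return {"locator": locator, "address": address,
            "components": {"number": number, "predir": predir,
                           "name": "Alondra", "suffix": "Blvd",
                           "city": "Compton", "state": "CA"}}


candidates = [
    make("US_Streets", "1700 E Alondra Blvd, Compton, CA", "1700", "E"),
    make("US_Streets", "1700 W Alondra Blvd, Compton, CA", "1700", "W"),
    make("US_RoofTop", "1700 E Alondra Blvd, Compton, CA", "1700", "E"),
    make("US_Streets", "Alondra Blvd, Compton, CA", None, None),
]

for c in consolidate(candidates):
    print(c["locator"], "|", c["address"])
```

With this sample data, the duplicate rooftop result and the street-only result are dropped, leaving the East and West candidates for the user to choose between.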

expertise: I work with addresses all day long as a street genius at SmartyStreets.

[GIS] Geocoding – get lat/long from 11000 address

I think you will find numerous answers to similar questions on our site by searching the geocode tag.

A few that stick out are:
