ArcGIS – Extracting Substring from Field using Python Parser of Field Calculator in ArcMap

arcgis-desktop field-calculator python-parser

I have some parcel data where I need to extract a subdivision name from a long string. The name always follows "Subdivision: " (i.e. "Subdivision: ____ ______ _____"), but there is no uniformity to what comes before or after it, or to the subdivision name itself. In my example below, "Block: " follows the subdivision name, but that's not always the case.

I'd like to learn how to solve this using Python, but VBScript would also be acceptable. I've been reading about Python's re (regular expression) module, but without some further explanation I'm a little lost. Here is a screenshot showing what the data looks like.

Any tips on where I should try to go with this?

[Screenshot: Legal Description field]

Best Answer

For a more general approach, you could use a regex like r'\s*(\w+):\s*' in the re.split() function to build a dict of parcel "keys" and "values" (not sure of your parcel terminology).

This regex looks for:

  • \s* - zero or more whitespace characters
  • (\w+) - one or more "word" characters (letters a-z and A-Z, digits 0-9, and underscore, but not spaces or punctuation); the () brackets make this a capture group
  • : - followed by a colon
  • \s* - followed by zero or more whitespace characters
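To see what the pattern picks out on its own, re.findall() returns just the captured group for each match. This is a small sketch using the sample text from this answer; the "Lot Number:" string is a made-up example to show a limitation of \w.

```python
import re

# The pattern from the answer: optional whitespace, a captured word, a colon, optional whitespace
KEY_PATTERN = re.compile(r'\s*(\w+):\s*')

sample = 'Section: 3 Township: 8 Range: 88 Subdivision: Blah blah blah Block: G Lot: 9A'

# With one capture group, findall returns only the captured text: each "key" before a colon
keys = KEY_PATTERN.findall(sample)
print(keys)  # ['Section', 'Township', 'Range', 'Subdivision', 'Block', 'Lot']

# \w does not match spaces, so a key containing a space (hypothetical "Lot Number:")
# is captured as just its last word
print(KEY_PATTERN.findall('Lot Number: 9A'))  # ['Number']
```

If your real data has multi-word keys, the (\w+) part would need adjusting.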

re.split() returns a list of the pieces of text between the matches, and because we've used brackets to define a capture group, the captured text is included in that list as well.
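To see what the capture group changes, here is a short comparison of the same split with and without the brackets, on a shortened version of the sample text:

```python
import re

text = 'Section: 3 Township: 8 Subdivision: Blah blah blah'

# Without a capture group, the matched "Key: " text is thrown away
without_group = re.split(r'\s*\w+:\s*', text)
print(without_group)  # ['', '3', '8', 'Blah blah blah']

# With a capture group, each captured key is kept, interleaved with the text after it
with_group = re.split(r'\s*(\w+):\s*', text)
print(with_group)  # ['', 'Section', '3', 'Township', '8', 'Subdivision', 'Blah blah blah']
```

In both cases the leading '' is the (empty) text before the first match, since the string starts with "Section:".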

For example:

import re

parcel_text = 'Section: 3 Township: 8 Range: 88 Subdivision: Blah blah blah Block: G Lot: 9A'

print(re.split(r'\s*(\w+):\s*', parcel_text))
['', 'Section', '3', 'Township', '8', 'Range', '88', 'Subdivision', 'Blah blah blah', 'Block', 'G', 'Lot', '9A']

parcel_list = re.split(r'\s*(\w+):\s*', parcel_text)[1:]  # Drop the first element: it holds any text before the first key, which is '' here because the text starts with "Section:"
parcel_dict = dict(zip(parcel_list[0::2], parcel_list[1::2]))
# [0::2] takes every 2nd element starting at index 0 (the keys); [1::2] does the same starting at index 1 (the values)
# zip pairs those 2 lists element by element, i.e. ('Section', '3'), ('Township', '8'), etc., and dict() turns those pairs into keys and values

print(parcel_dict)

{'Section': '3',
 'Township': '8',
 'Range': '88',
 'Subdivision': 'Blah blah blah',
 'Block': 'G',
 'Lot': '9A'}
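If the slicing and zip step is still unclear, it can help to run each piece separately. A small sketch on a shortened list:

```python
parcel_list = ['Section', '3', 'Township', '8', 'Subdivision', 'Blah blah blah']

keys = parcel_list[0::2]    # every 2nd element starting at index 0
values = parcel_list[1::2]  # every 2nd element starting at index 1
print(keys)    # ['Section', 'Township', 'Subdivision']
print(values)  # ['3', '8', 'Blah blah blah']

# zip pairs the two lists element by element into (key, value) tuples
pairs = list(zip(keys, values))
print(pairs)   # [('Section', '3'), ('Township', '8'), ('Subdivision', 'Blah blah blah')]

# dict() accepts any sequence of 2-item pairs
parcel_dict = dict(pairs)
print(parcel_dict['Subdivision'])  # Blah blah blah
```

The list() around zip() is only there so the pairs can be printed; dict() accepts the zip result directly.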

You can turn that into a field calculator expression, something like:

Code block / Pre-logic Script Code

import re

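def parse_parcel(parcel_text):
    # Null field values are passed in as None; return an empty dict so .get() still works
    if not parcel_text:
        return {}
    parcel_list = re.split(r'\s*(\w+):\s*', parcel_text)[1:]  # drop any text before the first key
    parcel_dict = dict(zip(parcel_list[0::2], parcel_list[1::2]))  # pair each key with its value
    return parcel_dict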
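Before pasting the function into the Code block, you can try it in a regular Python session. This sketch uses a copy of parse_parcel that returns an empty dict for empty or null (None) values. The sample strings are made up ("Oak Hills Estates" is a hypothetical name), so substitute a few values copied from your own field.

```python
import re

def parse_parcel(parcel_text):
    # Null field values arrive as None; return an empty dict so .get() still works
    if not parcel_text:
        return {}
    parcel_list = re.split(r'\s*(\w+):\s*', parcel_text)[1:]
    return dict(zip(parcel_list[0::2], parcel_list[1::2]))

# Made-up values standing in for rows of the parcel field
samples = [
    'Section: 3 Township: 8 Range: 88 Subdivision: Blah blah blah Block: G Lot: 9A',
    'Subdivision: Oak Hills Estates Lot: 12',
    'Section: 5 Township: 2 Range: 40',
    None,
]

for text in samples:
    print(repr(parse_parcel(text).get('Subdivision')))
# 'Blah blah blah'
# 'Oak Hills Estates'
# None
# None
```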

Expression

parse_parcel(!your_parcel_field!).get('Subdivision')

Using .get() instead of ['Subdivision'] avoids a KeyError when a record has no "Subdivision" entry; .get() returns None for those records instead.
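If you only ever need the subdivision, a narrower alternative is a single regex that captures the text after "Subdivision:" up to the next "Word:" key or the end of the string. This is a sketch, not tested against real parcel data. It assumes the subdivision name never contains a word immediately followed by a colon, and get_subdivision is a hypothetical helper name. It should run under both Python 2.7 and Python 3.

```python
import re

# Text after "Subdivision:", stopping (via lookahead) before the next "Word:" key
# or trailing whitespace at the end of the string.
# Assumes the subdivision name itself never contains "Word:".
SUBDIVISION_PATTERN = re.compile(r'\bSubdivision:\s*(.*?)(?=\s*\w+:|\s*$)')

def get_subdivision(parcel_text):  # hypothetical helper name
    if not parcel_text:  # empty or null (None) field value
        return None
    match = SUBDIVISION_PATTERN.search(parcel_text)
    return match.group(1) if match else None

print(get_subdivision('Section: 3 Township: 8 Range: 88 Subdivision: Blah blah blah Block: G Lot: 9A'))
# Blah blah blah
print(get_subdivision('Lot: 4 Subdivision: Oak Hills Estates'))
# Oak Hills Estates
```

In the Field Calculator, the function would go in the Code block and the expression would be get_subdivision(!your_parcel_field!). The dict-based approach above is more flexible if you later need Block or Lot as well.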