I have some parcel data where I need to extract a subdivision name from a long string. The format is always "Subdivision: ____ ______ _____" etc. However, there is no uniformity to what comes before or after "Subdivision: ", or to the actual name of the subdivision. My example below shows "Block: " following "Subdivision: ", but that's not always the case.
I'd like to learn how to solve this issue using Python, but VB can also be used. I was reading about re (edit: regex) in Python, but without some further explanation I'm a little lost. Here is a screenshot showing what the data looks like.
Any tips on where I should try to go with this?
Best Answer
For a more general approach, you could use a regex like r'\s*(\w+):\s*' in the re.split() function to build a dict of parcel "keys" and "values" (not sure of your parcel terminology).
This regex looks for:
\s* - zero or more whitespace characters
(\w+) - one or more word characters (letters, digits and underscore, but not other characters); note that the () brackets indicate a capture group
: - followed by a colon
\s* - followed by zero or more whitespace characters
The re.split function returns a list of each section of text between the matches, but because we've used brackets to specify a capture group, the captured groups are returned as well. For example:
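A minimal sketch of that behaviour, using a made-up parcel string (the sample text and the variable names are illustrative assumptions, not taken from the question's data):

```python
import re

# Hypothetical sample string; real parcel text will look different.
text = "Lot: 12 Subdivision: SUNNY ACRES UNIT 2 Block: 4"

# Split on every "Key:" label. Because the key is in a capture group,
# re.split keeps it in the output list alongside the text between matches.
parts = re.split(r'\s*(\w+):\s*', text)
print(parts)

# parts[0] is whatever came before the first key (here an empty string);
# after that, keys sit at odd indexes and their values at even indexes.
parcel = dict(zip(parts[1::2], parts[2::2]))
print(parcel.get("Subdivision"))
```

Note that the value for each key runs up to the next word that is immediately followed by a colon, so this only works if subdivision names themselves never contain a "word:" pattern.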
You can turn that into a field calculator expression, something like:
Code block / Pre-logic Script Code
Expression
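A rough sketch of what could go in those two boxes, assuming the ArcGIS field calculator with the Python parser selected. The function name get_subdivision and the field name LEGAL_DESC are placeholders made up for illustration; substitute your own. The function below would go in the code block:

```python
import re

def get_subdivision(text):
    """Return the text after 'Subdivision:' up to the next 'Key:' label.

    Returns None when the field is empty or has no 'Subdivision:' entry;
    swap in an empty string if your output field cannot hold nulls.
    """
    if not text:
        return None
    # Same split as described above: keys are captured, values fall between them.
    parts = re.split(r'\s*(\w+):\s*', text)
    parcel = dict(zip(parts[1::2], parts[2::2]))
    value = parcel.get('Subdivision')
    return value.strip() if value is not None else None
```

The expression box would then hold a call along the lines of get_subdivision(!LEGAL_DESC!), with the placeholder field name wrapped in exclamation marks as the Python parser expects.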