[GIS] How to transform values using a dictionary or lookup table

arcgis-10.1arcpypython

I'm reading rows (~20 fields per row) from a database using a SearchCursor and placing them into an array. Then, I loop over the array and use an InsertCursor to insert them into a Feature Class in a different database.

The problem, I need to transform field values in the source data. For example, a column may contain the strings "T", "true", "Yes", and "1" and they must be converted to a string value of "TRUE" before being written to the destination column.

  • There may be multiple values in a source column that need to be mapped to a single value in the destination. This is the example above.
  • There may be multiple lookups per column. For example, one column may have as source value of "A" that gets transformed to "Z1" and in the same column, "B" gets transformed to "Z2", and still in the same column, "C" gets transformed to "Z1" (multiple source values mapped to same destination value).
  • There are many columns that will need lookups created. I'd prefer to have a different dictionary / lookup table for each column as there will likely be subtle differences between columns and trying to reuse dictionaries will get frustrating.

How should I approach this problem?


Update #1

Sample using suggestions by @mr.adam: Throws an error on the line if row[key].lower() in lookup(key[1]): with the message TypeError: int object is not subscriptable

lookup = {
    3: ("TRUE", ["t", "true", "1", "yes"]),
    4: ("FALSE", ["f", "false", "0", "no"])
     }

rows = []
rows.append(("abc", "123", "xyz", "True", "F"))
rows.append(("lmo", "456", "xyz", "1", "no"))
rows.append(("tuv", "456", "xyz", "yes", "0"))

for row in rows:
    for key, value in lookup.iteritems():
        if row[key].lower() in lookup(key[1]):
            row[key] = lookup(key[0])

for row in rows:
    print row

Update #2

after some additional digging, breaking down the above referenced line, row[key].lower() evaluates to "true" as expected for column 4 of the first row in the dataset. The error is thrown when evaluating the in clause of that line, lookup(key[1]).

Best Answer

I would make a dictionary that looks something like this:

lookups = {
    "TRUE":["t", "true", "1", "yes"],
    "FALSE":["f", "false", "0", "no"]
     }

Then, the logic could look like:

for row in rows:
    for i in range(0,len(row)):
        for key, value in lookup.iteritems():
            if str(row[i]).lower() in value:
                row[i] = key

That code will update the entire table at once, row by row. This would be a problem if you have field1 where the value "T" should be translated to "TRUE" and field2 where "T" should be translated to "Top". To if that is the case, you could modify the dictionary to:

lookups = {
    fieldindexnumber: ("TRUE",["t", "true", "1", "yes"]),
    fieldindexnumber: ("FALSE",["f", "false", "0", "no"]),
    fieldindexnumber: ("Unknown",["u"]), # make sure the second object in the tuple is a list []
     }

Then just change the looping structure to:

for row in rows:
    for key, value in lookup.iteritems():
        if row[key].lower() in value[1]:
            row[key] = value[0]

Note that I made all of the potential values lowercase and then cast the existing value to lowercase. I've found that to be very helpful a lot of times, but it may not be what you're looking for.

You may want to implement it in a different way, but I would definitely recommend going with some version of this dictionary, because you can just store it at the top of your script and it will be clearly laid out in case you want to change/add key-value pairs.

Related Question