[GIS] Building a dictionary from feature class field values: how can I ensure there are no duplicate values per key?

arcpy cursor

I found this code sample that builds a dictionary from two fields; the end result is a concatenation of all values per key, written to a new field. It's a great tool except that it doesn't account for duplicate values for each key. I've tried to modify the code to check for the value in the dictionary and, if it's not there, to add it with the first dictionary[id] = value statement, but it still retains the duplicates. When I perform the same check with the second dictionary[id] = value statement, it only ends up writing the last value it finds to the field. What am I missing? I'm also not sure what the purpose of the "A"s in the SearchCursor parameters is. The portion of the code I need to change is below:

# Insert Search cursor on a feature class or table to iterate through row objects and extract field values.
# Sort values of a Search Cursor based on the CaseField and ReadFromField in ascending order.
# Define what will happen once the cursor moves through each row.
# While it is in each row it will get the value of CaseField field that you are using as id to iterate.
# While it is in row it will also get the value of the ReadFromField field that you want to concatenate.
# Set the value of the dictionary to the values read by the cursor from the ReadFromField.
# Set an if condition for what should the cursor do when it reads through fields with same ID or the CaseField value.
# In if condition set the new value to last value of the ReadFromField + the defined delimiter + the new value that is read.
# Again set the dictionary value to this new value.
# Set the loop to have the lastid to the id that you got from getValue before it goes through the second loop and so on...
# Set the loops last value variable to the last value that was read such that it starts with that last value for the second loop and so on...

cur1 = arcpy.SearchCursor(InputTable, "", "", "", CaseField +" A;" + ReadFromField +" A")

for row in cur1:
    id = row.getValue(CaseField)
    #if value not in dictionary:
    value = row.getValue(ReadFromField)
    dictionary[id] = value
    if id == lastid:
        value = str(lastvalue) + Delimiter + str(value)
        #if value not in dictionary:
        dictionary[id] = value
    lastid = id
    lastvalue = value

UPDATE:
I tried to implement the defaultdict solution, but got an error when trying to update the field with the concatenated values. Here's the updated code, and the error I got is below that:

# Insert Search cursor on a feature class or table to iterate through row objects and extract field values.
# Sort values of a Search Cursor based on the CaseField and ReadFromField in ascending order.
# Define what will happen once the cursor moves through each row.
# While it is in each row it will get the value of CaseField field that you are using as id to iterate.
# While it is in row it will also get the value of the ReadFromField field that you want to concatenate.
# Set the value of the dictionary to the values read by the cursor from the ReadFromField.
# Set an if condition for what should the cursor do when it reads through fields with same ID or the CaseField value.
# In if condition set the new value to last value of the ReadFromField + the defined delimiter + the new value that is read.
# Again set the dictionary value to this new value.
# Set the loop to have the lastid to the id that you got from getValue before it goes through the second loop and so on...
# Set the loops last value variable to the last value that was read such that it starts with that last value for the second loop and so on...


cur1 = arcpy.SearchCursor(InputTable, "", "", "", CaseField +" A;" + ReadFromField +" A")

for row in cur1:
    id = row.getValue(CaseField)
    value = row.getValue(ReadFromField)
    dictionary[id] = value        
    if id == lastid:
        value = str(lastvalue) + Delimiter + str(value)
        dictionary[id] = value
    lastid = id
    lastvalue = value


# Delete cursor and row objects to remove the lock on the data that will remain until either the
# script completes or the cursor object is deleted. 
del cur1, row

# Insert Update cursor to update or delete rows on the specified feature class, shapefile, or table.
# Define what will happen once the cursor moves through each row.
# While you are in each row set the cursor to get the value of the CaseField that is used as Id to iterate.
# Set the value of the field that the concatenated values should be written to with the dictionary values that you concatenated in the code above.
# Set the cursor to update the row values with the dictionary values.

cur2 = arcpy.UpdateCursor(InputTable)
for row in cur2:
    id = row.getValue(CaseField)
    row.setValue(CopyToField, dictionary[id])
    cur2.updateRow(row)

Error Info:
: Row: Invalid input value for SetValue

Best Answer

Before you add a key, do a check with the existing dictionary to see if the proposed value is already in there.

If it is, skip it, but if it isn't then you will add it.

This post describes checking keys in Python.

Basically, you would be doing something like:

if newValue in myDictionary:  # note: "in" tests the dictionary's keys
    pass  # already present, do nothing
else:
    pass  # add to the dictionary here

Also see the documentation on dictionaries for more information.
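To make the check concrete for the concatenation case, here is a minimal sketch in plain Python. It assumes the goal is one delimiter-joined string of distinct values per id. The function name concatenate_unique and the sample data are hypothetical and not part of the original script. Instead of comparing each row only with the previous one, it keeps a list of the values already recorded for each id and skips any value that is already in that list.

```python
def concatenate_unique(rows, delimiter=", "):
    """Map each id to a delimiter-joined string of its distinct values.

    rows is any iterable of (id, value) pairs, for example pairs read
    from a cursor.
    """
    values_by_id = {}
    for key, value in rows:
        seen = values_by_id.setdefault(key, [])  # list of values for this id
        text = str(value)
        if text not in seen:  # skip a value already recorded for this id
            seen.append(text)
    # Join each list into a single string, in first-seen order.
    return {key: delimiter.join(values) for key, values in values_by_id.items()}


# Hypothetical sample data standing in for (CaseField, ReadFromField) pairs.
sample_rows = [(1, "oak"), (1, "elm"), (1, "oak"), (2, "pine"), (2, "pine")]
result = concatenate_unique(sample_rows, delimiter="; ")
print(result)
```

In the script from the question, the pairs could come from row.getValue(CaseField) and row.getValue(ReadFromField) for each row in cur1, and the joined string for each id would then be passed to row.setValue(CopyToField, ...). Because each dictionary value ends up as a plain string rather than a list or set, this may also avoid the SetValue error, although that cause is an assumption.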

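As for the 'A' parameters, they are a sorting option for the rows: in the SearchCursor's sort fields argument, an "A" after a field name sorts that field in ascending order and a "D" sorts it in descending order. Read ESRI's documentation to customize your SearchCursor more.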
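The sort string itself can be sketched in plain Python. In the legacy arcpy.SearchCursor the last argument is a semicolon-separated list of field names, each followed by A (ascending) or D (descending). The helper build_sort_spec and the field names below are hypothetical and used only for illustration.

```python
def build_sort_spec(fields, direction="A"):
    """Build a sort string such as "ID A;NAME A" from a list of field names."""
    return ";".join("{0} {1}".format(name, direction) for name in fields)


# Hypothetical field names standing in for CaseField and ReadFromField.
sort_spec = build_sort_spec(["PARCEL_ID", "OWNER"])
print(sort_spec)

# The same ordering on plain tuples: sort by id first, then by value.
pairs = [(2, "pine"), (1, "oak"), (1, "elm")]
ordered = sorted(pairs)
print(ordered)
```

Sorting by the id field first is what lets the loop in the question compare each row with the previous one, since rows that share an id arrive next to each other.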
