ArcPy – Finding and Marking Identical Features

arcpy, field-calculator

I would like to find identical entries in a table and mark them with a value in a new field. I am doing this with Calculate Field, but so far it only marks the repeat of each record, not the original. I would like every record that has a duplicate to be marked with a 1, including the first occurrence.

My idea is:

The function ident(x) searches for all duplicates and adds them to a list called dublicateList. The function findFirst(x) then uses this list to mark all duplicate records.

So far only the second occurrence of each record is marked, not the first (record 2 of 2 is marked, but not record 1 of 2).

Example Table

import arcpy

in_table = "test"
new_field = "Identical"
expression = "ident(!Attribut!) + findFirst(!Attribut!)"
codeblock = """
uniqueList = []
dublicateList = []

def ident(x):
    # Remember each value; record it as a duplicate when it is seen again
    if x in uniqueList:
        dublicateList.append(x)
        return 0
    else:
        uniqueList.append(x)
        return 0

def findFirst(x):
    # Return 1 if the value is already known to be a duplicate
    if x in dublicateList:
        return 1
    else:
        return 0
"""
arcpy.management.CalculateField(in_table, new_field, expression, "PYTHON3", codeblock, "TEXT", "NO_ENFORCE_DOMAINS")

Best Answer

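I think the problem is with your expression. Calculate Field evaluates it once per row in a single pass, so when the first occurrence of a value is processed, that value has not yet been added to dublicateList and findFirst returns 0. Chaining the two functions with + just adds their return values for that row; it does not make ident run over the whole table first. I have to say that's the first time I've ever seen anyone attempt to write an expression like that, and I'm not surprised it failed. I'm sure some Python purist will tell me otherwise, but it's certainly not a coding style I've seen on any GIS forum or in the official help files.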
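To see the effect without ArcGIS, here is a minimal plain-Python sketch that imitates a row-by-row evaluation of the question's two functions. The sample values are made up for illustration, and the list comprehension only stands in for the way the expression is applied to each row in turn; it is not how Calculate Field is implemented.

```python
# Plain-Python imitation of a single-pass, row-by-row evaluation.
# The values below are hypothetical sample data, not from the original table.
uniqueList = []
dublicateList = []

def ident(x):
    # Remember each value; record it as a duplicate when it is seen again
    if x in uniqueList:
        dublicateList.append(x)
    else:
        uniqueList.append(x)
    return 0

def findFirst(x):
    # Only knows about duplicates discovered in rows processed so far
    return 1 if x in dublicateList else 0

values = ["a", "b", "a", "c", "b", "a"]
single_pass = [ident(v) + findFirst(v) for v in values]
print(single_pass)  # [0, 0, 1, 0, 1, 1]
```

The first "a" and the first "b" get 0 because nothing has flagged them as duplicates yet when their rows are evaluated, which matches the behaviour described in the question.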

Below is some code you can run in the Python console that will identify the values correctly. You just need to make sure tableName and the field names match your data.

import arcpy
import collections

tableName = "testTable"
myList = list()

# Read attribute values into counter collection
with arcpy.da.SearchCursor(tableName,["Attribute"]) as cursor:
    for row in cursor:
        myList.append(row[0])
myCounter = collections.Counter(myList)
print(myCounter)

# Write counter dictionary back to table, if count is 2 or greater it is a duplicate
with arcpy.da.UpdateCursor(tableName,["Attribute","Identical"]) as cursor:
    for row in cursor:
        attVal = row[0]
        count = myCounter[attVal]
        if count >= 2:
            row[1] = 1 # Duplicate, you could swap out 1 with the variable count to store the number of times it was duplicated
        else:
            row[1] = 0 # Occurs only once, not a duplicate

        cursor.updateRow(row)

print("FINISHED!")
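The two-pass idea in the script above (count everything first, then flag) can also be checked without a table. The following is only a sketch: mark_duplicates is a hypothetical helper name, and the sample values are invented for illustration.

```python
from collections import Counter

def mark_duplicates(values):
    """Return 1 for every value that occurs more than once, else 0."""
    counts = Counter(values)  # first pass: count each value
    # second pass: flag every occurrence of a repeated value, including the first
    return [1 if counts[v] >= 2 else 0 for v in values]

flags = mark_duplicates(["a", "b", "a", "c", "b", "a"])
print(flags)  # [1, 1, 1, 0, 1, 1]
```

Because the counts are complete before any flag is written, the first occurrence of a repeated value is marked as well, which is exactly what the single-pass expression could not do.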