[GIS] How to delete duplicate rows where multiple values are duplicated

arcpy

I have a feature class where there are multiple values duplicated in a row, i.e. SRNumber and RESOLUTION_CODE. I would like to delete all rows where my SRNumber and RESOLUTION_CODE are found in multiple rows. I have got a good start with this script, everything works aside from actually deleting the rows. How can I fix this so that my duplicate rows are deleted?

So if SRNumber = 1-23456 and Resolution_Code = 'A' and appears twice, I would like to delete one.

import arcpy

     in_table = r"ServiceRequest.DBO.SO_Dead_Animal"
     fields = ["NumberCYLA","RESOLUTION_CODE"] #replace with own
     curStrings = arcpy.da.SearchCursor(in_table,fields,"NumberCYLA is not null")

     stringsKeysList = []
     numbersValuesList = []
     tableDict = {}

     for row in curStrings:
         print row[0],row[1]
         stringsKeysList.append(row[0])
     del curStrings
     uniqueStringsList = list(set(stringsKeysList)) #only unique values from Strings field

     for uniqueString in uniqueStringsList:
         emptyTempValues = []
         curValues = arcpy.da.SearchCursor(in_table,fields,"""NumberCYLA = '{0}'""".format(str(uniqueString)))
         for row in curValues:
             print row[1]
             emptyTempValues.append(row[1])
             uniqueEmptyTempValues = list(set(emptyTempValues))
         tableDict[uniqueString] = uniqueEmptyTempValues

     print tableDict #a dictionary with string:values
     #{u'String_2': [-4, -2],
     #u'String_3': [None, -3],
     #u'String_56': [-1],
     #u'String_1': [-1],
     #u'String78': [-1]}

     rowsToDelete = [k for k, v in tableDict.iteritems() if v == [-1]] #find which rows to delete
     print "list: ", rowsToDelete
     stringRows = ","
     updRows = stringRows.join(rowsToDelete)
     updRows = updRows.replace(",","','")
     print updRows #update the string to be used with proper syntax in where clause

     where_clause = """NumberCYLA in ('{0}')""".format(updRows)
     upd_cur = arcpy.da.UpdateCursor(in_table,fields,where_clause)
     for row in upd_cur:
         print row[0],row[1]
         del rowsToDelete

Best Answer

If you're wanting to keep one row of each duplicate, make use of Delete Identical. Use your two fields as your field inputs.

arcpy.DeleteIdentical_management (r"c:\example\example.gdb\example", ["SRNumber", "Resolution_Code"])

If you would like to delete all duplicates, determine the duplicates with a little list comprehension, and then use an update cursor to delete your rows.

from arcpy import *

inFc = r"c:\example\example.gdb\example"

fld1 = "SRNumber"
fld2 = "Resolution_Code"

#List of values as tuples
rows = [(val1, val2) for val1, val2 in da.SearchCursor (inFc, [fld1, fld2])]

#Duplicate tuples
duplicateRows = [row for row in rows if rows.count (row) > 1]

#UpdateCursor to delete duplicates
with da.UpdateCursor (inFc, [fld1, fld2])] as cursor:
    for val1, val2 in cursor:
        tup = (val1, val2)
        if tup in duplicateRows:
            cursor.deleteRow()