[GIS] Deleting duplicated features with ArcPy

arcgis-10.3 arcpy cursor delete

I am trying to use this Python code to delete duplicate features that have the same centroid XY coordinates (I created those fields with Calculate Geometry in the attribute table):

import arcpy
'''
first we create x center coordinate field (the same for y) in the attribute
table manually, then we will run this code.
'''
list1 = []
listToDeleteX = []
listToDeleteY = []
fc = r"G:\desktop\Project\lyr\polygon.shp"
# check the x coordinate
with arcpy.da.UpdateCursor(fc, "xCenter") as cursor:
    for row in cursor:
        list1.append(row)
        if list1.count(row)>1:
            listToDeleteX.append(row)

# check the y coordinate
with arcpy.da.UpdateCursor(fc, "yCenter") as cursor:
    for row in cursor:
        list1.append(row)
        if list1.count(row)>1:
            listToDeleteY.append(row)

listToDeleteY.append(listToDeleteX)

At the end of the code I added the X list to the Y list, but I don't know how to delete the duplicate rows.

I work with ArcGIS for Desktop without any extensions, so I can't use the "arcpy.DeleteIdentical_management" tool.

This is the attribute table of the polygon layer:

[Image: attribute table of the polygon layer]

Best Answer

This code works on a table: it searches a numeric field called test for duplicate values and deletes the rows that repeat an earlier value. It assumes the first instance of a duplicated value is the one you want to keep.

import arcpy

def main():
    first_seen = {}  # key is the test value, item is the objectID of its first occurrence
    tbl = r"C:\Scratch\fGDB_AIS_Cleaned.gdb\test"  # Table to test

    # Search the table, storing only the first occurrence of each value and its objectID
    print("reading dataset...")
    with arcpy.da.SearchCursor(tbl, ["OBJECTID", "test"]) as cursor:
        for row in cursor:
            objID = row[0]
            val = row[1]
            if val not in first_seen:
                first_seen[val] = objID

    # Get a list of objectIDs to keep
    oList = list(first_seen.values())

    # Check duplicates if they exist
    n = int(arcpy.GetCount_management(tbl).getOutput(0))
    if n > len(oList):
        print("deleting duplicates...")
        # Build a SQL expression that selects every ObjectID except the ones to keep
        sql = "OBJECTID NOT IN (" + ", ".join(str(oid) for oid in oList) + ")"

        # Delete duplicates
        arcpy.MakeTableView_management(tbl,"tocleanup")
        arcpy.SelectLayerByAttribute_management("tocleanup","NEW_SELECTION",sql)
        arcpy.DeleteRows_management("tocleanup")
        print("deleted duplicate rows!")
        arcpy.Delete_management("tocleanup")

if __name__ == '__main__':
    main()
    print("finished!")
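
To apply the same "keep the first occurrence" idea to the polygons in the question, where a duplicate means the same xCenter and yCenter pair rather than a single field value, one pass with an update cursor may be enough. This is only a sketch under some assumptions: the names make_duplicate_checker and delete_duplicate_features are hypothetical, and rounding the coordinates to ndigits decimals before comparing is my own choice to absorb floating-point noise, not something the question specifies. It deletes features in place, so it is worth trying on a copy of the shapefile first.

```python
def make_duplicate_checker(ndigits=3):
    """Return a function that reports whether an (x, y) pair was seen before.

    Coordinates are rounded to `ndigits` decimals before comparison
    (an assumption; pick a tolerance that suits the data).
    """
    seen = set()

    def is_duplicate(x, y):
        # Null field values are kept as None so they do not break round()
        key = tuple(None if v is None else round(v, ndigits) for v in (x, y))
        if key in seen:
            return True
        seen.add(key)
        return False

    return is_duplicate


def delete_duplicate_features(fc, fields=("xCenter", "yCenter"), ndigits=3):
    """Delete every feature whose (x, y) pair repeats an earlier feature.

    Returns the number of deleted features. The first feature with a
    given pair is kept, as in the answer above.
    """
    import arcpy  # imported here so the checker can be used without ArcGIS

    is_duplicate = make_duplicate_checker(ndigits)
    deleted = 0
    with arcpy.da.UpdateCursor(fc, list(fields)) as cursor:
        for x, y in cursor:
            if is_duplicate(x, y):
                cursor.deleteRow()  # removes the row the cursor is currently on
                deleted += 1
    return deleted
```

With the path from the question, the call would look like delete_duplicate_features(r"G:\desktop\Project\lyr\polygon.shp"). Comparing the pair as one key matters here: checking X and Y separately, as the original loops do, would also flag features that share only one of the two coordinates.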

One thing to consider: I have found that deleting from very large datasets (e.g. millions of rows) can be very slow. It is much quicker to copy the rows you want to keep into a new dataset.
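
A copy-out variant could look roughly like the following. It is a minimal sketch, not a tested replacement for the script above: build_keep_clause and copy_unique_rows are hypothetical helper names, the output path is whatever you choose, and it uses the Table Select tool, which suits standalone tables (for a feature class such as the shapefile in the question, arcpy.Select_analysis takes the same three arguments).

```python
def build_keep_clause(oid_field, keep_ids):
    """Build a where clause that selects only the ObjectIDs to keep."""
    ids = sorted(int(oid) for oid in keep_ids)
    if not ids:
        # "IN ()" is not valid SQL, so refuse to build an empty list
        raise ValueError("no ObjectIDs to keep")
    return "{0} IN ({1})".format(oid_field, ", ".join(str(oid) for oid in ids))


def copy_unique_rows(in_table, out_table, value_field):
    """Copy the first row of each distinct value into a new table."""
    import arcpy  # imported here so build_keep_clause works without ArcGIS

    # Record the ObjectID of the first occurrence of every value
    first_seen = {}
    with arcpy.da.SearchCursor(in_table, ["OID@", value_field]) as cursor:
        for oid, val in cursor:
            if val not in first_seen:
                first_seen[val] = oid

    # Look up the real ObjectID field name (OBJECTID, FID, ...) for the SQL
    oid_field = arcpy.Describe(in_table).OIDFieldName
    where = build_keep_clause(
        arcpy.AddFieldDelimiters(in_table, oid_field), first_seen.values()
    )

    # Write the kept rows to a new dataset instead of deleting the rest
    arcpy.TableSelect_analysis(in_table, out_table, where)
    return out_table
```

The original table is left untouched, so the result can be checked before the old dataset is replaced.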
