[GIS] Deleting feature class features faster using ArcPy

arcgis-10.0 arcpy enterprise-geodatabase feature-dataset features

I have written this script that loops through all feature classes in a given set of feature datasets and deletes all their features. It appears to do the job, but it runs pretty slowly. Is there anything I'm doing wrong here, or are there any obvious ways to speed things up?

I'm using DeleteFeatures_management to do the deed. DeleteRows_management also seems to work.

import sys
import os
import arcpy
from arcpy import env
import datetime
import getpass

try:
    passwd = getpass.getpass("Enter the sde user password: ")

    sdeConnectionFileDir = os.environ.get("TEMP")
    databaseName = ""
    fileName = "temp.sde"

    # Delete any pre-existing SDE connection file.
    fullPath = os.path.join(sdeConnectionFileDir, fileName)
    if os.path.exists(fullPath):
        os.remove(fullPath)

    # Create temporary SDE connection file.
    arcpy.CreateArcSDEConnectionFile_management (
        sdeConnectionFileDir, fileName,
        "sdeserver", "5151", "",
        "DATABASE_AUTH", "my_sde_user", passwd,
        "SAVE_USERNAME", "SDE.DEFAULT", "SAVE_VERSION"
    )

    env.workspace = fullPath

    # ArcPy message severity levels: 0 = informational, 1 = warning, 2 = error.
    returnCodes = {'INFO' : 0, 'WARN' : 1, 'ERROR' : 2}

    featureDatasets = []
    featureDatasets.extend(arcpy.ListDatasets("dataset1*"))
    featureDatasets.extend(arcpy.ListDatasets("dataset2*"))
    featureDatasets.extend(arcpy.ListDatasets("dataset3*"))

    # Avoid naming the variable "list", which would shadow the built-in type.
    datasetList = '[%s]' % ', '.join(map(str, featureDatasets))
    response = raw_input("\n***** WARNING!!! ***** \nAll data will be deleted from all feature classes in the following datasets: \n\n" + datasetList + "\n\n |--> Type DELETE to begin removal: ")
    if response == "DELETE":
        print "\nStarted: " + str(datetime.datetime.now()) + "\n"
        for dataset in featureDatasets:
            print "Processing dataset: " + dataset
            for fc in arcpy.ListFeatureClasses("*", "ALL", dataset):
                rowCount = int(arcpy.GetCount_management(fc).getOutput(0))

                if rowCount > 0:
                    print "  -- Processing feature class: " + str(fc) + " (" + str(rowCount) + " rows)"
                    #arcpy.DeleteRows_management(fc)
                    arcpy.DeleteFeatures_management(fc)

        print "\nCompleted: " + str(datetime.datetime.now())

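except Exception as e:
    # Severity 2 returns only the error messages from the last tool that ran.
    # Using the literal avoids a NameError if the failure happened before
    # returnCodes was defined inside the try block.
    arcpyErrors = arcpy.GetMessages(2)
    if arcpyErrors:
        sys.stderr.write(arcpyErrors + "\n")
    sys.stderr.write(str(e) + "\n")
    sys.exit(1)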

EDIT

I put some performance timers in the script and here is the data:

  • Time to retrieve datasets: 0:00:01.254000
  • Total Feature Classes: 1682
  • Total Feature Classes with Data: 124
  • Total Features Processed: 190222
  • Total Run Time: 3 hours, 16 minutes

The breakdown:

Feature dataset -> list feature class calls:

* AVG   0:00:02
* MIN   0:00:01
* MAX   0:00:07
* COUNT 40
* TOTAL 0:01:08

Feature count calls (majority of the time):

* AVG   0:00:06
* MIN   0:00:01
* MAX   0:00:16
* COUNT 1682
* TOTAL 2:41:00

Feature deletion calls (reduced because only feature classes with rows are processed):

* AVG   0:00:17
* MIN   0:00:02
* MAX   0:03:22
* COUNT 124
* TOTAL 0:34:31
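
As a cross-check, the three totals above can be added up and compared with the overall run time. This is only arithmetic on the figures already listed (the variable names are mine); it shows the totals sum to 3:16:39, with the count calls taking roughly 82% of that and the deletes roughly 18%.

```python
from datetime import timedelta

def seconds(td):
    """Whole seconds in a timedelta (avoids total_seconds(), which needs Python 2.7+)."""
    return td.days * 86400 + td.seconds

def share(part, whole):
    """Return part as a fraction of whole."""
    return float(seconds(part)) / seconds(whole)

# Totals copied from the three breakdowns above.
list_calls = timedelta(minutes=1, seconds=8)
count_calls = timedelta(hours=2, minutes=41)
delete_calls = timedelta(minutes=34, seconds=31)

total = list_calls + count_calls + delete_calls

print("Total:    %s" % total)
print("Counting: %.0f%%" % (100 * share(count_calls, total)))
print("Deleting: %.0f%%" % (100 * share(delete_calls, total)))
```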

Best Answer

Which part of the script is actually taking up most of the time? There are about 5 other steps going on before you actually start deleting stuff.

You might want to break your script down into bite-sized tests. For example, instead of creating a temporary connection file, listing a bunch of datasets, listing their contents, counting their records, and then finally actually doing what you want to do (deleting features), just pass in a single feature class with a premade connection file to DeleteFeatures and see how long that takes.

If that performs acceptably then create another test to time the next potential trouble spot: counting rows. And another for listing feature classes within a feature dataset, and yet another for listing feature datasets within a geodatabase.
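
One way to run those separate timing tests is a small wrapper that times any call and keeps running statistics. This is a minimal sketch, not part of ArcPy: `new_stats`, `timed` and `average` are hypothetical helper names, and the wrapper only measures wall-clock time around whatever function you hand it.

```python
import time

def new_stats():
    """Create an empty statistics record for one kind of call."""
    return {"count": 0, "total": 0.0, "min": None, "max": None}

def timed(stats, func, *args, **kwargs):
    """Call func, record its wall-clock duration in stats, and return its result."""
    start = time.time()
    try:
        return func(*args, **kwargs)
    finally:
        # Recorded even if func raises, so failed calls are still counted.
        elapsed = time.time() - start
        stats["count"] += 1
        stats["total"] += elapsed
        stats["min"] = elapsed if stats["min"] is None else min(stats["min"], elapsed)
        stats["max"] = elapsed if stats["max"] is None else max(stats["max"], elapsed)

def average(stats):
    """Mean duration in seconds, or 0.0 if nothing has been timed yet."""
    return stats["total"] / stats["count"] if stats["count"] else 0.0
```

In the script from the question, each step would get its own record, for example `delete_stats = new_stats()` followed by `timed(delete_stats, arcpy.DeleteFeatures_management, fc)`, and likewise for the `GetCount_management` and `ListFeatureClasses` calls.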

If, on the other hand, DeleteFeatures does not perform acceptably, then at least we know where the problem is. In that case I would be more inclined to look at how your geodatabase is designed:

  • Are any of your feature datasets versioned? With versioning, each versioned table has an additional pair of A (adds) and D (deletes) tables, and when you delete features you aren't deleting records in the base table; you are adding records to the D table. This will take much longer than if the data were not versioned.

  • Since your feature classes all seem to be in feature datasets, do they participate in geodatabase behavior such as a topology or geometric network? When you add/remove/modify features participating in geodatabase behavior there is a lot more overhead.

  • Also note that contrary to popular belief, feature datasets are not designed to be used as an organizational tool:

    Feature data sets exist in the geodatabase to define a scope for a spatial reference. All feature classes that participate in topological relationships with one another (e.g., a geometric network) must have the same spatial reference. Feature data sets are a way to group feature classes with the same spatial reference so that they can participate in topological relationships with each other.

    To most users, feature data sets also have a natural organizational quality, much like a folder on a file system. Since for many GIS applications the majority of the data has the same spatial reference, the temptation to group large numbers of feature classes into feature data sets is irresistible.

    Feature data sets, however, are not free. When you open a feature class contained in a feature data set to look at its properties or draw or query it in ArcCatalog™, ArcMap™, or a custom application, all of the other feature classes in that feature data set are also opened. This is done because updates to a feature class in a feature data set can potentially ripple to other feature classes in the feature data set that participate in topological relationships.

    From: Multiuser Geographic Information Systems with ArcInfo 8 (April 2000)

    So that might be another source of overhead even if they do not participate in a topology or geometric network.
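
The versioning question in the first bullet can be answered from a script rather than by clicking through each feature class. The sketch below assumes the `isVersioned` property that `arcpy.Describe` exposes for datasets is available in your release; `find_versioned` is a hypothetical helper name, and the `describe` parameter exists only so the function can be tried without ArcGIS installed.

```python
def find_versioned(feature_classes, describe=None):
    """Return the names whose Describe object reports isVersioned as true."""
    if describe is None:
        # Imported lazily so the helper can be exercised without ArcGIS.
        import arcpy
        describe = arcpy.Describe
    return [fc for fc in feature_classes
            if getattr(describe(fc), "isVersioned", False)]
```

With `env.workspace` pointing at the connection file, something like `find_versioned(arcpy.ListFeatureClasses("*", "ALL", dataset))` would list the versioned feature classes in one feature dataset. It does not detect topologies or geometric networks.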

Beyond the arcpy DeleteFeatures/DeleteRows commands:

  • If you have SDE administration command access you could use:

    sdetable -o truncate -t <tablename>

    This issues truncate table commands to your DBMS so it should be much faster, but note that this ignores geodatabase behavior.

  • You could use ArcSDESQLExecute to issue TRUNCATE TABLE commands directly (again bypassing geodatabase behavior), but this is very trouble-prone because you would need to issue one for each table that makes up a feature class (base, F, S, I, A, D, etc.). Failing to do this carefully and correctly could leave your data in an inconsistent state.
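
For the ArcSDESQLExecute route, a rough sketch of the statement-issuing part might look like the following. It assumes you already have the complete, correct list of table names for each feature class (working that list out is the hard part and is not shown), and that the object passed in has an `execute(sql)` method, as `arcpy.ArcSDESQLExecute` does. `truncate_tables` and the name check are my own illustration, not an ArcPy API.

```python
import re

# Conservative pattern for plain or owner-qualified identifiers, e.g. OWNER.TABLE.
_SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_$#]*(\.[A-Za-z_][A-Za-z0-9_$#]*)*$")

def truncate_tables(executor, table_names):
    """Issue TRUNCATE TABLE for each name and return the statements sent."""
    statements = []
    # Validate every name first so nothing is truncated if any name is rejected.
    for name in table_names:
        if not _SAFE_NAME.match(name):
            raise ValueError("Refusing suspicious table name: %r" % (name,))
        statements.append("TRUNCATE TABLE %s" % name)
    for sql in statements:
        executor.execute(sql)
    return statements
```

Here `executor` would be an `arcpy.ArcSDESQLExecute` object created with your own connection details (check the constructor arguments for your release). Try it against a test database first, since a missed table leaves the feature class inconsistent, exactly as warned above.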