[GIS] arcpy taking longer and longer in a loop

arcgis-10.0 arcpy geoprocessing memory performance

I am looping through a number of point arrays, creating data on the fly.

The features/arrays are largely similar in size. One thing I have noticed is that as the process runs, the time each iteration takes rises from about 8 seconds to around 1 minute. I can't see why this is happening, but you can see the times slowly increase if you put a timer around the loop.
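A minimal sketch of the kind of timer meant here. The names time_iterations and fake_work are hypothetical; fake_work only stands in for the real per-array call (createGeom below), so the numbers it prints say nothing about arcpy itself.

```python
import time

def time_iterations(work, items):
    """Run work(item) for each item and return the elapsed seconds per call."""
    timings = []
    for item in items:
        start = time.time()
        work(item)
        timings.append(time.time() - start)
    return timings

# Hypothetical stand-in for the real per-array call
def fake_work(n):
    return sum(range(n))

timings = time_iterations(fake_work, [1000, 1000, 1000])
for i, seconds in enumerate(timings):
    print("iteration %d took %.4f s" % (i, seconds))
```

Logging one number per iteration, rather than one total, is what makes the gradual slowdown visible.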

Is there a way to improve this performance? Is there a flush method? I believe I am properly handling my objects and deleting them after use, but I cannot for the life of me see why it is happening.

Any ideas?

The code is refactored, so there may be some typos, etc.

import time
import arcpy

def createGeom(geom, header, band, scratchDB, buffer, dist):
    # Build a unique feature class name from the current timestamp
    filetime = str(time.time()).split(".")
    outfile = "fc" + filetime[0] + filetime[1]
    outpath = scratchDB + "tmp.gdb/Polygon/"
    print(outpath)

    # Pull the attribute values out of the space-delimited header
    sCon = ["1", "16", "2.0"]
    ts = header.split(" ")
    ti = ts[6]
    Current = ts[9]
    LL = ts[2][2:5]
    LU = ts[2][8:11]

    outFeatureAggClass = outpath + outfile + "_agg"
    outFeatureClass = outpath + outfile

    # Aggregate the points into polygons, then buffer the result
    arcpy.AggregatePoints_cartography(geom, outFeatureAggClass, dist)
    arcpy.Buffer_analysis(outFeatureAggClass, outFeatureClass, buffer)

    # Add the attribute fields to the buffered feature class
    arcpy.AddField_management(outFeatureClass, "Name", "TEXT")
    arcpy.AddField_management(outFeatureClass, "ID", "SHORT")
    arcpy.AddField_management(outFeatureClass, "LL", "SHORT")
    arcpy.AddField_management(outFeatureClass, "LU", "SHORT")
    arcpy.AddField_management(outFeatureClass, "Tim", "SHORT")
    arcpy.AddField_management(outFeatureClass, "Band", "SHORT")
    arcpy.AddField_management(outFeatureClass, "Lvl", "TEXT")
    arcpy.AddField_management(outFeatureClass, "Cur", "SHORT")

    # Write the same values to every row
    rows = arcpy.UpdateCursor(outFeatureClass)
    for row in rows:
        row.Name = outfile
        row.ID = 1000
        row.LL = int(LL)
        row.LU = int(LU)
        row.Tim = int(ti)
        row.Band = int(band)
        row.Lvl = sCon[int(band)]
        row.Cur = Current
        rows.updateRow(row)

    # Release the row and cursor references
    del row
    del rows

UPDATE

I am now running this in ArcMap, and it's taking significantly less time, by a factor of about 3 or 4; what took 30 mins now takes 7.

What's that all about?

EDIT

ESRI have agreed to look at the issue. I'll park this for now, ready to update it with any progress.

Best Answer

You might want to try putting del row inside the loop as the last call after rows.updateRow(row). Your object deletion does not take place until you are outside of the loop, so it might just be the rather large collection of row objects that you are creating. I would think that the row objects are being deleted as they go out of scope, but that might not be the case.

You can also try using the garbage collection module to debug and/or fix the issue: http://docs.python.org/library/gc.html
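As a rough sketch of that kind of debugging, the snippet below forces a collection after each pass and prints how many objects the collector found and how many it still tracks. The helper name collect_and_count is hypothetical, and the loop body is a placeholder for the real per-file call. Note that gc only sees Python container objects, so memory held inside arcpy's native layer would not show up in these counts.

```python
import gc

def collect_and_count():
    """Force a full collection; return (unreachable objects found, objects still tracked)."""
    unreachable = gc.collect()
    tracked = len(gc.get_objects())
    return unreachable, tracked

for i in range(3):
    # ... call the per-file function here (createGeom in the question) ...
    unreachable, tracked = collect_and_count()
    print("pass %d: collected %d, tracked %d" % (i, unreachable, tracked))
```

If the tracked count climbs steadily from pass to pass, something on the Python side is holding references; if it stays flat while the run still slows down, the cause is more likely elsewhere.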

Some other issues: do all of your casting outside the loop, e.g. LL = int(LL) before the loop, so that inside the loop you just do 'row.LL = LL'. It looks like you are doing no calculations inside the loop that would change any of these values from row to row, so there is no need to do the casting inside the loop.
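Putting the last two suggestions together, a sketch might look like the following. The function fill_rows and the FakeRow/FakeCursor classes are hypothetical: the fakes only mimic the iterate-and-updateRow shape of the cursor in the question so the snippet runs without arcpy, and the sample header values are made up.

```python
def fill_rows(rows, values):
    """Write the same pre-converted values to every row of an update cursor."""
    for row in rows:
        for field, value in values.items():
            setattr(row, field, value)
        rows.updateRow(row)
        del row  # drop the reference before the next iteration

# Hypothetical stand-ins so the sketch runs without arcpy
class FakeRow(object):
    pass

class FakeCursor(object):
    def __init__(self, count):
        self._rows = [FakeRow() for _ in range(count)]
        self.updated = []

    def __iter__(self):
        return iter(self._rows)

    def updateRow(self, row):
        self.updated.append(row)

# Convert once, before the loop (sample values are invented)
sCon = ["1", "16", "2.0"]
band = int("1")
values = {"Name": "fc123", "ID": 1000, "LL": int("012"), "LU": int("034"),
          "Tim": int("5"), "Band": band, "Lvl": sCon[band]}

cursor = FakeCursor(3)
fill_rows(cursor, values)
print(len(cursor.updated))
```

With a real cursor you would pass arcpy.UpdateCursor(outFeatureClass) as rows and still del the cursor itself afterwards.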

Edit: I did not realize that you were looping this against many files. The problem might be in your outer loop call, not in your function call. Since it works correctly in ArcMap, but not outside of ArcMap, you might be reloading arcpy or creating new geoprocessor objects with each call to a file. This is a very costly operation that gets much worse the more times you execute it.

I ran into a similar problem with an operation that I was trying to call every 20 seconds outside of ArcMap. Every time that I called the inner function from the outer timing loop, arcpy was being reloaded even though I imported arcpy into the outer timing loop. As a result, my inner function kept running slower and slower and was impossible to finish in 20 seconds. To get around this, I imported arcpy into the outer timing loop, and then passed arcpy as an argument into the inner function. Once I did this, my inner function was executing in under 3 seconds every time.

So, for your function, you might want to try changing its definition to:
def createGeom(geom, header, band, scratchDB, buffer, dist, arcpy):

and then when you call it, make sure you pass arcpy as the 7th argument.
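The calling pattern, reduced to its shape, is sketched below. To keep it runnable without ArcGIS, the standard math module stands in for arcpy and create_geom is a hypothetical one-line placeholder for the real function; only the way the module object is imported once and handed in is the point.

```python
import math as gp_module  # stand-in for `import arcpy`, done once by the caller

def create_geom(value, gp):
    """Use the module object handed in by the caller instead of importing it here."""
    return gp.sqrt(value)

# Outer loop: the same module object is passed on every call
results = [create_geom(v, gp_module) for v in (4.0, 9.0, 16.0)]
print(results)
```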

As far as I can tell, inside of ArcMap, arcpy is loaded once and then maintained for any calls against it, even if called by an inner function. Hence the much better performance when executed in ArcMap.
