I had initially neglected to delete my irow object at the end of the script; after updating my code, the insert cursor appears to be adding all rows (even the last one!) the way it should. Here is the final code:
import arcpy, os

months = {'JAN':1,'FEB':2,'MAR':3,'APR':4,'MAY':5,'JUN':6,'JUL':7,'AUG':8,
          'SEP':9,'OCT':10,'NOV':11,'DEC':12}
ignore = ['Arbitrary_count','TOTAL_MGY','JAN','FEB','MAR','APR','MAY','JUN',
          'JUL','AUG','SEP','OCT','NOV','DEC']
ws = r'D:\Data\Users\jbellino\Project\faswam\data\water_use\SC\FromTomAb\SC_WELLS_data_jcb.mdb'
arcpy.env.workspace = ws
arcpy.env.overwriteOutput = True
tbl = 'ORIGINAL_DHEC_WELL DATA'
itbl = 'monthly_dhec_well_data'
fields = arcpy.ListFields(tbl)
rows = arcpy.SearchCursor(os.path.join(ws, tbl))
irows = arcpy.InsertCursor(os.path.join(ws, itbl))
for row in rows:
    for month in months:
        #--for each row in the original table, and for each month stored in
        #  that row, create a new record in 'itbl'
        irow = irows.newRow()
        for field in fields:
            if field.name == month:
                #--if the field name is a month abbreviation it contains data;
                #  first convert the month abbreviation to a month number
                irow.cn_mo = months[month]
                try:
                    #--then grab the data in the field and process it into the
                    #  appropriate fields of the new table in the correct units
                    irow.cn_qnty_mo_va = row.getValue(field.name) * 1000000
                    irow.cn_qnty_mo_va_mega = row.getValue(field.name)
                except TypeError:
                    #--skip null values
                    pass
            elif field.name not in ignore:
                #--if the field name is not a month abbreviation, just copy
                #  the data to the new table
                irow.setValue(field.name, row.getValue(field.name))
        irows.insertRow(irow)
del irow, irows
del row, rows
Let's say you have a list of shapefiles and the field name for the attribute is the same in each shapefile. That means you can make the SQL query in the beginning of the script, so the first bit could look like this:
search_id = arcpy.GetParameterAsText(0)
shps = [r"path\to\shp1.shp",r"path\to\shp2.shp"]
output_shp = r"path\to\output.shp"
field = "ID_FIELD_NAME"
sql = '"{0}" = \'{1}\''.format(field,search_id)
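To see exactly what that format() call produces, here is a small standalone sketch (the `A123` value is just a made-up example ID, not from the original question):

```python
# Build the same SQL where clause to inspect the resulting string
field = "ID_FIELD_NAME"
search_id = "A123"  # hypothetical ID for illustration
sql = '"{0}" = \'{1}\''.format(field, search_id)
print(sql)  # "ID_FIELD_NAME" = 'A123'
```

Note the single quotes around the value: this clause assumes the field is a text field; for a numeric field you would drop them.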
Here are two ways to go about your selection process. The first uses a where clause in MakeFeatureLayer() together with GetCount(), and the second uses a SearchCursor() to find the attribute before calling MakeFeatureLayer(). I'm not sure which one is faster, but my guess is the second; most likely the speeds are very similar unless you have tons of shapefiles.
for shp in shps:
    # make feature layer
    if arcpy.Exists("fl"):
        arcpy.management.Delete("fl")
    fl = arcpy.management.MakeFeatureLayer(shp, "fl", sql)
    # check count of features in new feature layer, skip to next shp if it == 0
    if int(arcpy.management.GetCount(fl).getOutput(0)) == 0:
        continue
    # copy features, only happens if there is a feature in the feature layer
    arcpy.management.CopyFeatures(fl, output_shp)
And here's the second. In this one, a SearchCursor is used to check for the presence of the feature, and then MakeFeatureLayer() is called. I think this would probably be the faster way:
for shp in shps:
    # use a list comprehension with a SearchCursor to check for the id
    all_ids = [r[0] for r in arcpy.da.SearchCursor(shp, field)]
    if search_id not in all_ids:
        continue
    # now make the feature layer
    if arcpy.Exists("fl"):
        arcpy.management.Delete("fl")
    fl = arcpy.management.MakeFeatureLayer(shp, "fl", sql)
    # copy features
    arcpy.management.CopyFeatures(fl, output_shp)
EDIT:
With differing field names, you can store them and iterate the shapefiles like this:
shps = [(r"path\to\shp1.shp", "ID_FIELD_NAME1"),
        (r"path\to\shp2.shp", "ID_FIELD_NAME2")]
output_shp = r"path\to\output.shp"
for shp, field in shps:
    sql = '"{0}" = \'{1}\''.format(field, search_id)
    # continue with the rest of the code as above
Best Answer
I can see several things that may be causing your script to be slow. The likely culprit is the arcpy.CalculateField_management() function. You should use a cursor instead; it will be several orders of magnitude faster. Also, you said you are using ArcGIS Desktop 10.3.1, but you're using the old ArcGIS 10.0-style cursors, which are also much slower than the arcpy.da cursors. The min() operation, even on a list of 200K values, will be pretty quick. You can verify this by running a small snippet; it happens in the blink of an eye:
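A minimal sketch of that kind of snippet (the original wasn't included above, so this is a reconstruction using random values):

```python
import random
import time

# Time min() over a list of 200,000 random floats
values = [random.random() for _ in range(200000)]
start = time.time()
smallest = min(values)
print('min() over {} values took {:.4f} s'.format(len(values), time.time() - start))
```

On a typical machine this reports a few milliseconds, so the min() step is not where the time is going.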
See if this is any faster:
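The code wasn't included above, but here is a minimal sketch of the cursor-based approach being recommended, using the newer arcpy.da cursors: one pass with a SearchCursor to find the minimum, then an UpdateCursor to write the calculated value. The feature class path and field names are placeholders, not from the original script, and the calculation (value minus minimum) is an assumed example:

```python
import arcpy

fc = r"path\to\points.shp"   # hypothetical feature class
src_field = "SRC_FIELD"      # hypothetical field to read
dst_field = "DST_FIELD"      # hypothetical field to calculate

# First pass: collect values and find the minimum
values = [row[0] for row in arcpy.da.SearchCursor(fc, [src_field])]
min_val = min(values)

# Second pass: write the result; a da.UpdateCursor replaces the
# much slower CalculateField_management() call
with arcpy.da.UpdateCursor(fc, [src_field, dst_field]) as cursor:
    for row in cursor:
        row[1] = row[0] - min_val
        cursor.updateRow(row)
```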
EDIT:
I ran some timing tests and, as I suspected, the field calculator took almost twice as long as the new-style cursor. Interestingly, the old-style cursor was ~3x slower than the field calculator. I created 200,000 random points and used the same field names.
A decorator function was used to time each function (there may be some slight overhead in the setup and teardown of the functions, so the timeit module might be a little more accurate for testing snippets).
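The decorator itself wasn't shown; a minimal sketch of that kind of timing decorator might look like the following (the name `timeit` here is the decorator described above, not the standard-library module):

```python
import time
from functools import wraps

def timeit(func):
    """Print how long the wrapped function took to run."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print('{} took {:.4f} s'.format(func.__name__, time.time() - start))
        return result
    return wrapper

# Example: wrap a function so each call reports its run time
@timeit
def find_min(values):
    return min(values)
```

Calling `find_min(some_list)` then prints a line like `find_min took 0.0item s` alongside returning the minimum, which is how each test function below would have been measured.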
Here are the results:
And here is the code I used (I broke everything down into individual functions so I could apply the timeit decorator). And finally, this is the actual printout from my console.
Edit 2: Just posted some updated tests; I found a slight flaw with my timeit function.