Below is the code I'm using to replicate the "related tables" button in ArcMap. In ArcMap that button selects features in one feature class or table based on the selection of features in another related feature class or table.
In ArcMap I can use that button to "push" my selection to the related table in a matter of seconds. I was unable to find anything built in to arcpy that replicates the button so I used some nested loops to do the same task.
The code below loops through a table of "treatments". For each treatment, it loops through a list of "trees". When a match is found between the ID fields of treatment and trees, a selection occurs in the tree layer. Once a match is found for a treatment, the code does not continue to search the tree layer for additional matches. It goes back to the treatment table, selects the next treatment and again searches through the tree feature class.
The code itself works fine, but it is agonizingly slow. The "treatment table" in this case has 16,000 records. The "tree" feature class has 60,000 records.
Is there another more efficient way to recreate what ESRI is doing when it pushes the selection from one table to another? Should I be creating an index for the tables? NOTE: This data is stored in an SDE.
# Create search cursor to loop through the treatments
treatments = arcpy.SearchCursor(treatment_tv)
treatment_field = "Facility_ID"
for treatment in treatments:
#Get ID of treatment
treatment_ID = treatment.getValue(treatment_field)
# Create search cursor for looping through the trees
trees = arcpy.SearchCursor(tree_fl)
tree_field = "FACILITYID"
for tree in trees:
# Get FID of tree
tree_FID = tree.getValue(tree_field)
if tree_FID == treatment_FID:
query = "FACILITYID = " + str(tree_FID)
arcpy.SelectLayerByAttribute_management(tree_fl, "REMOVE_FROM_SELECTION", query)
break
Best Answer
First off, yes you will definitely want to make sure your primary and foreign key fields are indexed on both tables. This lets the DBMS plan and execute queries against these fields much more efficiently.
Secondly, you are calling
SelectLayerByAttribute_management
in a tight, nested loop (once per tree per treatment). This is highly inefficient, for several reasons:Instead, refactor your code so that you call
SelectLayerByAttribute_management
just once with a whereclause constructed to select all of the related records.Borrowing a function from another answer for the whereclause construction logic, I'd imagine it would look something like this:
You could call it like so:
selectRelatedRecords(treatment_tv, tree_fl, "Facility_ID", "FACILITYID")
Notes:
This uses an
arcpy.da.SearchCursor
, only available at 10.1. As @PolyGeo mentioned, these cursors are much faster than their predecessors (arcpy.SearchCursor
). It could be easily modified to use the old SearchCursor though:If your SDE geodatabase is on Oracle, be warned that the
IN
statement used in the function from the linked answer is limited to 1000 elements. One possible solution is described in this answer, but you'd have to modify the function to split it into multiple 1000-lengthIN
statements instead of one.