[GIS] Spatially Joining Many Shapefiles: Use Spatial Join or UpdateCursor?

arcgis-10.2 arcpy cursor spatial-join

I have a large number of tree records in hundreds of point shapefiles. Each shapefile contains trees that fall within 10-50 parcels. The parcel layer contains ~60,000 features. For each point shapefile, I need to extract some attributes from the parcel layer and add them to the point layer.

I have two workflows in mind, neither of which I'm very excited about, and would like feedback. I'm working in ArcMap 10.2 Advanced.

Spatial Join:

Run a spatial join of each tree layer against the parcel layer. Because the output needs further processing (more fields will be added, and it will then be exported to an Excel table), I don't like that Spatial Join creates a new shapefile and adds fields I don't need in the final output. I have also found Spatial Join rather clunky.
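To make concrete what a point-in-polygon spatial join computes, and how its output could carry only the wanted fields, here is a minimal pure-Python sketch. It is not arcpy: parcels are assumed to be simple polygons stored as lists of (x, y) vertices in plain dicts, trees are (x, y) points, and containment uses a standard ray-casting test (a point exactly on a shared boundary is assigned by the edge rule, not by any ArcGIS rule). Field names and sample values are hypothetical.

```python
# Pure-Python sketch of a point-in-polygon spatial join (not arcpy).
# Assumption: parcels are dicts with a "ring" of (x, y) vertices plus attributes;
# trees are dicts with "x" and "y" coordinates.


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(trees, parcels, keep_fields):
    """Return one output dict per tree, carrying only keep_fields from the
    first parcel that contains it (None values if no parcel contains it)."""
    joined = []
    for tree in trees:
        match = next(
            (p for p in parcels if point_in_polygon(tree["x"], tree["y"], p["ring"])),
            None,
        )
        row = dict(tree)
        for field in keep_fields:
            row[field] = match[field] if match else None
        joined.append(row)
    return joined


# Hypothetical sample data: two adjacent square parcels and three trees.
parcels = [
    {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)], "SITE_ADDR": "1 Oak St", "ZIP": "12345", "AREA": 100},
    {"ring": [(10, 0), (20, 0), (20, 10), (10, 10)], "SITE_ADDR": "2 Elm St", "ZIP": "12345", "AREA": 100},
]
trees = [{"id": 1, "x": 5, "y": 5}, {"id": 2, "x": 15, "y": 2}, {"id": 3, "x": 50, "y": 50}]

result = spatial_join(trees, parcels, ["SITE_ADDR", "ZIP"])
```

Because only `keep_fields` is copied, the joined rows never pick up parcel fields such as the hypothetical `AREA`. Note that each tree is still tested against every parcel, which is exactly the cost the question is worried about.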

Nested search cursor and update cursor (see Using Select By Location to update field in feature class using ArcPy?):

Basically: iterate over the parcels with da.SearchCursor, select each parcel in turn with SelectLayerByAttribute, select the trees inside it with SelectLayerByLocation, then use da.UpdateCursor on the trees to populate new fields with values from the selected parcel.
The problem here seems to be that I'm looping through the entire parcel layer even though I know only a few parcels need to be considered.
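One generic way to avoid testing every tree against every parcel is a spatial index. The sketch below swaps in a simple uniform-grid index written in pure Python (this is not an ArcGIS API): each parcel's bounding box is bucketed into grid cells once, and each tree then looks up only the parcels registered in its own cell. The cell size and sample parcels are assumptions; an exact point-in-polygon test would still run on the returned candidates.

```python
import math
from collections import defaultdict

CELL = 10.0  # assumed grid cell size in map units; tune to typical parcel size


def bbox(ring):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def cells_for_box(xmin, ymin, xmax, ymax, cell=CELL):
    """Yield every (col, row) grid cell that the bounding box overlaps."""
    for col in range(math.floor(xmin / cell), math.floor(xmax / cell) + 1):
        for row in range(math.floor(ymin / cell), math.floor(ymax / cell) + 1):
            yield col, row


def build_grid(parcels, cell=CELL):
    """Register each parcel index in every cell its bounding box touches (done once)."""
    grid = defaultdict(list)
    for idx, parcel in enumerate(parcels):
        for key in cells_for_box(*bbox(parcel["ring"]), cell=cell):
            grid[key].append(idx)
    return grid


def candidate_parcels(x, y, grid, cell=CELL):
    """Indices of parcels whose bounding box shares the point's grid cell."""
    return grid.get((math.floor(x / cell), math.floor(y / cell)), [])


# Hypothetical parcels: one near the origin, one far away.
parcels = [
    {"ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"ring": [(100, 100), (110, 100), (110, 110), (100, 110)]},
]
grid = build_grid(parcels)
```

Here a tree at (5, 5) is compared only with parcel 0, and a tree at (50, 50) lands in an empty cell and needs no polygon tests at all.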


Using an in_memory feature class containing only the parcels I needed was what I was missing.

Here's the code I ended up using:

import os

import arcpy

# Set environment and define variables
arcpy.env.workspace = r'C:\...\Example'

path = r'C:\...\Example'
outpath = r'C:\...\Example\SurveyTrees.gdb'
parcels = r'C:\...\Example\ParcelProcess.gdb\Parcel'

# Make a list of all the shapefiles in the workspace directory
shapes = [os.path.join(path, shp) for shp in arcpy.ListFeatureClasses('*')]
# Name the merged output after the folder two levels above the first shapefile
outname = os.path.abspath(shapes[0]).split(os.sep)[-3]

# Merge the shapefiles into one feature class
trees = os.path.join(outpath, outname)
arcpy.Merge_management(shapes, trees)

# Make feature layers
arcpy.MakeFeatureLayer_management(trees, "tree_lyr")
arcpy.MakeFeatureLayer_management(parcels, "parcel_lyr")

# Select parcels that intersect trees
arcpy.SelectLayerByLocation_management("parcel_lyr", "INTERSECT", "tree_lyr",
                                       "#", "NEW_SELECTION")

# Copy only the selected parcels to an in-memory feature class
arcpy.FeatureClassToFeatureClass_conversion("parcel_lyr", "in_memory", "treeparcels")
treeparcels = r'in_memory\treeparcels'

# Make feature layer of relevant parcels
arcpy.MakeFeatureLayer_management(treeparcels, "treepar_lyr")

# Look up the real ObjectID field name and delimit it for the where clause
oid_field = arcpy.AddFieldDelimiters(treeparcels,
                                     arcpy.Describe(treeparcels).OIDFieldName)

# Iterate over the in-memory feature class rather than the layer whose
# selection changes inside the loop; "OID@" returns the ObjectID whatever its name
with arcpy.da.SearchCursor(treeparcels,
                           ["OID@", "LocName", "SITE_ADDR",
                            "CITY", "ZIP"]) as pcur:
    for prow in pcur:
        # Select one parcel at a time
        arcpy.SelectLayerByAttribute_management("treepar_lyr", "NEW_SELECTION",
                                                "{} = {}".format(oid_field, prow[0]))
        # Select trees that are within that parcel
        arcpy.SelectLayerByLocation_management("tree_lyr",
                                               "WITHIN", "treepar_lyr",
                                               "#", "NEW_SELECTION")
        # An update cursor on the layer visits only the selected trees
        with arcpy.da.UpdateCursor("tree_lyr",
                                   ["Loc_Name", "SiteAddr",
                                    "City_", "Zip_Code"]) as tcur:
            # Copy the selected parcel's attributes onto each tree
            for trow in tcur:
                trow[0] = prow[1]  # LocName   -> Loc_Name
                trow[1] = prow[2]  # SITE_ADDR -> SiteAddr
                trow[2] = prow[3]  # CITY      -> City_
                trow[3] = prow[4]  # ZIP       -> Zip_Code
                tcur.updateRow(trow)

# Release the in-memory copy when done
arcpy.Delete_management("in_memory")
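The attribute copy inside that loop is effectively a field-name mapping. As a hedged alternative to running one SelectLayerByAttribute/SelectLayerByLocation pair per parcel, this pure-Python sketch reads the parcel attributes into a dictionary once and then makes a single update pass over the trees. It assumes each tree already knows its parcel's ID (the hypothetical `PARCEL_ID` key), for example from one spatial join, and uses plain dicts in place of cursor rows; the sample values are made up.

```python
# Source (parcel) field -> target (tree) field, matching the cursor loop above.
FIELD_MAP = {
    "LocName": "Loc_Name",
    "SITE_ADDR": "SiteAddr",
    "CITY": "City_",
    "ZIP": "Zip_Code",
}


def transfer_attributes(parcels, trees, field_map=FIELD_MAP):
    """One read pass over parcels, one update pass over trees.

    parcels: iterable of dicts with an "OID" key plus the source fields.
    trees:   list of dicts with a hypothetical "PARCEL_ID" key naming the
             containing parcel; updated in place.
    Returns the number of trees that received parcel attributes.
    """
    # Read every needed parcel once into a lookup keyed by object ID.
    lookup = {p["OID"]: {src: p[src] for src in field_map} for p in parcels}
    updated = 0
    for tree in trees:
        values = lookup.get(tree.get("PARCEL_ID"))
        if values is None:
            continue  # tree is not assigned to any parcel in the lookup
        for src, dst in field_map.items():
            tree[dst] = values[src]
        updated += 1
    return updated


# Hypothetical sample rows.
parcels = [
    {"OID": 7, "LocName": "North Lot", "SITE_ADDR": "1 Oak St", "CITY": "Springfield", "ZIP": "12345"},
]
trees = [{"PARCEL_ID": 7}, {"PARCEL_ID": 99}]
count = transfer_attributes(parcels, trees)
```

The design choice is to trade many small geoprocessing calls for one dictionary lookup per tree; the tree whose `PARCEL_ID` has no match is simply left untouched.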

Best Answer

In my experience, a spatial join is consistently faster than a cursor.

Your real bottleneck, however, will be working with the entire parcel feature class when only a subset of its features is needed. To improve the efficiency of your process, I'd suggest extracting only the parcel features you need for your analysis and copying them to memory. Do this once, at the start of your analysis; it will still take some time. My suggested workflow:

  1. Merge all the point shapefiles.
  2. Create a feature layer from the parcel feature class.
  3. Select the features in the parcel feature layer that intersect the merged points.
  4. Run Feature Class to Feature Class on the selected parcel feature layer, with "in_memory" as the destination.
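Steps 1, 3 and 4 can be mimicked in pure Python to show the idea: the full parcel set is scanned once, and everything afterwards works on the small subset. In this sketch, lists stand in for the shapefiles, the parcel layer and the in_memory copy, and a bounding-box check plus ray casting stands in for Select Layer By Location. The brute-force loop is for illustration only, and all data is hypothetical.

```python
from itertools import chain


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def square(x0, y0, size):
    """Vertices of an axis-aligned square parcel (hypothetical test geometry)."""
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]


def select_parcels_with_points(parcels, points):
    """Return the parcels containing at least one point; run this only once."""
    selected = []
    for parcel in parcels:
        xs = [x for x, _ in parcel["ring"]]
        ys = [y for _, y in parcel["ring"]]
        xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
        # Cheap bounding-box rejection before the exact polygon test.
        if any(xmin <= x <= xmax and ymin <= y <= ymax
               and point_in_polygon(x, y, parcel["ring"])
               for x, y in points):
            selected.append(parcel)
    return selected


# Step 1: "merge" two hypothetical point shapefiles into one list.
tree_files = [[(5, 5), (6, 6)], [(15, 2)]]
merged = list(chain.from_iterable(tree_files))

# Steps 3-4: keep only the parcels that contain trees (the in_memory subset).
parcels = [
    {"PID": 1, "ring": square(0, 0, 10)},
    {"PID": 2, "ring": square(10, 0, 10)},
    {"PID": 3, "ring": square(100, 100, 10)},
]
in_memory_parcels = select_parcels_with_points(parcels, merged)
```

Parcel 3 contains no trees and is dropped, so any later join or cursor loop only ever sees parcels 1 and 2.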

The in_memory parcels should allow much faster processing, whether you use a spatial join or a cursor.

Merging hundreds of point shapefiles and then selecting and exporting from a large parcel dataset will be time-intensive, but probably orders of magnitude faster than repeatedly running geoprocessing against the complete parcel dataset.
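A rough count illustrates the scale argument. This sketch uses a deliberately naive cost model (parcel features considered, ignoring spatial indexing and I/O), the 60,000 parcels and the upper end of 10-50 parcels per shapefile from the question, and an assumed 300 shapefiles standing in for "hundreds".

```python
# Naive cost model: count parcel features considered (ignores spatial indexes).
PARCELS = 60_000        # size of the parcel layer (from the question)
N_SHAPEFILES = 300      # assumption standing in for "hundreds" of shapefiles
PARCELS_PER_FILE = 50   # upper end of the 10-50 parcels per shapefile

# Approach 1: one spatial join per shapefile against the full parcel layer.
full_layer_cost = N_SHAPEFILES * PARCELS

# Approach 2: one selection pass over the full layer to build the in_memory
# subset, then one join of the merged points against that subset.
subset_cost = PARCELS + N_SHAPEFILES * PARCELS_PER_FILE

ratio = full_layer_cost / subset_cost
print(f"{full_layer_cost:,} vs {subset_cost:,} parcels considered (~{ratio:.0f}x)")
```

This prints `18,000,000 vs 75,000 parcels considered (~240x)`: a bit over two orders of magnitude under these assumptions. Real timings will depend on indexing and disk access.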
