[GIS] Using Python to iterate through folders and join like shapefiles, based on name

arcpy, iterator

I'm trying to automate the process of searching through folders and merging shapefiles of the same "type" (Roads, Water, Buildings).

On the surface, what I want to do seems pretty simple. I have several data sets in individual folders, each folder containing point, poly and line data for my study area.

The files in each folder are named by map sheet, then a description of the shape, then the shape type. For example:

[Image: example file listing]

Each folder has the same structure: only the map sheet name changes, while the rest of the name stays the same. I want to combine (merge) the files based on the last 7 characters of their names. For example, every folder I am working with contains a building point file (BL_POINT).
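The naming rule can be made concrete with a small stdlib-only sketch that splits a name into map sheet, description and shape type. It assumes underscores separate the three parts and that the description and shape type contain no underscores of their own; the helper `split_name` and the file names in the usage lines (082F01_BL_POINT.shp and so on) are hypothetical.

```python
import os


def split_name(filename):
    """Split '<sheet>_<description>_<type>.shp' into its three parts.

    rsplit works from the right, so a map sheet name that itself contains
    underscores is kept whole. Names with fewer than two underscores raise
    ValueError when the result is unpacked.
    """
    stem = os.path.splitext(filename)[0]
    sheet, description, shape_type = stem.rsplit("_", 2)
    return sheet, description, shape_type


# hypothetical file names, for illustration only
for name in ["082F01_BL_POINT.shp", "082F02_BL_POINT.shp", "082F01_LF_LINE.shp"]:
    print(name, "->", split_name(name))
```

With this split, the merge key is the description plus the shape type (BL and POINT), however many characters that happens to be.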

I've set up my script so it searches through all of my folders and lists the names of the files.

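import arcpy, sys, os
from arcpy import env

# search the folder that contains this script
env.workspace = sys.path[0]
workspace = env.workspace

feature_class = []

walk = arcpy.da.Walk(workspace, datatype="FeatureClass", type="All")

for dirpath, dirnames, filenames in walk:
    for filename in filenames:
        # append() adds the full path as one list item; "+= filename" would
        # add each character of the name as a separate item
        feature_class.append(os.path.join(dirpath, filename))
        print("Found Feature Class: " + filename)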
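A common pitfall when collecting names in a loop like this is writing `feature_class += filename + ";"`. On a list, `+=` extends it with every element of the right-hand side, and the elements of a string are its characters, so each name ends up split into single letters. A stdlib-only illustration (the file name is made up):

```python
# Extending a list with a string iterates over the string,
# so each character becomes its own element.
chars = []
chars += "roads.shp" + ";"
print(chars[:3])  # ['r', 'o', 'a']

# append() adds the whole string as a single element.
names = []
names.append("roads.shp")
print(names)  # ['roads.shp']
```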

Are there any suggestions for how I can automate picking out the files whose final 7 characters are identical, once I have added all my files to the feature_class list?

Ideally, the process would be fully automated and would not require me to enter the final 7 characters for each feature type (Roads, Buildings, Water, etc.).
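Taken literally, the "last 7 characters" rule can be automated without typing in any suffixes: group every name by its last N characters and let each distinct suffix become a group. The sketch below assumes the extension is stripped before counting and uses a hypothetical helper name, `group_by_suffix`; it only groups paths and does not merge anything.

```python
import os
from collections import defaultdict


def group_by_suffix(paths, n=7):
    """Group file paths by the last n characters of their base name.

    The extension is removed first, so '.shp' does not count toward n.
    No suffixes are supplied by hand: every distinct suffix found becomes a key.
    """
    groups = defaultdict(list)
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        groups[stem[-n:]].append(path)
    return dict(groups)


# hypothetical paths, for illustration only
example = ["a/082F01_BL_POINT.shp", "b/082F02_BL_POINT.shp", "a/082F01_LF_LINE.shp"]
for suffix, members in group_by_suffix(example).items():
    print(suffix, members)
```

Note that a fixed count of 7 cuts BL_POINT down to L_POINT, so two descriptions ending in the same letter (a hypothetical RL_POINT, say) would land in the same group; splitting the name on underscores avoids that.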

Or, if I am going about this backwards, any suggestion of a better process would be swell.

Again, all I want to do is:

  1. Search through folders
  2. Merge files that are similar, based on their names.
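Step 1 on its own can be sketched with the standard library's pathlib, independent of arcpy. This assumes the inputs are shapefiles on disk (not feature classes inside geodatabases, which arcpy.da.Walk can also find); `find_shapefiles` is a hypothetical name.

```python
from pathlib import Path


def find_shapefiles(root_folder):
    """Return every .shp file below root_folder, searching all sub-folders."""
    # rglob("*.shp") matches only names ending in .shp, so sidecar files
    # such as .dbf, .shx, .prj or .shp.xml are not returned
    return sorted(Path(root_folder).rglob("*.shp"))
```

On non-Windows systems pathlib's matching is case-sensitive, so a file with an upper-case .SHP extension would be missed there.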

Best Answer

The script below searches through all sub-folders of a given root folder and appends the data to newly created shapefiles. The code is commented (you can comment out the print statements). The workflow is: first, build a list of the shapefile name endings, i.e. everything after the first underscore; next, reduce that to a list of unique endings; finally, for each unique ending, create an empty shapefile and append into it every shapefile whose name has that ending.
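The first two steps (collect the endings, then make them unique) can also be done in a single pass by keying a dictionary on the ending, which gives the unique endings and their matching paths together instead of walking the folder tree once per ending. A stdlib-only sketch using the same first-underscore rule; `group_by_ending` is a hypothetical name.

```python
import os
from collections import defaultdict


def group_by_ending(root_folder):
    """Map each ending (text after the first underscore) to its matching paths.

    Same rule as the script: '082F01_BL_POINT.shp' -> 'BL_POINT.shp'.
    Shapefiles whose names contain no underscore are skipped.
    """
    groups = defaultdict(list)
    for root, dirs, files in os.walk(root_folder):
        for filename in files:
            if filename.endswith(".shp") and "_" in filename:
                ending = filename.split("_", 1)[1]
                groups[ending].append(os.path.join(root, filename))
    return dict(groups)
```

The dictionary's keys play the role of unique_ending_list, and the first path in each list could serve as the schema template for the new shapefile.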

import os

import arcpy
from arcpy import env

env.workspace = r"C:\output"   # merged shapefiles are written here
root_folder = r"C:\test"       # all sub-folders of this folder are searched

# this list will hold the endings of the input shapefile names, i.e.
# everything after the first underscore,
# for example BL_POINT.shp, BL_POLY.shp, LF_LINE.shp etc
ending_list = []

for root, dirs, files in os.walk(root_folder):
    for filename in files:
        # skip non-shapefiles and names that have no underscore
        if filename.endswith(".shp") and "_" in filename:
            ending_list.append(filename.split("_", 1)[1])

# get a unique set of endings
unique_ending_list = list(set(ending_list))

# for each unique ending, create an empty shapefile with the same schema as
# the first matching input (_POINT, _LINE etc), then append all the
# shapefiles that share that ending
for unique in unique_ending_list:
    for root, dirs, files in os.walk(root_folder):
        for filename in files:
            # compare whole endings rather than using endswith(), so that an
            # ending such as POINT.shp does not also match X_BL_POINT.shp
            if (filename.endswith(".shp") and "_" in filename
                    and filename.split("_", 1)[1] == unique):
                filepath = os.path.join(root, filename)
                if not arcpy.Exists(unique):
                    desc = arcpy.Describe(filepath)
                    # pass the input's geometry type explicitly so that point
                    # inputs produce a point shapefile, lines a line one, etc.
                    arcpy.CreateFeatureclass_management(
                        env.workspace, unique, desc.shapeType.upper(),
                        filepath, "", "", desc.spatialReference)
                print("Appending: " + filepath)
                print("\tto " + unique)
                arcpy.Append_management(filepath, unique)
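One caveat with this approach: if the output folder (env.workspace) is inside root_folder, os.walk will visit it too, and the merged shapefiles could be picked up again as inputs. A minimal stdlib sketch of skipping one folder during the walk; `walk_excluding` is a hypothetical helper, and it relies on os.walk's documented top-down behaviour, where pruning `dirs` in place controls which sub-folders are visited.

```python
import os


def walk_excluding(root_folder, excluded_folder):
    """Like os.walk(root_folder), but never descends into excluded_folder."""
    excluded = os.path.normcase(os.path.abspath(excluded_folder))
    for root, dirs, files in os.walk(root_folder):
        # modify dirs in place so os.walk skips the excluded sub-folder
        dirs[:] = [d for d in dirs
                   if os.path.normcase(os.path.abspath(os.path.join(root, d))) != excluded]
        yield root, dirs, files
```

It could be used wherever the script calls os.walk(root_folder), e.g. walk_excluding(root_folder, env.workspace).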