[GIS] Selecting files with same first 5 characters in filename for merging

arcpy string

I have a number of polygon feature classes in a geodatabase dataset. The file names are as below:

e.g. IHO1a_xxx-xxxx_ALB, IHO1b_xxx-xxxx_MBES_10m, IHO1b_xxx-xxxx_MBES_2m

I would like a Python script that selects all feature classes whose names start with exactly the same first 14 characters and merges each such set.

import os,arcpy

# Define location of .gdb
arcpy.env.workspace = "D:\Users\Documents\H-11_Documents for Reports_Paul\Test_Model_Builder\ICP_Intermediate.gdb\Deliveryxx_Clipped_Polygons"

#Make workspace a variable
workspace = arcpy.env.workspace

List = []

for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype="FeatureClass", type = "Polygon"):
    for filename in filenames:
        if filename.startswith(filename[0:13]):
            List.append(os.path.join(dirpath, filename))
print List
arcpy.Merge_management([filename,filename], 'D:\Users\cadetpn\Documents\H-11_Documents for Reports_Paul\Test_Model_Builder\Temp.gdb\Temp_Data\merge%n%')

##To keep console window open
raw_input("Press enter to exit...")

Best Answer

Python's itertools.groupby is well suited to this type of task; grouping by a filename prefix is a one-liner. Note that groupby only groups consecutive items with equal keys, so the list has to be sorted first. For example:

import itertools

test = ['IHO1a_xxx-abcd_ALB',
'IHO1a_xxx-abcd_ALB',
'IHO1a_xxx-abcd_ALB',
'IHO1b_xxx-1234_ALB',
'IHO1c_xxx-dcba_ALB',
'IHO1b_xxx-aaaa_ALB',
'IHO1b_xxx-1234_ALB']

groups = [list(g) for _, g in itertools.groupby(sorted(test), lambda x: x[0:5])]

>>> groups
[['IHO1a_xxx-abcd_ALB', 'IHO1a_xxx-abcd_ALB', 'IHO1a_xxx-abcd_ALB'],
 ['IHO1b_xxx-1234_ALB', 'IHO1b_xxx-1234_ALB', 'IHO1b_xxx-aaaa_ALB'],
 ['IHO1c_xxx-dcba_ALB']]

Then simply iterate over your groups and perform the merge.
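The loop over the groups could look something like the sketch below. This is only an illustration, not tested against a real geodatabase: the helper names group_by_prefix and merge_groups, the merge_func parameter, and the "merge_" output naming are all made up for this example. The merge call is passed in as a function so the grouping logic can be tried without ArcGIS installed; in an arcpy session you would presumably pass arcpy.Merge_management, which takes a list of inputs and an output path.

```python
import itertools
import os


def group_by_prefix(paths, prefix_len=5):
    """Group paths by the first `prefix_len` characters of their base name."""
    def key(path):
        return os.path.basename(path)[:prefix_len]

    # groupby only groups consecutive items, so sort by the same key first
    ordered = sorted(paths, key=key)
    return {k: list(g) for k, g in itertools.groupby(ordered, key=key)}


def merge_groups(paths, out_workspace, merge_func, prefix_len=5):
    """Call merge_func(inputs, output) once per prefix group.

    merge_func is a hypothetical hook; with ArcGIS available it would
    typically be arcpy.Merge_management. Returns the output paths.
    """
    outputs = []
    groups = group_by_prefix(paths, prefix_len)
    for prefix, members in sorted(groups.items()):
        # Output naming is an assumption made for this sketch
        out_fc = os.path.join(out_workspace, "merge_" + prefix)
        merge_func(members, out_fc)
        outputs.append(out_fc)
    return outputs
```

With the List built by arcpy.da.Walk in the question, the call would be along the lines of merge_groups(List, out_gdb, arcpy.Merge_management, prefix_len=14), where out_gdb is whatever output workspace you choose. If the prefix contains characters that are not allowed in feature class names (a hyphen, for instance), the output name will likely need cleaning first, e.g. with arcpy.ValidateTableName.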
