ArcGIS – Resolving UnpickleableError in arcpy with Multiprocessing for Efficient Parallel Processing

arcgis-desktop, arcpy, parallel processing, python

I'm trying to speed up a process that currently runs synchronously by using Python's multiprocessing module.

I'm having trouble sending a feature layer to a function which is called by multiprocessing, as demonstrated in this simple script:

import multiprocessing, arcpy

def doProcess(lyr):
    print(lyr.name)

if __name__ == '__main__':

    #Create an array of feature layers
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    featureLayers = []
    fcs = arcpy.ListFeatureClasses("*","All","World")
    for fc in fcs:
        lyrName = fc + "_lyr"
        arcpy.Delete_management(lyrName)
        arcpy.MakeFeatureLayer_management(fc, lyrName)
        featureLayers.append(arcpy.mapping.Layer(lyrName))

    #This works when not using multiprocessing:
    for featureLayer in featureLayers:
        doProcess(featureLayer)

    #This fails with "UnpickleableError: Cannot pickle <type 'geoprocessing Layer object'> objects"
    pool = multiprocessing.Pool()
    pool.map(doProcess, featureLayers)
    pool.close()
    pool.join()

When iterating over the array manually, rather than using multiprocessing, the function has access to the feature layer. But when using multiprocessing, this error message is shown:

UnpickleableError: Cannot pickle <type 'geoprocessing Layer object'> objects

What is the correct syntax or approach for handling a feature layer within the multiprocessing environment? I based the above script on the example in the Esri blog post "Multiprocessing with ArcGIS".

Best Answer

I finally found the time to look into this. I don't fully understand the "unpickleable" error message, but a workaround is to pass only strings (here, feature class names) to the worker function and create the feature layer inside it. Something like this:

import multiprocessing, arcpy

def doProcess(fClass):
    #This function doesn't do anything, it's just to show that accessing arcpy methods is possible
    print("in do process function for " + fClass)
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    lyrName = fClass + "_lyr"
    arcpy.Delete_management(lyrName)
    arcpy.MakeFeatureLayer_management(fClass, lyrName)
    desc = arcpy.Describe(lyrName)
    print("Finished " + desc.Name)

if __name__ == '__main__':

    #Create an array of feature class names
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    fClasses = list(arcpy.ListFeatureClasses("*","All","World"))

    #Multiprocessing approach
    pool = multiprocessing.Pool()
    pool.map(doProcess, fClasses)
    pool.close()
    pool.join()
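
A note on why strings get through where layer objects do not: pool.map has to pickle every argument in order to send it to a worker process, and the layer object apparently wraps something that pickle cannot serialize, while plain strings pickle without trouble. Below is a minimal sketch of that difference. FakeLayer and is_picklable are hypothetical names, not part of arcpy, and the lock is only a stand-in for whatever unpicklable handle the real layer holds.

```python
import pickle
import threading


class FakeLayer(object):
    """Hypothetical stand-in for an arcpy layer (not a real arcpy class)."""

    def __init__(self, name):
        self.name = name
        # Assumption: a lock plays the role of an unpicklable native handle.
        self._handle = threading.Lock()


def is_picklable(obj):
    """Return True if pickle can serialize obj, as Pool.map requires."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    # A plain string is safe to hand to a worker process.
    print(is_picklable("World_lyr"))
    # An object holding an unpicklable handle is not.
    print(is_picklable(FakeLayer("World_lyr")))
```

Running a check like this on each argument before calling pool.map is a quick way to find out which values need to be replaced by names or paths.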

(Interestingly, the arcpy script above takes a lot longer to complete with the multiprocessing approach than when I just run:

for fClass in fClasses:
    doProcess(fClass)

Presumably there's a lot more overhead in starting each worker process and setting up its environment. Hopefully, in a more complicated scenario involving long geoprocessing tasks, the payoff would be faster overall completion of all tasks.)
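
One rough way to see that startup cost is to time both approaches on a trivial task. The sketch below makes several assumptions: fake_task is a hypothetical stand-in that sleeps briefly instead of calling arcpy, the feature class names are made up, and processes=2 is an arbitrary choice. The timings it prints only illustrate pool overhead, not real geoprocessing performance.

```python
import multiprocessing
import time


def fake_task(name):
    """Hypothetical stand-in for a geoprocessing call; it only sleeps."""
    time.sleep(0.01)
    return name + "_lyr"


def run_serial(names):
    """Process every name in the current process, one after another."""
    return [fake_task(n) for n in names]


def run_parallel(names, processes=2):
    """Process the names in a pool; only strings cross the process boundary."""
    pool = multiprocessing.Pool(processes)
    try:
        return pool.map(fake_task, names)
    finally:
        pool.close()
        pool.join()


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.time()
    result = fn(*args)
    return result, time.time() - start


if __name__ == "__main__":
    names = ["fc%d" % i for i in range(8)]  # made-up feature class names
    serial_result, serial_secs = timed(run_serial, names)
    parallel_result, parallel_secs = timed(run_parallel, names)
    print("serial:   %.3f s" % serial_secs)
    print("parallel: %.3f s" % parallel_secs)
```

With tasks this short, the cost of creating the worker processes can easily outweigh the work itself. The balance should shift as each task gets longer, which is worth measuring on the real workload rather than assuming.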
