I'm attempting to speed up a process that currently runs synchronously by using the Python multiprocessing module. I'm having trouble passing a feature layer to a function called by multiprocessing, as this simple script demonstrates:
import multiprocessing, arcpy

def doProcess(lyr):
    print(lyr.name)

if __name__ == '__main__':
    # Create a list of feature layers
    arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    featureLayers = []
    fcs = arcpy.ListFeatureClasses("*", "All", "World")
    for fc in fcs:
        lyrName = fc + "_lyr"
        arcpy.Delete_management(lyrName)
        arcpy.MakeFeatureLayer_management(fc, lyrName)
        featureLayers.append(arcpy.mapping.Layer(lyrName))

    # This works when not using multiprocessing:
    for featureLayer in featureLayers:
        doProcess(featureLayer)

    # This fails with "UnpickleableError: Cannot pickle <type 'geoprocessing Layer object'> objects"
    pool = multiprocessing.Pool()
    pool.map(doProcess, featureLayers)
    pool.close()
    pool.join()
When I iterate over the list manually, rather than using multiprocessing, the function has access to the feature layer. But when using multiprocessing, this error is raised:

    UnpickleableError: Cannot pickle <type 'geoprocessing Layer object'> objects
What is the correct syntax/approach for handling a feature layer within the multiprocessing environment? I based the script above on the example in the Esri blog post Multiprocessing with ArcGIS.
Best Answer
I finally found the time to look into this. I don't fully understand the "unpickleable" error message, but a workaround is to pass only strings into the multiprocessing pool. Something like this:
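A minimal sketch of that idea, with the arcpy calls shown only as comments so the pattern runs without a licensed ArcGIS install (the layer names and the "processed" return value are illustrative stand-ins):

```python
import multiprocessing

def doProcess(lyrName):
    # Workers receive only a picklable string. Each worker would rebuild
    # the layer object it needs from that name, e.g.:
    #   import arcpy
    #   arcpy.env.workspace = r"C:\Program Files (x86)\ArcGIS\Desktop10.2\TemplateData\TemplateData.gdb"
    #   lyr = arcpy.mapping.Layer(lyrName)
    # Here we just echo the name to keep the sketch runnable anywhere.
    return "processed " + lyrName

if __name__ == '__main__':
    # Pass layer *names*, not Layer objects -- strings pickle cleanly.
    layerNames = ["Cities_lyr", "Rivers_lyr", "Countries_lyr"]
    pool = multiprocessing.Pool()
    results = pool.map(doProcess, layerNames)
    pool.close()
    pool.join()
    print(results)
```

Note that each worker process gets a fresh Python interpreter, so the arcpy import and workspace setup have to happen inside the worker function, not just at the top of the main script.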
(Interestingly, this script takes considerably longer to complete with the multiprocessing approach than with the plain sequential loop. Presumably there's a lot of overhead in setting up the environment for each worker process. Hopefully, in a more complicated scenario involving long geoprocessing tasks, the payoff would be faster overall completion of all tasks.)
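As for the error itself, it's general Python behaviour rather than anything arcpy-specific: multiprocessing serializes every argument with pickle before sending it to a worker process, and objects tied to OS-level resources typically refuse to be serialized, while plain strings always succeed. A stdlib-only illustration, using an open stream as a stand-in for a Layer object:

```python
import pickle
import sys

# Strings round-trip through pickle without trouble, which is why
# passing layer names into pool.map works:
assert pickle.loads(pickle.dumps("Cities_lyr")) == "Cities_lyr"

# Objects wrapping OS-level handles do not -- pickling a stream raises
# TypeError, much like the arcpy Layer object raises UnpickleableError:
try:
    pickle.dumps(sys.stdout)
except TypeError as e:
    print("pickle refused:", e)
```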