arcpy – Solving Multiprocessing Errors in ArcGIS Implementation

Tags: arcgis-10.0, arcpy, parallel-processing

I was wondering if anyone else in the community here has attempted to use multiprocessing for spatial analyses. I am trying to iterate through a series of rasters, create a multiprocessing job for each, and run each through a number of geoprocessing steps within one function. Something along the lines of:

def net(RasterImage, OutFolderDir):
    arcpy.env.overwriteOutput = True
    arcpy.env.workspace = OutFolderDir
    DEM_Prj = 'DEM_Prj.tif'

    try:
        arcpy.ProjectRaster_management(RasterImage, DEM_Prj....
        FocalStatistics(DEM_Prj....)
        ...

import os
import multiprocessing
import arcpy

if __name__ == '__main__':
    InputFolder = r'C:\test\somepath'
    Output = r'C:\test\somepath2'
    arcpy.env.workspace = InputFolder
    arcpy.env.scratchWorkspace = r'C:\test.gdb'

    fcs = arcpy.ListRasters('*')
    pool = multiprocessing.Pool(4)
    jobs = []
    for fc in fcs:
        rIn = os.path.join(InputFolder, fc)
        rOut = os.path.join(Output, fc[:-4])
        jobs.append(pool.apply_async(net, (rIn, rOut)))
    pool.close()
    pool.join()  # wait for all workers to finish

Now the multiprocessing does run, usually for the first batch! However, I keep running into several different errors when processing more datasets (more than 4 files, i.e. more than one batch on a 4-core pool), including:

ERROR 010302: Unable to create the output raster: C:\somepath\sr6f8~1\FocalSt_srtm1
ERROR 010067: Error in executing grid expression.
Failed to execute (FocalStatistics).

and

ERROR 999999: Error executing function.
Failed to copy raster dataset
Failed to execute (ProjectRaster)

Notice in the first error the strange folder that is created (in the OutFolderDir location) for the focal statistics step, which nearly replicates the final output.

My question: based on your experience, is it impossible to run several geoprocessing steps within one multiprocessing function? Or do I need to split these into their individual geoprocessing steps?

UPDATE

Still encountering similar errors – moving the import statements into the def function has shown that

import arcpy 
from arcpy.sa import *

cannot create an output, and raises an added syntax warning that import * is not allowed at function level.
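As an aside on that warning: a star import inside a function body is rejected by the compiler itself (Python 2 emitted a SyntaxWarning; Python 3 makes it a hard SyntaxError), so the fix is to keep `import arcpy` / `from arcpy.sa import *` at module level — each worker process re-imports the module when it starts. A quick, arcpy-free check:

```python
# A star import is only legal at module scope; inside a def it is
# rejected at compile time (SyntaxError in Python 3).
src = """
def worker():
    from math import *   # illegal at function scope
    return sqrt(4)
"""

try:
    compile(src, '<demo>', 'exec')
    result = 'compiled'
except SyntaxError:
    result = 'SyntaxError: import * only allowed at module level'

print(result)
```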

UPDATE #2

I know this is a late reply, but I thought my workaround for getting multiprocessing to work with arcpy might benefit someone else for future reference. The main problem I found after returning to this issue is not competition between the arcpy modules, but rather competition over the scratchWorkspace that the ArcObjects use to save temporary files. Therefore, consider passing a counter into the multiprocessing arguments so you can create a unique scratchWorkspace for each process, i.e.

for Counter, fc in enumerate(fcs):
    rIn = os.path.join(InputFolder, fc)
    rOut = os.path.join(Output, fc[:-4])
    jobs.append(pool.apply_async(net, (rIn, rOut, Counter)))

Then, in the worker function, make a specific temporary directory and assign a unique scratchWorkspace to each multiprocessing task:

def net(RasterImage, OutFolderDir, Counter):
    TempFolder = os.path.join(os.path.dirname(OutFolderDir), 'Temp_%s' % Counter)
    os.mkdir(TempFolder)
    arcpy.env.scratchWorkspace = TempFolder
    ...
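The same pattern can be exercised without arcpy at all — a minimal sketch where each worker gets a private scratch directory (here via `tempfile.mkdtemp`, which also avoids name collisions if two processes race on `os.mkdir`; the file write is a hypothetical stand-in for the geoprocessing steps and for pointing `arcpy.env.scratchWorkspace` at the directory):

```python
import os
import shutil
import tempfile
import multiprocessing

def net(task_id):
    # Each worker gets its own scratch directory; with arcpy you would
    # assign arcpy.env.scratchWorkspace = scratch here.
    scratch = tempfile.mkdtemp(prefix='Temp_%s_' % task_id)
    try:
        # Stand-in for the geoprocessing steps: write an intermediate file.
        out = os.path.join(scratch, 'result.txt')
        with open(out, 'w') as f:
            f.write('task %s done' % task_id)
        with open(out) as f:
            return f.read()
    finally:
        shutil.rmtree(scratch)  # clean up the scratch workspace

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    jobs = [pool.apply_async(net, (i,)) for i in range(8)]
    pool.close()
    pool.join()
    print([j.get() for j in jobs])
```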

Hope that helps, and thanks to Ragi for the initial suggestion to use separate temp workspaces – I am still baffled by why it did not work originally.

Additional Resources

ESRI Multiprocessing Blog

Python, GIS and Stuff Blog

Best Answer

Each IWorkspace connection (i.e. each database connection) has thread affinity. Two threads cannot share the same workspace. You can have one thread own the resource and then synchronize access, but if you are going to use straight GP functions, even that is not an option.

The easiest (lame) way is to create separate processes and then do multi-process synchronization (as opposed to multithread synchronization). Even then, you should be aware of the underlying workspace type. If you are not using ArcSDE (a multi-user data source), you will probably be using a single-user data source (like a personal or file geodatabase). Remember that means only one process can write at a time! The typical (lame) synchronization for these scenarios is to have each parallel process write to a different temp workspace, and then merge everything into your destination workspace in a single process.
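The fan-out/merge pattern described above can be sketched without arcpy (the per-worker directories stand in for temp workspaces; a real script would run the GP steps in each worker and then consolidate the outputs with a single Merge/Append call in the parent process):

```python
import os
import shutil
import tempfile
import multiprocessing

def process_one(args):
    name, dest_root = args
    # Each process writes only to its own temp workspace: no write contention.
    workspace = tempfile.mkdtemp(prefix='ws_', dir=dest_root)
    with open(os.path.join(workspace, name + '.out'), 'w') as f:
        f.write('processed ' + name)
    return workspace

if __name__ == '__main__':
    dest_root = tempfile.mkdtemp(prefix='dest_')
    names = ['a', 'b', 'c', 'd']
    pool = multiprocessing.Pool(2)
    workspaces = pool.map(process_one, [(n, dest_root) for n in names])
    pool.close()
    pool.join()

    # Merge step: a single process consolidates results into the destination.
    merged = {}
    for ws in workspaces:
        for fname in os.listdir(ws):
            with open(os.path.join(ws, fname)) as f:
                merged[fname] = f.read()
        shutil.rmtree(ws)
    shutil.rmtree(dest_root, ignore_errors=True)
    print(sorted(merged))
```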
