[GIS] Processing multiple files simultaneously using Python

arcpy, parallel processing, proximity

I have 11000 point shapefiles to run the Near tool on against another point shapefile.
I am looking for a way to process 10 files at the same time.

Is there a way to process so many files using Python and arcpy in a reasonable time, considering that each file takes 5 min on average?

Best Answer

If you are restricted to Windows, you can write a DOS script and run it in parallel, as described in "parallel execution of shell processes".

If you can run on Linux, there is a procedure that is pretty well worked out using GNU parallel. I did something similar for QGIS, described in more detail at "How to run processing commands in parallel in QGIS".

Although the examples I give are for QGIS, they would work in principle for arcpy in DOS too. That is, if you can write the process you need to carry out on one file as a script that takes the file names as inputs, then you can use GNU parallel on Linux, or a parallel DOS script on Windows, to handle the parallel work. The following is based on Ole's answer to "parallelising-gis-operations-in-pyqgis".

Assuming you had a standalone script, callable from the shell, which could run your "near" process on one file as follows

my_standalone_script.py /path/to/a/point/shpfile.shp /path/to/another/point/file.shp /path/to/output/file.shp
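A minimal sketch of what such a script might look like, assuming it takes the three paths as positional arguments in the order shown above. The argument names, and the choice to copy the input before running Near so the original file is left untouched, are assumptions made for illustration and not part of the original answer.

```python
import argparse


def parse_args(argv):
    """Parse the three positional paths used in the example invocation."""
    parser = argparse.ArgumentParser(description="Run Near on one shapefile.")
    parser.add_argument("in_shp", help="point shapefile to process")
    parser.add_argument("near_shp", help="point shapefile to measure distance to")
    parser.add_argument("out_shp", help="output shapefile to create")
    return parser.parse_args(argv)


def main(argv):
    args = parse_args(argv)
    # Imported here so the argument parsing can be checked without ArcGIS installed.
    import arcpy
    # Assumption: work on a copy so the input shapefile is not modified.
    arcpy.CopyFeatures_management(args.in_shp, args.out_shp)
    # Near writes NEAR_FID and NEAR_DIST into the copied features.
    arcpy.Near_analysis(args.out_shp, args.near_shp)
    return 0


# When saved as my_standalone_script.py, finish the file with:
# if __name__ == "__main__":
#     import sys
#     sys.exit(main(sys.argv[1:]))
```

Because each invocation handles exactly one input file, the script can be launched many times side by side without the runs sharing any state.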

You could install gnu parallel and run the script in parallel with:

find /path/to/point/shpfiles -type f -name '*.shp' | parallel my_standalone_script.py {} /path/to/another_point_file.shp /path/to/output/folder/neared_{/.}.shp

As far as reasonable time is concerned, GNU parallel will carry out the operation about as well as could be achieved given any hardware limitations. You can even cluster more computers together and have each of them carry out part of the work (provided you can install the required software on each machine).
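If neither GNU parallel nor a parallel DOS script is convenient, the same fan-out can be driven from Python itself. The following is a rough sketch, not a tested arcpy workflow: it assumes a standalone script with the three-argument interface shown above, and the helper names build_command and run_commands are hypothetical. It starts one child process per shapefile and keeps at most 10 running at a time, which matches the number asked for in the question.

```python
import os
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def build_command(script, in_shp, near_shp, out_dir):
    """Build the command line for one input shapefile (assumed script interface)."""
    base = os.path.splitext(os.path.basename(in_shp))[0]
    out_shp = os.path.join(out_dir, "neared_" + base + ".shp")
    # Assumption: the interpreter running this launcher is the one that has arcpy.
    return [sys.executable, script, in_shp, near_shp, out_shp]


def run_commands(commands, workers=10):
    """Run the commands, at most `workers` at once; return exit codes in input order."""
    pool = ThreadPool(workers)
    try:
        # Each thread only waits on its child process, so threads are sufficient here.
        return pool.map(subprocess.call, commands)
    finally:
        pool.close()
        pool.join()
```

A caller would collect the input paths (for example with glob), build one command per path with build_command, and pass the list to run_commands. A non-zero entry in the returned list marks a file whose run failed and may need to be repeated. Each child process gets its own interpreter, so one failed run does not affect the others.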

Related Solutions

[GIS] How to write output from subprocess to disk (using multiprocessing)

You're probably getting some sort of exception being raised. Perhaps use a Queue to pass messages back to the parent process.

Tested working code:

import os, arcpy, arcgisscripting, time, sys
from multiprocessing import Process
from multiprocessing.queues import SimpleQueue

def ConvertCADtoGDB(msgs,in_dgn,out_gdb):
    gp = None
    try:
        gp = arcgisscripting.create()
        gp.ImportCAD_conversion(in_dgn,out_gdb,'','Explode_Complex')
        if not arcpy.Exists(out_gdb):raise RuntimeError('%s does not exist'%out_gdb)
        msgs.put(gp.GetMessages())
    except Exception as err:
        # gp is still None if creating the geoprocessor itself failed
        if gp is not None:
            msgs.put(gp.GetMessages())
        msgs.put(err)

def main(srch_dir,gdb_dir,timeout):
    for dirpath, dirnames, files in os.walk(srch_dir):
        for f in files:
            f_name, f_ext = os.path.splitext(f)
            in_dgn = os.path.join(dirpath,f)
            out_gdb = os.path.join(gdb_dir,f_name+'.gdb')
            if f_ext == '.dgn':
                if not os.path.isdir(out_gdb):
                    print 'creating',out_gdb
                    #####################################################
                    # Create a separate process to run the tool in
                    # and a Queue to pass messages back
                    m = SimpleQueue()
                    p = Process(target=ConvertCADtoGDB,
                                       args=(m,in_dgn,out_gdb))
                    p.start()
                    p.join(timeout)   # Wait for process to complete
                    err = None
                    while not m.empty(): # Check if messages in queue
                        msg = m.get()
                        if isinstance(msg, Exception):err= msg
                        else: print msg
                    if p.is_alive():  # Terminate process if it is
                        p.terminate() # still running after the timeout
                        print('terminated')
                    elif err:
                        print('unsuccessful: %s'%err)
                    else:
                        print('successful')
                    #####################################################

if __name__ == '__main__':
    srch_dir=r'C:\Temp\dgns'
    gdb_dir=r'C:\Temp\gdbs'
    timeout=60 #Seconds
    main(srch_dir,gdb_dir,timeout)

The output is:

creating C:\Temp\gdbs\smalltest.gdb
Executing: ImportCAD C:\Temp\dgns\smalltest.dgn C:\Temp\gdbs\smalltest.gdb # Explode_Complex
Start Time: Thu Jul 24 09:37:42 2014
...Importing layers from C:\Temp\dgns\smalltest.dgn
...Importing entities from C:\Temp\dgns\smalltest.dgn
......4 entities imported from C:\Temp\dgns\smalltest.dgn
...Importing and consolidating extended data to separate table
Succeeded at Thu Jul 24 09:37:46 2014 (Elapsed Time: 4.00 seconds)
successful

[GIS] Batch processing near feature distance in ArcGIS Desktop

Python is all about combining many operations into one. In the script below, I iterate through the feature classes in a workspace. For each one, I iterate through a list of other feature classes to perform a near analysis against. After each near analysis, I use a dictionary of field-name prefixes and Calculate Field to transfer the results into new fields. Finally, after performing all the near analyses, I copy the feature class with the Feature Class To Feature Class tool.

Try something like this:

import arcpy # Import arcpy module
import os

#features to which distance will be calculated
RoadnearFeature_shp = r'C:\GIS route network\road.shp'

TrainnearFeature_shp = r'C:\GIS route network\train.shp'

restnearFeature_shp = r'C:\GIS route network\restaurant.shp'

outLocation = r"C:\GISStuff"

#Dictionary for field name assignment
di = {}
di [RoadnearFeature_shp] = "ROAD"
di [TrainnearFeature_shp] = "TRAIN"
di [restnearFeature_shp] = "REST"

addedfields = []

# path where all my point shp files are kept
arcpy.env.workspace = r'C:\ArcGIS sample data'

# looping through all the files
for file in arcpy.ListFeatureClasses ():
    #Reset per file so the cleanup at the end only covers this file's fields
    addedfields = []
    for NearFC in [RoadnearFeature_shp, TrainnearFeature_shp, restnearFeature_shp]:
        #Perform analysis
        arcpy.Near_analysis(file, NearFC, "30 Meters", "LOCATION", "ANGLE")

        #Add fields to store near analysis results
        arcpy.AddField_management (file, di[NearFC] + "_ID", "LONG")
        arcpy.AddField_management (file, di[NearFC] + "_DIST", "DOUBLE")

        #Calculate fields
        arcpy.CalculateField_management (file, di[NearFC] + "_ID", "!NEAR_FID!", "PYTHON_9.3")
        arcpy.CalculateField_management (file, di[NearFC] + "_DIST", "!NEAR_DIST!", "PYTHON_9.3")

        #Delete fields from analysis (Near writes them to the input features)
        arcpy.DeleteField_management (file, "NEAR_FID")
        arcpy.DeleteField_management (file, "NEAR_DIST")

        #Track fields that have been added
        addedfields.append (di[NearFC] + "_ID")
        addedfields.append (di[NearFC] + "_DIST")

    #Analysis done. Copy feature class
    filepath = os.path.join (r'C:\ArcGIS sample data', file)
    NewName = file[:-4] + "_Near.shp"
    arcpy.FeatureClassToFeatureClass_conversion (filepath, outLocation, NewName)

    #Optional
    #Delete analysis fields from original feature class
    for field in addedfields:
        arcpy.DeleteField_management (filepath, field)
