[GIS] Processing multiple files simultaneously using Python

arcpy, parallel processing, proximity

I have 11,000 point shapefiles, and I need to run the Near tool on each of them against another point shapefile.
I am looking for a way to process 10 files at the same time.

Is there a way to process this many files using Python and arcpy in a reasonable time, considering that each file takes 5 min on average?

Best Answer

If you are restricted to Windows, you can write a DOS script and run it in parallel, as described in "parallel execution of shell processes".

If you can run on Linux, there is a fairly well worked out procedure using GNU parallel. I did something similar for QGIS, described in more detail at "How to run processing commands in parallel in QGIS".

Although the examples I give are for QGIS, they would work in principle for arcpy in DOS too. That is, if you can write the process you need to carry out on one file as a script which takes the file names as inputs, then you can use GNU parallel on Linux or a parallel DOS script on Windows to handle the parallel work. For example, based on Ole's answer to parallelising-gis-operations-in-pyqgis:

Assuming you had a standalone script which could run your "near" process as follows:

my_standalone_script.py /path/to/a/point/shpfile.shp /path/to/another/point/file.shp /path/to/output/file.shp
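
A minimal sketch of what such a standalone script might contain is shown below. The function names parse_paths and near_copy are hypothetical. Copying the input to the output path before running Near is an assumption made here, because Near writes its results (the NEAR_FID and NEAR_DIST fields) into its input feature class rather than producing a new file.

```python
import sys


def parse_paths(argv):
    """Return (in_shp, near_shp, out_shp), or None if the arguments are malformed."""
    if len(argv) != 3:
        return None
    return tuple(argv)


def near_copy(in_shp, near_shp, out_shp):
    """Copy in_shp to out_shp, then run Near on the copy against near_shp."""
    import arcpy  # imported here so the argument handling works without ArcGIS

    arcpy.CopyFeatures_management(in_shp, out_shp)
    # Near adds NEAR_FID and NEAR_DIST fields to its input, which here is the copy.
    arcpy.Near_analysis(out_shp, near_shp)


if __name__ == "__main__":
    paths = parse_paths(sys.argv[1:])
    if paths is None:
        sys.stderr.write("usage: my_standalone_script.py IN_SHP NEAR_SHP OUT_SHP\n")
    else:
        near_copy(*paths)
```

Because each run touches only its own input and output files, many copies of this script can run side by side without interfering with each other.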

You could install gnu parallel and run the script in parallel with:

find /path/to/point/shpfiles -type f -name '*.shp' | parallel my_standalone_script.py {} /path/to/another_point_file.shp /path/to/output/folder/neared_{/.}.shp

As far as reasonable time is concerned, GNU parallel will carry out the operation about as fast as can be achieved given your hardware limitations. You can even cluster several computers together and have each of them carry out part of the work (provided you can install the required software on each machine).
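
If neither GNU parallel nor a parallel DOS script is convenient, roughly the same fan-out can be sketched in Python itself. This is an illustration rather than a tested arcpy workflow: the helper names build_command, run_all and near_all are hypothetical, and it assumes Python 3 and that the interpreter passed as `python` is one that can import arcpy.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(script, in_shp, near_shp, out_dir, python=sys.executable):
    """Return the argument list for one run of the standalone script."""
    out_shp = Path(out_dir) / ("neared_" + Path(in_shp).stem + ".shp")
    return [python, str(script), str(in_shp), str(near_shp), str(out_shp)]


def run_all(commands, workers=10):
    """Run each command as a child process, at most `workers` at once.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(command):
        return subprocess.run(command).returncode

    # Threads suffice here: each one only waits on a separate child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))


def near_all(script, in_dir, near_shp, out_dir, workers=10):
    """Run the script on every .shp under in_dir; return the inputs that failed."""
    inputs = sorted(Path(in_dir).rglob("*.shp"))
    commands = [build_command(script, shp, near_shp, out_dir) for shp in inputs]
    codes = run_all(commands, workers)
    return [shp for shp, code in zip(inputs, codes) if code != 0]
```

Calling near_all("my_standalone_script.py", "/path/to/point/shpfiles", "/path/to/another_point_file.shp", "/path/to/output/folder") would mirror the find and parallel command above, with 10 files in flight at a time. At 5 minutes per file, 11,000 files come to 55,000 minutes of work, so 10 workers would need about 5,500 minutes (roughly 3.8 days) if the runs scaled perfectly, which shared disk access may well prevent.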
