If you change your program to read the file name from the command line and split your input file into smaller chunks, you can do something like this using GNU Parallel:
parallel my_processing.py {} /path/to/polygon_file.shp ::: input_files*.shp
This will run 1 job per core.
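For the command above to work, my_processing.py has to take both paths from its arguments. Here is a minimal sketch, assuming the chunk file comes first and the polygon file second. The argument names and the `process` placeholder are hypothetical and stand in for whatever your script really does:

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional paths that GNU Parallel passes to each job."""
    parser = argparse.ArgumentParser(description="Process one input chunk.")
    parser.add_argument("input_file", help="chunk to process (filled in for {})")
    parser.add_argument("polygon_file", help="polygon shapefile shared by all jobs")
    return parser.parse_args(argv)


def process(input_file, polygon_file):
    """Hypothetical placeholder for the real processing step."""
    return f"processing {input_file} against {polygon_file}"


def main(argv=None):
    """Entry point: read the paths and run the processing step once."""
    args = parse_args(argv)
    print(process(args.input_file, args.polygon_file))
    return 0
```

End the script with a call to `main()` so that each job started by GNU Parallel handles exactly one chunk.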
All new computers have multiple cores, but most programs are serial in nature and will therefore not use the multiple cores. However, many tasks are extremely parallelizable:
- Run the same program on many files
- Run the same program for every line in a file
- Run the same program for every block in a file
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU.
GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
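The difference between the two strategies can be illustrated with a small simulation. This is a sketch under stated assumptions, not GNU Parallel's actual scheduler: job durations are known up front, the static split hands out equal consecutive batches, and the number of jobs is divisible by the number of CPUs.

```python
import heapq


def static_makespan(durations, n_cpus):
    """Total time when the jobs are split into equal consecutive batches."""
    per_cpu = len(durations) // n_cpus  # assumes an even split
    batches = [durations[i * per_cpu:(i + 1) * per_cpu] for i in range(n_cpus)]
    return max(sum(batch) for batch in batches)


def dynamic_makespan(durations, n_cpus):
    """Total time when each new job starts on whichever CPU is free first."""
    free_at = [0] * n_cpus  # time at which each CPU becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available CPU
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# 32 jobs on 4 CPUs: the first 8 jobs are slow, the remaining 24 are fast.
durations = [4] * 8 + [1] * 24
print(static_makespan(durations, 4), dynamic_makespan(durations, 4))
```

With these made-up durations the static split takes 32 time units, because one CPU gets all the slow jobs, while the dynamic strategy finishes in 14.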
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Building on recipes from two earlier answers, you can try the script below. You will need to edit at least a couple of lines. Depending on how you installed QGIS (with the standalone installer or the OSGeo4W installer), your QgsApplication prefix path and the path to the processing plugin may be different from what I used in the script.
Also, you will need to edit the in_folder and out_folder paths to match your own file system.
import sys
import os
from qgis.core import (QgsApplication, QgsVectorLayer)
from qgis.analysis import QgsNativeAlgorithms
# See https://gis.stackexchange.com/a/155852/4972 for details about the prefix
QgsApplication.setPrefixPath('C:/OSGeo4W/apps/qgis', True)
qgs = QgsApplication([], False)
qgs.initQgis()
# Append the path where processing plugin can be found
sys.path.append('C:\\OSGeo4W\\apps\\qgis\\python\\plugins')
import processing
from processing.core.Processing import Processing
Processing.initialize()
QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())
in_folder = 'C:\\Users\\Ben\\Desktop\\TEMP\\txt_files'
out_folder = 'C:\\Users\\Ben\\Desktop\\TEMP\\kml_files'
def save_as_kml(in_file, save_location):
    """Convert one tab-delimited .txt file (an os.DirEntry) to a polygon KML."""
    out_layer = os.path.join(save_location, in_file.name.replace(".txt", ".kml"))
    # Delimited-text URI: tab-separated, EPSG:4326, coordinates in columns X and Y
    uri = 'file:///{}?delimiter={}&crs=epsg:4326&xField={}&yField={}'.format(in_file.path, '\\t', 'X', 'Y')
    vlayer = QgsVectorLayer(uri, in_file.name, 'delimitedtext')
    # Join the points into a closed path, ordered by the 'Sort' field
    paths = processing.run("qgis:pointstopath",
                           {'INPUT': vlayer,
                            'CLOSE_PATH': True,
                            'ORDER_FIELD': 'Sort',
                            'GROUP_FIELD': '',
                            'DATE_FORMAT': '',
                            'OUTPUT': 'TEMPORARY_OUTPUT'})
    # Turn the closed path into a polygon and write it to the KML file
    processing.run("native:polygonize",
                   {'INPUT': paths['OUTPUT'],
                    'KEEP_FIELDS': False,
                    'OUTPUT': out_layer})

# Convert every .txt file in the input folder
src_dir = os.scandir(in_folder)
for file in src_dir:
    if file.name.endswith('.txt'):
        save_as_kml(file, out_folder)
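Before running the conversion, it can help to check which files will be picked up and what URI and output path each one gets. The sketch below does that with the standard library only, so it runs without QGIS. The function name `plan_conversions` is hypothetical, and normalizing the path to forward slashes is an assumption on my part rather than something the script above requires.

```python
import os


def plan_conversions(in_folder, out_folder, delimiter="\\t", x_field="X", y_field="Y"):
    """Return a (uri, out_path) pair for every .txt file in in_folder."""
    plans = []
    with os.scandir(in_folder) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not (entry.is_file() and entry.name.endswith(".txt")):
                continue
            # Assumption: use forward slashes so the file URI looks the same
            # on Windows and on other systems.
            path = os.path.abspath(entry.path).replace(os.sep, "/")
            uri = (f"file:///{path.lstrip('/')}?delimiter={delimiter}"
                   f"&crs=epsg:4326&xField={x_field}&yField={y_field}")
            out_name = os.path.splitext(entry.name)[0] + ".kml"
            plans.append((uri, os.path.join(out_folder, out_name)))
    return plans
```

Printing the result of `plan_conversions(in_folder, out_folder)` gives a quick dry run of the batch.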
I'm not sure how you run your standalone scripts, but if you have trouble, here is how I do it:
Save the script above as a .py file (e.g. save_as_kml.py)
Next, create a batch file with the following content:
@echo off
SET OSGEO4W_ROOT=C:\OSGeo4W
call "%OSGEO4W_ROOT%"\bin\o4w_env.bat
@echo off
path %PATH%;%OSGEO4W_ROOT%\apps\qgis\bin
path %PATH%;C:\OSGeo4W\apps\Qt5\bin
path %PATH%;C:\OSGeo4W\apps\Python39\Scripts
set QGIS_PREFIX_PATH=%OSGEO4W_ROOT:\=/%/apps/qgis
set GDAL_FILENAME_IS_UTF8=YES
rem Set VSI cache to be used as buffer, see #6448
set VSI_CACHE=TRUE
set VSI_CACHE_SIZE=1000000
set PYTHONPATH=%PYTHONPATH%;%OSGEO4W_ROOT%\apps\qgis\python
set PYTHONHOME=%OSGEO4W_ROOT%\apps\Python39
set QT_PLUGIN_PATH=%OSGEO4W_ROOT%\apps\qgis\qtplugins;%OSGEO4W_ROOT%\apps\qt5\plugins
cmd.exe
Again, the OSGEO4W_ROOT may be different from mine depending on your installation; just make sure it points to your main installation directory, which contains the bin folder.
Save the batch file in the same location as your .py file. Then you can just double-click the batch file to launch it and, at the prompt, type python save_as_kml.py to run your script.
You could use something like the following in your script, which finds all shapefiles in a selected folder and applies a processing function to each one:
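A minimal sketch of that pattern, not the answerer's original code: the helper names `find_shapefiles` and `process_folder` are hypothetical, and `func` stands for whatever processing call you want to apply to each file.

```python
import glob
import os


def find_shapefiles(folder):
    """Return the sorted paths of all .shp files directly inside folder."""
    return sorted(glob.glob(os.path.join(folder, "*.shp")))


def process_folder(folder, func):
    """Apply func to every shapefile path and collect the results by path."""
    return {path: func(path) for path in find_shapefiles(folder)}
```

Inside QGIS, `func` would typically wrap a `processing.run(...)` call that takes the shapefile path as its input.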
EDIT:
In response to the comments, you can process and output individual shapefiles using code similar to the following (I used the buffer algorithm as an example):
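One way this could look, as a sketch rather than the original code: each input gets its own parameter dictionary and its own output file. The helper names, the `_buffer` suffix and the default distance are hypothetical. The parameter keys are the ones I believe `native:buffer` uses in QGIS 3, so treat them as an assumption and check them with `processing.algorithmHelp("native:buffer")`. The `run` argument is injected so the logic can be tried outside QGIS.

```python
import os


def buffer_params(in_path, out_folder, distance=10.0):
    """Build the parameter dict for buffering one shapefile."""
    base = os.path.splitext(os.path.basename(in_path))[0]
    return {
        "INPUT": in_path,
        "DISTANCE": distance,  # in the layer's CRS units
        "SEGMENTS": 5,
        "END_CAP_STYLE": 0,
        "JOIN_STYLE": 0,
        "MITER_LIMIT": 2,
        "DISSOLVE": False,
        # One output per input, named after the source file
        "OUTPUT": os.path.join(out_folder, base + "_buffer.shp"),
    }


def buffer_all(in_paths, out_folder, run, distance=10.0):
    """Call run("native:buffer", params) once per input; return output paths."""
    outputs = []
    for in_path in in_paths:
        params = buffer_params(in_path, out_folder, distance)
        run("native:buffer", params)
        outputs.append(params["OUTPUT"])
    return outputs
```

In a QGIS script, after `Processing.initialize()`, you would pass `processing.run` as the `run` argument.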