Where can we find better documentation for multiprocessing with Python in QGIS?
[GIS] Python and QGIS multiprocessing documentation
documentation, parallel processing, python, qgis
Related Solutions
I was wrong in my comment on the original post: the suggested workaround does work. In case someone else has this issue, here is what needs to be done in QGIS 2.0 before instantiating Manager().
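import multiprocessing as mp
import os
import sys

# OSGeo4W does not place the Python executable in sys.exec_prefix,
# so point multiprocessing at pythonw.exe explicitly
path = os.path.abspath(os.path.join(sys.exec_prefix, '../../bin/pythonw.exe'))
mp.set_executable(path)
sys.argv = [ None ]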
Note that this cannot be tested from the Python console: Windows lacks fork(), so all multiprocessing statements must be isolated under an if __name__ == '__main__': guard in a script.
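As a minimal sketch of the same workaround, the path lookup can be wrapped in a helper that only calls mp.set_executable() when the interpreter is actually found. The helper names osgeo4w_pythonw_path and point_multiprocessing_at_python are hypothetical, and the '../../bin/pythonw.exe' layout is taken from the snippet above rather than verified against every OSGeo4W install. The sys.argv assignment from the original is left to the caller.

```python
import multiprocessing as mp
import os
import sys


def osgeo4w_pythonw_path(exec_prefix):
    """Return the pythonw.exe location assumed by the workaround above."""
    return os.path.abspath(
        os.path.join(exec_prefix, "..", "..", "bin", "pythonw.exe")
    )


def point_multiprocessing_at_python(exec_prefix=None):
    """Call mp.set_executable() only if the assumed interpreter exists.

    Returns True when the executable was set, False otherwise.
    """
    path = osgeo4w_pythonw_path(exec_prefix or sys.exec_prefix)
    if not os.path.isfile(path):
        # Assumed layout not found: leave multiprocessing untouched.
        return False
    mp.set_executable(path)
    return True
```

Checking the return value before creating a Manager() makes a wrong path show up immediately instead of as a failed child process later.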
If you want to experiment with multiprocessing and the embedded Python from the OSGeo4W bundle outside of QGIS, here is the code.
tst.py
import multiprocessing as mp
import sys, os

print("Non-isolated statement")

if __name__ == '__main__':
    print("I'm in main module")
    path = os.path.abspath(os.path.join(sys.exec_prefix, '../../bin/pythonw.exe'))
    mp.set_executable(path)
    print("Setting executable path to {:s}".format(path))
    sys.argv = [ None ] # '../tst.py' __file__
    mgr = mp.Manager()
    print("I'm past Manager()")
tst.c
#include <Python.h>
#include <stdio.h>

int main(int argc, char * argv[]) {
    char buf[10240] = {0};
    size_t sz, res;
    FILE *f;

    Py_SetProgramName(argv[0]); /* optional but recommended */
    Py_Initialize();

    f = fopen("../tst.py", "r");
    if (f == NULL) { /* bail out if the script is missing */
        perror("../tst.py");
        Py_Finalize();
        return 1;
    }
    // obtain file size:
    fseek(f, 0 , SEEK_END);
    sz = ftell(f);
    rewind(f);
    if (sz >= sizeof(buf)) /* keep room for the terminating NUL */
        sz = sizeof(buf) - 1;
    res = fread(buf, 1, sz, f);
    fclose(f);
    buf[res] = '\0';

    PyRun_SimpleString(buf);
    getchar();
    /* PyRun_SimpleString("from time import time,ctime\n" */
    /* "print 'Today is',ctime(time())\n"); */
    Py_Finalize();
    return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 2.8)
find_package(PythonLibs)
include_directories(${PYTHON_INCLUDE_DIRS})
add_executable(tst tst.c)
target_link_libraries(tst ${PYTHON_LIBRARY})
if(MSVC)
  set_target_properties(tst PROPERTIES LINK_FLAGS -NODEFAULTLIB:python27_d)
endif()
- From your OSGeo4W console, initialize your build environment with vcvarsall.bat
- Create a subfolder build (or similar) and cd to it
- Use cmake-gui .. to generate jom/nmake makefiles in the build folder, provided all files are saved in the parent one
- Use nmake or jom to build tst.exe
- Try to run tst or python ..\tst.py
You can use dblink in a native Postgres query to split the query up into separate database connections and execute them simultaneously. This is effectively parallelism in Postgres on a single server. It could be mimicked in Python, but I haven't tried it.
There are some limitations: 1) the operation needs to be an insert, not an update. Inserts are generally faster anyway as you're not altering an existing table (depends on your HDD as far as I understand); 2) you'll need an integer ID field to be able to split the query into chunks. Adding a serial field is best as it creates a sequential integer which breaks the work up as evenly as possible.
See Mike Gleason's parallel processing function for the details.
Key performance tip: use the boundary table as the table to split, not the points.
Using this method, we can boundary-tag ~10 million points in ~15,000 polygons in about a minute on a 16-core Windows 2012 Server with 128 GB of RAM and an SSD. It could run faster on Linux, but I haven't tested it.
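The chunking step could be mimicked in Python roughly as follows. This is only a sketch of how the ID range might be split and turned into one INSERT per chunk. The table and column names (tagged_points, points, boundaries, gid, geom) are hypothetical, and actually running each statement over its own database connection is left out.

```python
def split_id_range(min_id, max_id, n_chunks):
    """Split the inclusive range [min_id, max_id] into contiguous (start, end) pairs.

    Chunk sizes differ by at most one row, so the work is spread evenly.
    """
    if max_id < min_id or n_chunks < 1:
        return []
    total = max_id - min_id + 1
    n_chunks = min(n_chunks, total)  # never create empty chunks
    base, extra = divmod(total, n_chunks)
    bounds = []
    start = min_id
    for i in range(n_chunks):
        # The first `extra` chunks take one additional ID each.
        end = start + base + (1 if i < extra else 0) - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


# Hypothetical schema: the boundary table is the one being split, per the tip above.
QUERY_TEMPLATE = (
    "INSERT INTO tagged_points (point_id, boundary_id) "
    "SELECT p.gid, b.gid FROM boundaries AS b "
    "JOIN points AS p ON ST_Intersects(b.geom, p.geom) "
    "WHERE b.gid BETWEEN {start} AND {end};"
)


def build_chunk_queries(min_id, max_id, n_chunks):
    """Return one INSERT statement per chunk of boundary IDs."""
    return [
        QUERY_TEMPLATE.format(start=start, end=end)
        for start, end in split_id_range(min_id, max_id, n_chunks)
    ]
```

Each returned statement would then be sent on a separate connection, for example from a pool of workers, so that the server processes the chunks concurrently.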
Best Answer
Just to add to @PolyGeo's answer: I also could not find any official documentation on multiprocessing (if it even exists!). There is, however, another method, described in this blog, that uses multithreading in QGIS and might be useful.
The main differences between the two methods are (more is discussed here):
Multiprocessing allows multiple processors to simultaneously run separate processes, each with its own memory space. A main advantage of this method is that if an error occurs in one process, it will not affect the other processes.
Multithreading allows specific operations within a single application to be subdivided into individual threads. The main advantage of this method is that these threads can run in parallel, but care must be taken: if an error occurs in a single thread, the whole operation could crash.
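The difference in isolation can be illustrated with a small standalone script. This is a sketch using only the Python standard library, not QGIS APIs, and is meant to be run as a script file with a regular Python interpreter rather than from the QGIS console. The names results, record, run_in_thread and run_in_process are made up for the illustration. A thread writes into the same memory as the main program, whereas a child process works on its own copy.

```python
import multiprocessing as mp
import threading

# Module-level state: shared by all threads of one process,
# but copied or re-created in every separate process.
results = []


def record(value):
    """Append a value to the list of whichever process runs this."""
    results.append(value)


def run_in_thread(value):
    """Run record() in a second thread of the current process."""
    worker = threading.Thread(target=record, args=(value,))
    worker.start()
    worker.join()


def run_in_process(value):
    """Run record() in a separate process and return its exit code."""
    worker = mp.Process(target=record, args=(value,))
    worker.start()
    worker.join()
    return worker.exitcode


if __name__ == "__main__":
    run_in_thread("from thread")
    exit_code = run_in_process("from process")
    # The thread's append is visible here; the child process
    # only changed its own copy of the list.
    print(results, exit_code)
```

Because threads share memory in this way, a fault in one thread can bring down the whole application, while a failing child process leaves the parent's state untouched.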