QGIS – Running a Python Script to Extract Raster Data

qgis

I am trying to use QGIS to extract raster statistics for shapefiles (I have shapefiles of species distribution and I want to extract environmental data from within each species' range). This is possible using the Zonal Statistics plugin however I have about 300 shapefiles I need to get data from. I've tried merging the shapefiles into one but this results in massive files that cause QGIS to crash (using shapefiles >15mB causes problems).

Is there a way to automate the procedure using a Python script? I found a post that details how to run ZonalStats using Python (How to calculate raster statistics for polygons? second answer) but can't get it to work. I know absolutely nothing about Python and so I'm not sure if there's something that needs to be done before attempting to run the script or if the problem is something else. A similar script that will run on its own without user input would be ideal.

I'm running QGIS 1.8 on a Linux machine.

Any help would be much appreciated.

Best Answer

Yes you can do this in a stand alone script. To do this you efficiently, you will need SciPy. SciPy is a cousin of NumPy and together they extend Python with a lot of extremely powerful array handling capabilities. In SciPy you will use the 'ndimage' module.

Your procedure is to make a raster of your zones. Then import your original raster and the zones raster each into a NumPy array. Make a third numpy array of the unique values in your zones array to use as an index for the ndimage function. Finally use one of the ndimage statistics to extract your result, e.g.

myIndex = numpy.unique(zonesArray)    
minResult = ndimage.minimum(myRasterArray, labels=zonesArray, index=indexArray)

(obviously I have not included all the code to open a raster or vector file etc. but see below for help with that).

I often write my results to a a simple csv text file and can then do a table join to pull the statistics back into my zone polygons. If you have many rasters, then you can automate the process into a batch process.

While you can do this with GDAL and indeed, will need GDAL to read the original raster and polygon data, I would not recommend that you attempt it without SciPy because the process will have glacial alacrity! See this excellent series on coding GDAL with Python. Note tutorial 6 in the series as this is especially relevant. However, it doesn't mention SciPy as it is a bit old (and where it mentions 'Numeric' understand that this has been replaced by 'NumPy'). ndimage is very fast and much better than manually iterating over a raster in a bitwise fashion or even iterating over an array cell by cell. The speed difference of using SciPy vs. NumPy alone can be seconds compared to hours for the same data (from personal experience). As a bonus, it takes less code too :)

There is a lot packed into this message and it may appear daunting if you know nothing of Python. In which case, start with the tutorial series I mention above, which assumes zero coding experience. Then after the first couple of lessons, skip forward to the raster material. Once you know how to open a raster and get it into a Numpy (aka 'numeric') array, read the SciPy documentation on ndimage.

Related Question