[GIS] Extract raster values within shapefile with pygeoprocessing or gdal

gdalpygeoprocessingpython

I would like to know how to get all the raster values within a polygon using gdal or pygeoprocessing, without reading the entire grid as an array.

pygeoprocessing and gdal can do zonal statistics but only the min, max, mean, stdev or count are available from such a function. Since zonal statistics need to access the values, would it be easy to extract values the same way ?

I found a very similar question here :
(Getting pixel value of GDAL raster under OGR point without NumPy?)
but only for a particular "point".

Best Answer

You can use rasterio to extract the raster values within a polygon as in GIS SE: GDAL python cut geotiff image with geojson file

I use here a one band raster file and GeoPandas for the shapefile ( instead of Fiona)

import rasterio
from rasterio.mask import mask
import geopandas as gpd
shapefile = gpd.read_file("extraction.shp")
# extract the geometry in GeoJSON format
geoms = shapefile.geometry.values # list of shapely geometries
geometry = geoms[0] # shapely geometry
# transform to GeJSON format
from shapely.geometry import mapping
geoms = [mapping(geoms[0])]
# extract the raster values values within the polygon 
with rasterio.open("raster.tif") as src:
     out_image, out_transform = mask(src, geoms, crop=True)

The out_image result is a Numpy masked array

# no data values of the original raster
no_data=src.nodata
print no_data
-9999.0
# extract the values of the masked array
data = out_image.data[0]
# extract the row, columns of the valid values
import numpy as np
row, col = np.where(data != no_data) 
elev = np.extract(data != no_data, data)

Now I use How to I get the coordinates of a cell in a geotif? or Python affine transforms to transform between the pixel and projected coordinates with out_transform as the affine transform for the subset data

 from rasterio import Affine # or from affine import Affine
 T1 = out_transform * Affine.translation(0.5, 0.5) # reference the pixel centre
 rc2xy = lambda r, c: (c, r) * T1

Creation of a new resulting GeoDataFrame with the col, row and elevation values

d = gpd.GeoDataFrame({'col':col,'row':row,'elev':elev})
# coordinate transformation
d['x'] = d.apply(lambda row: rc2xy(row.row,row.col)[0], axis=1)
d['y'] = d.apply(lambda row: rc2xy(row.row,row.col)[1], axis=1)
# geometry
from shapely.geometry import Point
d['geometry'] =d.apply(lambda row: Point(row['x'], row['y']), axis=1)
# first 2 points
d.head(2)
     row  col   elev       x          y            geometry  
 0    1    2  201.7!  203590.58  89773.50  POINT (203590.58 89773.50)  
 1    1    3  200.17  203625.97  89773.50  POINT (203625.97 89773.50)

# save to a shapefile
d.to_file('result.shp', driver='ESRI Shapefile')

Related Solutions

[GIS] Getting pixel value of GDAL raster under OGR point without NumPy

You can use the gdal.Dataset or gdal.Band ReadRaster method. See the GDAL and OGR API tutorials and the example below. ReadRaster does not use/require numpy, the return value is raw binary data and needs to be unpacked using the standard python struct module.

An example:

from math import floor
from osgeo import gdal,ogr
import struct

src_filename = '/tmp/test.tif'
shp_filename = '/tmp/test.shp'

src_ds=gdal.Open(src_filename) 
gt_forward=src_ds.GetGeoTransform()
gt_reverse=gdal.InvGeoTransform(gt_forward)
rb=src_ds.GetRasterBand(1)

ds=ogr.Open(shp_filename)
lyr=ds.GetLayer()
for feat in lyr:
    geom = feat.GetGeometryRef()
    mx,my=geom.GetX(), geom.GetY()  #coord in map units

    #Convert from map to pixel coordinates.
    px, py = gdal.ApplyGeoTransform(gt_reverse, mx, my)
    px = floor(px) #x pixel
    py = floor(py) #y pixel

    structval=rb.ReadRaster(px,py,1,1,buf_type=gdal.GDT_UInt16) #Assumes 16 bit int aka 'short'
    intval = struct.unpack('h' , structval) #use the 'short' format code (2 bytes) not int (4 bytes)
    
    print intval[0] #intval is a tuple, length=1 as we only asked for 1 pixel value

Alternatively, since the reason you gave for not using numpy was to avoid reading the entire array in using ReadAsArray(), below is an example that uses numpy and does not read the entire raster in. It uses the built-in gdal.ApplyGeoTransform() function in order to deal with axes rotations.

from math import floor
from osgeo import gdal,ogr

src_filename = '/tmp/test.tif'
shp_filename = '/tmp/test.shp'

src_ds=gdal.Open(src_filename) 
gt_forward=src_ds.GetGeoTransform()
gt_reverse=gdal.InvGeoTransform(gt_forward)
rb=src_ds.GetRasterBand(1)

ds=ogr.Open(shp_filename)
lyr=ds.GetLayer()
for feat in lyr:
    geom = feat.GetGeometryRef()
    mx,my=geom.GetX(), geom.GetY()  #coord in map units

    #Convert from map to pixel coordinates.
    px, py = gdal.ApplyGeoTransform(gt_reverse, mx, my)
    px = floor(px) #x pixel
    py = floor(py) #y pixel

    intval=rb.ReadAsArray(px,py,1,1)
    print intval[0] #intval is a numpy array, length=1 as we only asked for 1 pixel value

[GIS] GDAL/OGR to extract pixel values

Your question defines a complex process which is difficult to help with all of it. Here, I will suggest something that might help with the part getting the raster data using the vector.

Refer to this: http://machinalis8.rssing.com/chan-51920729/latest.php#item3

In that blog post I rasterize vector data to subset pixels from an image.

The idea there is that you convert the vector data into a raster, like a mask. Then, using numpy, you use that mask to filter or choose the pixels in the raster.

You may need to check the part about "the center of the pixel" fallin within the poligon: it will depend on GDAL's implementation of RasterizeLayer.

Next, you can do the rest of the calculations.

Best Answer

Related Solutions

[GIS] Getting pixel value of GDAL raster under OGR point without NumPy

[GIS] GDAL/OGR to extract pixel values

Related Question