python – How to Apply GDAL Polygonize for Raster to Vector

gdalpolygonizepython

I am desperately trying to vectorize raster files in a loop for which I came across GDAL's polygonize function (https://gdal.org/programs/gdal_polygonize.html).

However, I don't get the hang of it. To me, the input parameters are cryptic and I failed to figure out the exact commands required for application in Python.

** Edit**

With help of Ben's comment I got to this point:

from osgeo import gdal, ogr
import sys

# this allows GDAL to throw Python Exceptions
gdal.UseExceptions()

#  get raster datasource
src_ds = gdal.Open(inputFilename)
srcband = src_ds.GetRasterBand(1)

#  create output datasource
dst_layername = "POLYGONIZED_STUFF"
drv = ogr.GetDriverByName("ESRI Shapefile")
dst_ds = drv.CreateDataSource( outputPath+outputFile )
dst_layer = dst_ds.CreateLayer(dst_layername, srs = None )

gdal.Polygonize( srcband, None, dst_layer, -1, [], callback=None )

GDAL now creates three files:

  • filename.dbf (6,451 KB)
  • filename.shp (75,929 KB)
  • filename.shx (1 KB)

However, when I import the file with GeoPandas (test = gpd.read_file(outputPath+outputFile)), the GeoDataFrame is empty.

enter image description here

Bzw. I am working with this data set: http://www.earthstat.org/harvested-area-yield-175-crops/

Best Answer

I downloaded the dataset you linked to and tested with the following script- it works ok for me.

In the following script, I left my own paths in for reference but you should of course change them to match your own. I also created a spatial reference system for the output file, as well as a field to write the raster values to (I guess that would suit your goal).

from osgeo import gdal, ogr, osr

in_path = 'C:\\Users\\Ben\\Desktop\\GIS Files\\Zipped data\\HarvestedAreaYield175Crops_Geotiff\\HarvestedAreaYield175Crops_Geotiff\\GeoTiff\\oilpalm\\oilpalm_HarvestedAreaHectares.tif'

out_path = 'C:\\Users\\Ben\\Desktop\\GIS Files\\Zipped data\\HarvestedAreaYield175Crops_Geotiff\\HarvestedAreaYield175Crops_Geotiff\\GeoTiff\\oilpalm\\oilpalm_HarvestedAreaHectares.shp'

#  get raster datasource
src_ds = gdal.Open( in_path )
#
srcband = src_ds.GetRasterBand(1)
dst_layername = 'oilpalm_HarvestedAreaHectares'
drv = ogr.GetDriverByName("ESRI Shapefile")
dst_ds = drv.CreateDataSource( out_path )

sp_ref = osr.SpatialReference()
sp_ref.SetFromUserInput('EPSG:4326')

dst_layer = dst_ds.CreateLayer(dst_layername, srs = sp_ref )

fld = ogr.FieldDefn("HA", ogr.OFTInteger)
dst_layer.CreateField(fld)
dst_field = dst_layer.GetLayerDefn().GetFieldIndex("HA")

gdal.Polygonize( srcband, None, dst_layer, dst_field, [], callback=None )

del src_ds
del dst_ds

Afterwards, I loaded the created shapefile in QGIS and deleted all features with 0 value.

The loaded layer looks like this:

enter image description here

Here, you can see the polygonized layer with transparent fill & labelled with "HA" field value. The selected feature corresponds to the raster pixel with highest value (shown by the Identify tool).

enter image description here

References:

http://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#polygonize-a-raster-band

Output field name when using gdal.Polygonize() in python

GDAL polygonize in python creating blank polygon?

Related Question