[GIS] Why is GDALPolygonize so much slower than ArcGIS Raster to Polygon?

gdal polygonize python

I am attempting to polygonize a raster using GDALPolygonize() in a Python script. The script began polygonizing yesterday at 5 pm and is still running now, at 9:30 am. I have no clue how far along it is, but I know it is still going because the size of the output shapefile changes whenever I refresh Windows Explorer.

My raster is rather large, but I still don't expect it to take this long. It has 35,486 columns and 23,682 rows with a 1 meter cell size. It is a binary raster where a value of 1 represents data and 0 is NoData.

When I polygonized in ArcGIS using Raster to Polygon in the Conversion Toolbox, it took 56 seconds. The resulting shapefile is 200 MB, while the shapefile being created by GDALPolygonize is still only 100 MB. That makes me think GDAL is about halfway done after running all night.

Specs:
Windows 7 64-bit,
8 GB RAM,
GDAL 1.10 64-bit,
ArcGIS Desktop 10.2,
64-bit Background Geoprocessing for ArcGIS Desktop,
Python 2.7.3 64-bit

UPDATE
Day 2 – GDALPolygonize is still running. It has gone overnight 2 nights in a row and through a whole day without completing. ArcGIS took 56 seconds.

Best Answer

I have the same experience. The algorithm is really slow for huge rasters, although quite fast for smaller ones. There is one possible workaround:

  1. Split the huge raster file into smaller files with gdalwarp (using -te xmin ymin xmax ymax, in the raster's own coordinate units, to define the extent of each file):

gdalwarp -te 12.08 48.5 12.5 51.1 original_file.tif part1.tif

  2. Polygonize each of them into a separate shapefile:

gdal_polygonize.py part1.tif -f "ESRI Shapefile" part1.shp

  3. Merge the shapefiles together (repeat for each part):

ogr2ogr -f "ESRI Shapefile" -update -append merge.shp part1.shp -nln merge

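  4. Dissolve the merged shapefile, so that polygons cut at the tile boundaries are joined again (here input.shp stands for the merged file, input for its layer name, and field for the attribute holding the raster value, which gdal_polygonize.py names DN by default):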

ogr2ogr "output.shp" "input.shp" -dialect sqlite -sql "SELECT ST_Union(geometry), field FROM input GROUP BY field"

The final time was way faster.
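
With many tiles, typing the extents by hand gets tedious, so the first three steps could be generated by a small script. The following is only a sketch under assumptions, not something I have timed: the function names, the 4096-pixel tile size and the raster origin (0, 23682) are hypothetical and should be replaced with the values that gdalinfo reports for your file. Tiles are cut on whole pixels so that gdalwarp does not have to shift the grid, and the first ogr2ogr call creates merge.shp without -update -append as a cautious choice, so that appending only starts once the file exists.

```python
def tile_extents(x_min, y_max, cols, rows, cell, tile_px):
    """Return pixel-aligned (xmin, ymin, xmax, ymax) extents covering the raster.

    x_min, y_max: map coordinates of the raster's upper-left corner.
    cols, rows:   raster size in pixels.
    cell:         pixel size in map units (1 m in the question).
    tile_px:      tile edge length in pixels (hypothetical choice).
    """
    extents = []
    for row0 in range(0, rows, tile_px):
        for col0 in range(0, cols, tile_px):
            # Clip the last tile in each direction to the raster edge.
            col1 = min(col0 + tile_px, cols)
            row1 = min(row0 + tile_px, rows)
            extents.append((
                x_min + col0 * cell,  # xmin
                y_max - row1 * cell,  # ymin
                x_min + col1 * cell,  # xmax
                y_max - row0 * cell,  # ymax
            ))
    return extents


def build_commands(src, extents, merged="merge.shp"):
    """Build the gdalwarp / gdal_polygonize.py / ogr2ogr argument lists
    for steps 1 to 3, one triple per tile. Nothing is executed here."""
    commands = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(extents, start=1):
        tif = "part%d.tif" % i
        shp = "part%d.shp" % i
        # Step 1: cut the tile out of the original raster.
        commands.append(["gdalwarp", "-te", str(xmin), str(ymin),
                         str(xmax), str(ymax), src, tif])
        # Step 2: polygonize the tile.
        commands.append(["gdal_polygonize.py", tif, "-f", "ESRI Shapefile", shp])
        # Step 3: create the merged shapefile from the first part,
        # then append every later part to it.
        merge = ["ogr2ogr", "-f", "ESRI Shapefile"]
        if i > 1:
            merge += ["-update", "-append"]
        merge += [merged, shp, "-nln", "merge"]
        commands.append(merge)
    return commands
```

For the raster in the question, tile_extents(0.0, 23682.0, 35486, 23682, 1.0, 4096) gives a 9 by 6 grid of tiles. Each list returned by build_commands("original_file.tif", extents) can then be passed to subprocess.check_call in order; on Windows, gdal_polygonize.py may have to be invoked through the Python interpreter. The dissolve in step 4 is still run once at the end.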
