[GIS] How to improve performance of gdalwarp and gdal_translate pipeline

gdal gdal-translate gdalwarp

tl;dr: I can't get gdal_translate to use multiple cores. How can I fix this?

I am using gdalwarp followed by gdal_translate to process a large GeoTIFF: first I crop it to a polygon cutline and output a virtual raster, then I translate the .vrt to a .tif. I have followed suggestions from a few different answers on this site. First, I split the process into two steps to enable better compression, following this answer about gdalwarp. Then I attempted to speed up gdal_translate, following this answer about multithread support for gdal_translate. I am running this on a remote server running Ubuntu 16.04.6 LTS (Xenial Xerus) with GDAL v2.2.2 installed.

This is my code.

gdalwarp -of VRT -crop_to_cutline \
  -cutline "${path}/counties_chesapeake_watershed.gpkg" \
  "${path}/bigraster.tif" "${path}/clippedraster.vrt"
gdal_translate -co COMPRESS=LZW -co NUM_THREADS=8 --config GDAL_CACHEMAX 512 \
  "${path}/clippedraster.vrt" "${path}/clippedraster.tif"

My issue is that I don't believe that gdal_translate is using multiple cores, though I've tried to specify this with NUM_THREADS and also to increase GDAL_CACHEMAX. This is a very large raster (~12GB, several hundred km extent at 1 m resolution) so it is running extremely slowly. Can anyone help me parallelize the compression done by gdal_translate so this will run faster?

Best Answer

You're getting a speedup from NUM_THREADS, but only at the compression stage. gdal_translate cannot use multithreading for anything apart from compression.

The GDAL_CACHEMAX setting (a configuration option, here 512 MB) is probably helping you more than the NUM_THREADS creation option.
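If you want to push both settings a bit further, something like the sketch below might be worth trying. It is untested on your data and rests on a few assumptions: `data_dir`, `cache_mb`, `dry_run` and the two helper functions are names made up for this example, the 2048 MB cache assumes the server has that much RAM to spare, and `BIGTIFF=IF_SAFER` is added only because a raster this size could end up over 4 GB even after compression. `NUM_THREADS=ALL_CPUS` is the GTiff creation option's way of asking for one compression thread per core, so it still only affects compression. By default the script just prints the command so you can inspect it first.

```shell
#!/bin/sh
# Sketch only: the values below are placeholders, not tuned recommendations.
data_dir="${data_dir:-.}"     # hypothetical stand-in for ${path} in the question
cache_mb="${cache_mb:-2048}"  # GDAL_CACHEMAX in MB; assumes spare RAM on the server
dry_run="${dry_run:-1}"       # 1 = print the command only, 0 = actually run it

# Run a command, or just print it when dry_run=1.
run() {
  if [ "$dry_run" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Translate the clipped VRT to a compressed GeoTIFF.
# NUM_THREADS only parallelises the compression step.
translate_clipped() {
  run gdal_translate \
    -co COMPRESS=LZW \
    -co BIGTIFF=IF_SAFER \
    -co NUM_THREADS=ALL_CPUS \
    --config GDAL_CACHEMAX "$cache_mb" \
    "$data_dir/clippedraster.vrt" "$data_dir/clippedraster.tif"
}

translate_clipped
```

Run it once as-is to check the printed command, then with `dry_run=0` to execute it. Timing a few runs with different `cache_mb` values is the simplest way to see which setting actually matters on your machine.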