GDALWarp – Convert GeoPDF to 8-bit GeoTIFF

gdalwarp

I am using gdalwarp to convert GeoPDFs of processed maps to GeoTIFFs (for later stitching into a larger GeoTIFF) using the following command:

gdalwarp -t_srs EPSG:28356 -r cubic -cutline "nsw_map_boundaries\20160506_nsw_map_bounds.geojson" -cwhere "name = '9030-4S SPRINGWOOD'" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

The GeoPDFs have collars, so the cutline file contains the boundaries of the actual maps. EPSG:28356 is the projection of the map (GDA94 / MGA Zone 56).


Unfortunately, this approach turns a 10MB PDF into a 70MB GeoTIFF! The warping also re-orients the map to align with the UTM grid.


The main reason for the size is that the output GeoTIFFs are in 32-bit format. The original PDF files only have around 30 distinct colours, so it would be more efficient if the GeoTIFFs were in 8-bit paletted colour. I haven't been able to find a flag or setting to do this.
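As a rough illustration of why the pixel format matters, the uncompressed size of a raster is width × height × bands × bytes per sample. The sketch below uses a hypothetical helper, `raw_bytes`, and made-up dimensions (5000 × 3500 pixels); these are not measured from the Springwood sheet.

```shell
# Uncompressed raster size in bytes: width * height * bands * bytes per sample.
# raw_bytes is a hypothetical helper, not a GDAL tool.
raw_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Illustrative (assumed) 5000 x 3500 pixel raster:
raw_bytes 5000 3500 4 1   # RGBA, 8 bits per band -> 70000000
raw_bytes 5000 3500 1 1   # one paletted band     -> 17500000
```

Under these assumptions, a single paletted band needs a quarter of the raw storage of an RGBA image before any compression is applied.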

Is there a way of achieving this – either with gdalwarp, or other GDAL tools (or both)?

One constraint is that the GeoTIFFs do need to have transparency – either via alpha, or via a NoData value – for anything outside the cutline. The current gdalwarp command uses alpha (-dstalpha flag), but only because I couldn't easily get a NoData value to work.

A sample PDF file is available from the NSW Topo Map Portal: https://portal.spatial.nsw.gov.au/download/NSWTopographicMaps/DTDB_GeoReferenced_Raster_CollarOn_161070/2017/25k/9030-4S+SPRINGWOOD.pdf

A sample cutline file with all map boundaries can be downloaded from https://maps.ozultimate.com/wiki/downloads

Best Answer

You need to specify the output compression type. Using (lossy) JPEG will get you a much smaller output tif (~12MB).

gdalwarp -co compress=JPEG -co tiled=YES -ot Byte -t_srs EPSG:28356 -r cubic -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD.tif"

Alternatively, a three-stage process will get you a paletted (~9MB) image. There is one manual step: after converting to a paletted image with rgb2pct.py, you'll need to work out which palette index covers the area outside the cutline so that you can assign it as NoData (234 in the commands below).

gdalwarp -overwrite -co tiled=YES -ot Byte -t_srs EPSG:28356 -r cubic -cutline "9030-4S SPRINGWOOD.geojson" -crop_to_cutline -dstalpha "9030-4S SPRINGWOOD.pdf" "9030-4S SPRINGWOOD_warp.tif"
rgb2pct.py "9030-4S SPRINGWOOD_warp.tif" "9030-4S SPRINGWOOD_PCT.tif"
gdal_translate -co compress=LZW -a_nodata 234 -co tiled=YES "9030-4S SPRINGWOOD_PCT.tif" "9030-4S SPRINGWOOD.tif"
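One possible way to find the index for the manual step is to query a pixel that is known to lie outside the cutline. The sketch below assumes the top-left pixel (column 0, row 0) of the paletted image falls outside the map boundary, which is plausible when the warped sheet is slightly rotated within its bounding box but should be checked for each map. `corner_index` is a hypothetical helper name; `gdallocationinfo -valonly` is the standard GDAL utility for reading a pixel value.

```shell
# Hypothetical helper: print the palette index stored at the top-left
# pixel (column 0, row 0) of a single-band paletted raster.
# Assumes that pixel lies outside the cutline.
corner_index() {
    gdallocationinfo -valonly "$1" 0 0
}

# Usage (not run here):
#   nodata=$(corner_index "9030-4S SPRINGWOOD_PCT.tif")
#   gdal_translate -co compress=LZW -a_nodata "$nodata" -co tiled=YES \
#       "9030-4S SPRINGWOOD_PCT.tif" "9030-4S SPRINGWOOD.tif"
```

If the same palette index is also used inside the map (for example, for black text), those pixels will become NoData as well, so it is worth inspecting the result before stitching.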


