[GIS] how to preserve color-interpretation when merging TIFFs and adding alpha band

colortablegdalgdalwarpgeotiff-tiff

We have a collection of black-and-white map tiles as *.tif images with accompanying *.tfw files, that we'd like to merge into one single large map and cut with a polygon (turning the area outside the polygon transparent).

gdalinfo output for them looks like this:

Driver: GTiff/GeoTIFF
Files: 702_232.tif
       702_232.tfw
Size is 4000, 4000
Coordinate System is `'
Origin = (702000.000000000000000,233000.000000000000000)
Pixel Size = (0.250000000000000,-0.250000000000000)
Metadata:
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_XRESOLUTION=96
  TIFFTAG_YRESOLUTION=96
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  MINISWHITE=YES
Corner Coordinates:
Upper Left  (  702000.000,  233000.000) 
Lower Left  (  702000.000,  232000.000) 
Upper Right (  703000.000,  233000.000) 
Lower Right (  703000.000,  232000.000) 
Center      (  702500.000,  232500.000) 
Band 1 Block=4000x8 Type=Byte, ColorInterp=Palette
  Image Structure Metadata:
    NBITS=1
  Color Table (RGB with 2 entries)
    0: 255,255,255,255
    1: 0,0,0,255

So this seem to be 1-band TIFFs with a color palette.

To merge them, we're first building a VRT with

gdalbuildvrt -a_srs 'EPSG:21781' merged.vrt *.tif

(as you can see, we also set the source SRS, which is missing in the input data)

The individual input files (the tiles) and the VRT display fine (black and white, as intended) in QGIS 2.18.

To build an actual single image file (and at the same time cut the result to the polygon area1), we then use

gdalwarp \
    -cutline swissBOUNDARIES3D/FGDB/swissBOUNDARIES3D_1_3_LV03_LN02.gdb \
    -cl TLM_KANTONSGEBIET \
    -cwhere "NAME='St. Gallen'" \
    -dstalpha \
    \
    -co TILED=YES \
    -co COMPRESS=LZW \
    -co BIGTIFF=YES \
    \
    --config GDAL_CACHEMAX=40% \
    -multi \
    -et 0 \
    -wo NUM_THREADS=ALL_CPUS \
    -wm 5000 \
    \
    merged.vrt  merged_and_cut.tif

The result is a 2-band TIFF (the 2nd band being alpha, required for the transparency after cutting) without a color palette. When displayed in QGIS, it appears in bright green instead of black and white.

If we add -setci to the options for gdalwarp we get a 2-band TIFF with a (sensible looking, even though much larger) color palette. But in QGIS 2.18, it also appears in bright green by default, at least until go to layer properties → style and manually set "Render type" to "Palette".

As the original files and the VRT work fine, I wondered whether we can use the "Raw mode" documented at http://gdal.org/frmt_gtiff.html to preserve the first band's color interpretation, but there's no indication on how to use with gdalwarp. (Replacing merged.vrt by GTIFF_RAW:merged.vrt in the the gdalwarp invocation results in an error about the file not being found.)

How can I preserve the color interpretation with gdalwarp? (So that it is correctly understood by QGIS and other software.)

Example data

For licensing reasons I think I cannot share the tiles I want to merge, but I've produced these two example tiles (with nonsensical content) that have the same properties and show the same behavior as the original tiles:


1 Thus why we use gdalwarp. If we wouldn't cut to a polygon, gdaltranslate instead of gdalwarp would suffice to materialize the VRT to a single TIFF.

Best Answer

Whether or not this is a bug in QGIS, it can be worked around by handling trasparency with a NODATA value instead of by adding an alpha band. (Thanks to Andy Harfoot for the hint!)

As -cutline ... can only distinguish between inside and outside the queried vector feature(s), there is no need for semi-transparency / semi-opacity. Thus, a NODATA value will suffice and no alpha band is actually needed.

Because the input data only uses 2 of the 256 possible values for each byte that represents a pixel, we can choose a NODATA value that won't interfere with the colors already used. In the case at hand, it just cannot be 0 or 1. We are free to choose anything between 2 and 255.

With this, the command becomes

gdalwarp \
    -cutline swissBOUNDARIES3D/FGDB/swissBOUNDARIES3D_1_3_LV03_LN02.gdb \
    -cl TLM_KANTONSGEBIET \
    -cwhere "NAME='St. Gallen'" \
    -dstnodata 255 \  # <------------------------ this line was changed
    \
    -co TILED=YES \
    -co COMPRESS=LZW \
    -co BIGTIFF=YES \
    \
    --config GDAL_CACHEMAX=40% \
    -multi \
    -et 0 \
    -wo NUM_THREADS=ALL_CPUS \
    -wm 5000 \
    \
    merged.vrt  merged_and_cut.tif

Pros:

  • With the same compression applied, the resulting file (here cut to the boundary of canton Zug, 8.8 MB (direct download)) will be half as large as with an alpha band.

Cons:

  • I have to know in advance (or to detect somehow) whether the input tile set has RGB(A) 3- or 4-band images vs. paletted 1-band images and in the latter case determine a value that can be used as NODATA value without conflict. While this is easy to do "manually" by looking at gdalinfo output, I'd like to use this in an automated process that can take either type of input.