[GIS] how to preserve color-interpretation when merging TIFFs and adding alpha band

colortablegdalgdalwarpgeotiff-tiff

We have a collection of black-and-white map tiles as *.tif images with accompanying *.tfw files, that we'd like to merge into one single large map and cut with a polygon (turning the area outside the polygon transparent).

gdalinfo output for them looks like this:

Driver: GTiff/GeoTIFF
Files: 702_232.tif
       702_232.tfw
Size is 4000, 4000
Coordinate System is `'
Origin = (702000.000000000000000,233000.000000000000000)
Pixel Size = (0.250000000000000,-0.250000000000000)
Metadata:
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_XRESOLUTION=96
  TIFFTAG_YRESOLUTION=96
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
  MINISWHITE=YES
Corner Coordinates:
Upper Left  (  702000.000,  233000.000) 
Lower Left  (  702000.000,  232000.000) 
Upper Right (  703000.000,  233000.000) 
Lower Right (  703000.000,  232000.000) 
Center      (  702500.000,  232500.000) 
Band 1 Block=4000x8 Type=Byte, ColorInterp=Palette
  Image Structure Metadata:
    NBITS=1
  Color Table (RGB with 2 entries)
    0: 255,255,255,255
    1: 0,0,0,255

So this seem to be 1-band TIFFs with a color palette.

To merge them, we're first building a VRT with

gdalbuildvrt -a_srs 'EPSG:21781' merged.vrt *.tif

(as you can see, we also set the source SRS, which is missing in the input data)

The individual input files (the tiles) and the VRT display fine (black and white, as intended) in QGIS 2.18.

To build an actual single image file (and at the same time cut the result to the polygon area¹), we then use

gdalwarp \
    -cutline swissBOUNDARIES3D/FGDB/swissBOUNDARIES3D_1_3_LV03_LN02.gdb \
    -cl TLM_KANTONSGEBIET \
    -cwhere "NAME='St. Gallen'" \
    -dstalpha \
    \
    -co TILED=YES \
    -co COMPRESS=LZW \
    -co BIGTIFF=YES \
    \
    --config GDAL_CACHEMAX=40% \
    -multi \
    -et 0 \
    -wo NUM_THREADS=ALL_CPUS \
    -wm 5000 \
    \
    merged.vrt  merged_and_cut.tif

The result is a 2-band TIFF (the 2nd band being alpha, required for the transparency after cutting) without a color palette. When displayed in QGIS, it appears in bright green instead of black and white.

If we add -setci to the options for gdalwarp we get a 2-band TIFF with a (sensible looking, even though much larger) color palette. But in QGIS 2.18, it also appears in bright green by default, at least until go to layer properties → style and manually set "Render type" to "Palette".

As the original files and the VRT work fine, I wondered whether we can use the "Raw mode" documented at http://gdal.org/frmt_gtiff.html to preserve the first band's color interpretation, but there's no indication on how to use with gdalwarp. (Replacing merged.vrt by GTIFF_RAW:merged.vrt in the the gdalwarp invocation results in an error about the file not being found.)

How can I preserve the color interpretation with gdalwarp? (So that it is correctly understood by QGIS and other software.)

Example data

For licensing reasons I think I cannot share the tiles I want to merge, but I've produced these two example tiles (with nonsensical content) that have the same properties and show the same behavior as the original tiles:

Inputs (TIFFs and TFWs): ownCloud directory | download as ZIP archive
gdalbuildvrt -a_srs 'EPSG:21781' merged.vrt ../input/*.tif results in merged.vrt (2 KB) (direct download)
gdalwarp -dstalpha -co TILED=YES -co COMPRESS=LZW merged.vrt merged.tif results in merged.tif (17.6 MB) (direct download)

¹ Thus why we use gdalwarp. If we wouldn't cut to a polygon, gdaltranslate instead of gdalwarp would suffice to materialize the VRT to a single TIFF.

Best Answer

Whether or not this is a bug in QGIS, it can be worked around by handling trasparency with a NODATA value instead of by adding an alpha band. (Thanks to Andy Harfoot for the hint!)

As -cutline ... can only distinguish between inside and outside the queried vector feature(s), there is no need for semi-transparency / semi-opacity. Thus, a NODATA value will suffice and no alpha band is actually needed.

Because the input data only uses 2 of the 256 possible values for each byte that represents a pixel, we can choose a NODATA value that won't interfere with the colors already used. In the case at hand, it just cannot be 0 or 1. We are free to choose anything between 2 and 255.

With this, the command becomes

gdalwarp \
    -cutline swissBOUNDARIES3D/FGDB/swissBOUNDARIES3D_1_3_LV03_LN02.gdb \
    -cl TLM_KANTONSGEBIET \
    -cwhere "NAME='St. Gallen'" \
    -dstnodata 255 \  # <------------------------ this line was changed
    \
    -co TILED=YES \
    -co COMPRESS=LZW \
    -co BIGTIFF=YES \
    \
    --config GDAL_CACHEMAX=40% \
    -multi \
    -et 0 \
    -wo NUM_THREADS=ALL_CPUS \
    -wm 5000 \
    \
    merged.vrt  merged_and_cut.tif

Pros:

With the same compression applied, the resulting file (here cut to the boundary of canton Zug, 8.8 MB (direct download)) will be half as large as with an alpha band.

Cons:

I have to know in advance (or to detect somehow) whether the input tile set has RGB(A) 3- or 4-band images vs. paletted 1-band images and in the latter case determine a value that can be used as NODATA value without conflict. While this is easy to do "manually" by looking at gdalinfo output, I'd like to use this in an automated process that can take either type of input.

Edit

This may be closer to what you want:

gdalbuildvrt -srcnodata 255 255 255 -hidenodata temp.vrt <warped files>.tif

Finally, you can call gdal2tiles.py as you have it.

[GIS] Compression Artifacts and GDAL

TL;DR The short version is to use a command like the following one which will make you as happy as it is possible with GDAL.

gdalwarp -of gtiff  -t_srs epsg:26912 -srcnodata 0 -dstnodata none -dstalpha -co compress=jpeg -co tiled=yes test.tif final.tif

Longer story

Your problem is not really in the compression artifacts but how to handle transparency with JPEG compressed TIFF files. It looks also that you have done right things and probably you have already read this thread from GDAL-devel mailing list http://osgeo-org.1560.x6.nabble.com/gdal-dev-Creating-jpeg-compressed-tiff-with-alpha-td4985300.html

I made a few tests with current GDAL development version 2.0-dev and it appears that the trick which are mentioned in the thread are mostly implemented into GDAL now in rather uses friendly way.

The following screenshots hopefully clarify what happens and what is doable with GDAL.

First image shows a detail from the border between data and nodata in warped and jpeg compressed image. I wouldn't say that there are any visible artifacts.

enter image description here

However, when setting the viewer to make black pixels transparent so that the blue background shows there are for sure nasty pixels. They appear because jpeg compression is lossy and some nodata pixels which were originally totally black (RGB 0 0 0) are only nearly black after compression, like RGB 0 0 1.

enter image description here

Compare the result with lossless LZW compression which keeps black as black. Compare also file sizes: LZW compressed 848 MB vs. 246 MB as JPEG compressed tiff. Note that the visual quality on the data area is about the same and the nasty things happen only at the border and if image is set to transparent.

The truth is that transparency can't be handled with transparent pixels with JPEG compression not other lossy compression methods. The mechanism that works is to attach to image an additional band that keeps the information about transparency. Transparency can be stored into an alpha channel or into auxiliary mask which can be even in a separate file. Both methods can be used with GDAL but unfortunately in your case neither of them lead to perfect result.

enter image description here

First shot is to convert nodata pixels into an alpha band with lossless method and compress the result into jpeg compressed tiff. The result is rather good but not perfect because as described in the mailing list thread, GDAL is compressing also the alpha channel with lossy jpeg method. However, while with nodata pixels the transparency is either zero of one but with alpha channel on range 0-255 the effect of lossy compression is not so bad.

enter image description here

The other method that GDAL supports is to write the data/nodata infromation into separate mask band. This info takes only 1 bit to store and it can be effectively compressed so the payload in disk space is minimal, in this case 1 MB extra for a 26855 by 23536 pixels sized warped image. The only drawback is that I do not know any viewer that supports the GDAL external mask layers. Not ever QGIS which is heavily based on GDAL with image handling understands them. There seems to a related ticket but that does not mention external masks http://hub.qgis.org/issues/3348.

Example data

Best Answer

Pros:

Cons:

Related Solutions

[GIS] Tiling rgba geotiffs using vrt rather than gdal_warp

Edit

[GIS] Compression Artifacts and GDAL

Related Question