Create White Nodata Area in a Resampled Orthophoto using GDAL

gdalraster

I have a set of orthophoto images which are each fully filled with data, but when placed together the coverage does not form a rectangle so there are nodata areas.

In MapGuide, I need to be able to display these nodata areas as white so that my users don't have to waste black ink when printing.

At large scale, where I display original full resolution images directly, this is no problem. I just set my map's background colour to white, and the areas where there are no orthophotos display the background.

For performance, I need to be able to merge all of these source images into a resampled composite overview image for display at smaller scales where more of the orthophotos are viewable at once.

I've been attempting to use GDAL to merge and resample the overview, but by default it appears to create the resampled composite GeoTIFF tile with black in the nodata areas, and MapGuide does not allow me to set black as transparent on colour rasters.

Is there a way for me to efficiently get what I want?

I have provided the answer I ended up with using GDAL, but would love to see solutions using other image processing utilities and GIS applications, both open source and proprietary.

Best Answer

The easiest way for me to deal with this problem was to use the GDAL Virtual Format. This format allowed me to treat the entire set of images as a single image object, and transform it in three relatively simple steps.

Creating the Virtual Dataset

GDAL (including Tamas Szekeres' GISInternals Windows binaries and recent versions of OSGeo4W) includes a utility called gdalbuildvrt which can be used to build an initial virtual dataset.

One simple way of using this is to add all of your images to a text file, and then use that text file as an input to gdalbuildvrt. Here's an example (you'll need to put the second command back on one line):

dir /b *.tif > my_images.txt
gdalbuildvrt 
  -hidenodata 
  -vrtnodata "255 255 255" 
  -resolution highest 
  -input_file_list my_images.txt 
  my_image.vrt

This will leave you with an XML file which you can treat as a single image for all GDAL operations. It also internally represents nodata as white, but hides the nodata definition from tools reading from it.

Creating the Resampled Overview

Next, you will perform the resampling and output of the overview image. You can do this with either gdal_translate or gdalwarp. For either of these, remember that the resultant size will be width * height * 3 (number of 8 bit bands) Bytes. If this will be larger than 4GB, you will want to look at the GeoTIFF options for the syntax to specify BigTIFF as your output (-co "BIGTIFF=YES").

For gdal_translate, you will need to determine the virtual image's dimensions using the handy gdalinfo command. Take these dimensions and divide each by some consistent factor to determine the output width and height of your file in pixels.

The command will looks something like (on one line):

gdal_translate
  -outsize 53120 14000
  -co "TILED=YES"
  -co "PROFILE=GEOTIFF"
  -co "BLOCKXSIZE=256"
  -co "BLOCKYSIZE=256"
  my_image.vrt
  my_image.tif

For gdalwarp, you'll need to know the resultant pixelsize; in this case I'm using .5 metre. You'll also want to make a call on resampling method. I prefer cubicspline for orthophoto overviews. It's softer, but you're not going to be using these down to full resolution and in my experience it creates a more compressible image if you're using something like JPEG or ECW.

gdalwarp 
  -r cubicspline 
  -of GTiff 
  -dstnodata "255 255 255" 
  -tr 0.5 0.5 
  -co "PROFILE=GEOTIFF" 
  -co "BIGTIFF=YES" 
  -co "TILED=YES" 
  -co "BLOCKXSIZE=256"
  -co "BLOCKYSIZE=256"
  my_image.vrt 
  my_image.tif

You may also want to consider using JPEG compression options for these resampled GeoTIFF overviews; it shrinks the output file considerably with (according to Frank) only a marginal performance penalty.

  -co "COMPRESS=JPEG" 
  -co "JPEG_QUALITY=80" 
  -co "PHOTOMETRIC=YCBCR"

Overviews

You will also want to run the handy gdaladdo command over the resultant image to build internal "pyramids", so that requests for lower resolutions than the full image dimensions can be met with a subset of data. The increase in performance is more than worth the disk space in most cases. You'll want to play around with the levels that you use here; for very large images you may be able to drop some. The gdaladdo command looks something like this:

gdaladdo 
  -r average 
  my_image.tif 
  2 4 8 16 32 64 128 256

I would suggest experimenting with these levels for optimal performance. You may find that a different resampling interval is better for your applicaition or, based on your image size, that you can drop some of the higher numbers (or that more are needed)

Also, if you are generating an external overview (using the -ro option) consider adding the JPEG compression configuration lines:

  --config COMPRESS_OVERVIEW JPEG 
  --config PHOTOMETRIC_OVERVIEW YCBCR 
  --config INTERLEAVE_OVERVIEW BAND 

(I believe that these are inherited from the parent GeoTIFF for embedded overviews)

Notes

When faced with this problem, I asked on the #gdal channel on freenode.irc.net. This is an amazing resource, and I am thoroughly indebted to Howard Butler, Frank Warmerdam, and Even Rouault for helping me out with this.

Related Question