GDAL File Size – Is File Size Inflation Normal with GDALwarp?

Tags: coordinate system, file size, gdal, gdalwarp, geotiff-tiff

After using gdalwarp to reproject a number of rasters and align them to the target grid (via -tap), I noticed that the output rasters were significantly larger than the originals. A fairly thorough web search turned up a Trac issue in which Frank Warmerdam explained the reason:

"On careful review, the difference in the file in question is because gdal_translate uses the TIFFWriteScanline() interface to write the output file from within GTiffDataset::CreateCopy(), and this only writes as much of the final 'strip' of the file as is required to complete image area. But gdalwarp goes through the blockio interface which writes the complete final strip, even the portion that falls off the end of the file."

That Trac issue is about seven years old, however, and I know that some changes have since been made to the GDAL utilities, including gdalwarp. I'd like to know whether the above reasoning still holds and whether the file size inflation I'm seeing is "normal." "Normal" here might be taken to mean unsurprising or expected but, more importantly, is there anything that can be done to mitigate the effect, i.e. to reduce the output raster file size? Below is a table of the file size inflation I'm experiencing.

Input File Size (bytes)     Output File Size (bytes)    Inflation
1437380431                  1698334217                   18%
1428001178                  1698334433                   19%
  41683165                   137036637                  229%
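
The inflation column is the relative growth of the output over the input. A minimal sketch that recomputes it from the byte counts in the table (the function name is only illustrative):

```python
def inflation_percent(input_bytes, output_bytes):
    """Relative growth of the output file over the input file, in percent."""
    return (output_bytes - input_bytes) / input_bytes * 100.0


# (input size, output size) in bytes, copied from the table above
sizes = [
    (1437380431, 1698334217),
    (1428001178, 1698334433),
    (41683165, 137036637),
]

for input_bytes, output_bytes in sizes:
    pct = inflation_percent(input_bytes, output_bytes)
    print(f"{input_bytes:>12} {output_bytes:>12} {pct:7.1f}%")
```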

The input TIFF files were created in ArcGIS and thus have external world files, XML, and DBF files, but these do not make up the difference in file size. Here is a sample gdalwarp call as I used it in all of these cases; the actual execution was handled by a Python subprocess (subprocess.Popen):

$ gdalwarp -tap -tr 30 30 -t_srs "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs" -co "COMPRESS=LZW" input_file.tif output_file.tif
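
For context, a rough sketch of how a call like the one above might be driven from Python with subprocess.Popen. The function names are hypothetical, and it assumes gdalwarp is on the PATH; it is not the exact script used here.

```python
import subprocess

# Target SRS string, split across two lines only for readability
AEA_SRS = (
    "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 "
    "+ellps=GRS80 +datum=NAD83 +units=m +no_defs"
)


def build_gdalwarp_command(src, dst, compress="LZW"):
    """Return the argument list for the gdalwarp call (hypothetical helper)."""
    cmd = ["gdalwarp", "-tap", "-tr", "30", "30", "-t_srs", AEA_SRS]
    if compress:
        # Pass compress=None to leave out the creation option entirely
        cmd += ["-co", f"COMPRESS={compress}"]
    return cmd + [src, dst]


def run_gdalwarp(src, dst, compress="LZW"):
    """Run gdalwarp, wait for it, and raise if it exits with an error."""
    proc = subprocess.Popen(build_gdalwarp_command(src, dst, compress))
    returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, proc.args)
    return returncode
```

Passing the arguments as a list avoids shell quoting problems with the long -t_srs string.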

I understand that in rare cases compression makes a larger file, but the effect is the same without the LZW compression. The ratios in the table are with LZW compression.
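
For reference, the final-strip effect described in the quoted explanation can be estimated with a short back-of-the-envelope sketch. It assumes an uncompressed, stripped TIFF; the helper name and the example dimensions are made up for illustration and are not GDAL code.

```python
def final_strip_padding_bytes(width, height, rows_per_strip, bytes_per_pixel):
    """Extra bytes written if the last strip is padded out to full height.

    Hypothetical helper: assumes an uncompressed, stripped TIFF.
    """
    leftover_rows = height % rows_per_strip
    if leftover_rows == 0:
        return 0  # the height is a whole number of strips, nothing to pad
    padded_rows = rows_per_strip - leftover_rows
    return padded_rows * width * bytes_per_pixel


# Made-up example: 10000 x 10001 pixels, 8 rows per strip, 1 byte per pixel
print(final_strip_padding_bytes(10000, 10001, 8, 1))
```

Under these assumptions the padding can never exceed one strip, so on its own it would account for only a small absolute difference in file size.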

Best Answer

It's a well-known and longstanding issue that gdalwarp does not compress its output efficiently. The usual workaround is to run gdalwarp without compression and then run gdal_translate with compression.

To avoid two lengthy processing steps, have gdalwarp write a VRT first, which is very quick, and then run gdal_translate with the -co compress=lzw option.

For example:

$ gdalwarp -tap -tr 30 30 -t_srs "etc..." -of vrt input_file.tif output_file.vrt
$ gdal_translate -co compress=LZW output_file.vrt output_file.tif
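
Since the question runs GDAL from Python, here is a minimal sketch of the same two-step workflow using subprocess. The function names and default values are hypothetical, and it assumes gdalwarp and gdal_translate are on the PATH.

```python
import subprocess


def build_two_step_commands(src, vrt, dst, t_srs, resolution=30, compress="LZW"):
    """Return (gdalwarp argv, gdal_translate argv) for the VRT workflow."""
    warp = [
        "gdalwarp", "-tap", "-tr", str(resolution), str(resolution),
        "-t_srs", t_srs, "-of", "VRT", src, vrt,
    ]
    translate = ["gdal_translate", "-co", f"COMPRESS={compress}", vrt, dst]
    return warp, translate


def warp_then_compress(src, vrt, dst, t_srs, resolution=30, compress="LZW"):
    """Write the VRT, then let gdal_translate produce the compressed GeoTIFF."""
    for cmd in build_two_step_commands(src, vrt, dst, t_srs, resolution, compress):
        # check=True raises CalledProcessError if either tool fails
        subprocess.run(cmd, check=True)
```

The VRT only references the input raster, so the input file has to stay in place until gdal_translate has finished.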

If you are using GDAL 2.x, you can combine this into a single operation by writing the VRT to /vsistdout/, piping it to gdal_translate, and specifying /vsistdin/ as the input. For example:

$ gdalwarp -q -t_srs EPSG:32611 -of vrt input_file.tif /vsistdout/ | gdal_translate -co compress=lzw /vsistdin/ output_file.tif
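
The same pipe can be set up from Python by connecting two subprocess.Popen objects. This is only a sketch under the assumption that both tools are on the PATH; the function names are hypothetical.

```python
import subprocess


def build_pipeline_commands(src, dst, t_srs="EPSG:32611", compress="LZW"):
    """Return (gdalwarp argv, gdal_translate argv) for the piped workflow."""
    warp = ["gdalwarp", "-q", "-t_srs", t_srs, "-of", "VRT", src, "/vsistdout/"]
    translate = ["gdal_translate", "-co", f"COMPRESS={compress}", "/vsistdin/", dst]
    return warp, translate


def run_pipeline(src, dst, t_srs="EPSG:32611", compress="LZW"):
    """Pipe the VRT from gdalwarp's stdout into gdal_translate's stdin."""
    warp_cmd, translate_cmd = build_pipeline_commands(src, dst, t_srs, compress)
    warp = subprocess.Popen(warp_cmd, stdout=subprocess.PIPE)
    translate = subprocess.Popen(translate_cmd, stdin=warp.stdout)
    # Close our copy of the pipe so gdalwarp is signalled if gdal_translate exits early
    warp.stdout.close()
    translate_rc = translate.wait()
    warp_rc = warp.wait()
    if warp_rc != 0 or translate_rc != 0:
        raise RuntimeError(
            f"pipeline failed: gdalwarp={warp_rc}, gdal_translate={translate_rc}"
        )
```

It may be safer to pass an absolute input path, since a VRT streamed through a pipe has no directory of its own against which a relative source path could be resolved.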