GDAL – How to Decrease Raster File Size

Tags: gdal, raster

I have a lot of rasters. Here is the typical gdalinfo output:

Size is 14058, 9940
Coordinate System is `'
Origin = (1521634.513864640844986,403292.969188707182184)
Pixel Size = (0.169386038323864,-0.169386038265657)
Metadata:
  TIFFTAG_DATETIME=2019:11:18 08:53:19
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CS6 (Windows)
  TIFFTAG_XRESOLUTION=72
  TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 1521634.514,  403292.969)
Lower Left  ( 1521634.514,  401609.272)
Upper Right ( 1524015.743,  403292.969)
Lower Right ( 1524015.743,  401609.272)
Center      ( 1522825.128,  402451.121)
Band 1 Block=14058x1 Type=Byte, ColorInterp=Red
Band 2 Block=14058x1 Type=Byte, ColorInterp=Green
Band 3 Block=14058x1 Type=Byte, ColorInterp=Blue

Each file is around 400 MB, which is too much for me. How can I decrease the file size of each raster?

Best Answer

GDAL supports several compression methods for several raster formats. Check the GDAL raster formats page to see which compression methods are available for your file format.
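As a quick sanity check on why these files are so large: the gdalinfo output in the question lists no COMPRESSION entry under Image Structure Metadata, which suggests the rasters are stored uncompressed. The arithmetic below, using only the numbers from that output, gives the raw pixel payload (headers and metadata are ignored, so the figure is approximate).

```python
# Values copied from the gdalinfo output in the question.
width, height = 14058, 9940
bands = 3             # Red, Green, Blue
bytes_per_sample = 1  # Type=Byte

raw_bytes = width * height * bands * bytes_per_sample
print(raw_bytes)                      # 419209560
print(round(raw_bytes / 1024**2, 1))  # 399.8 (MiB)
```

That matches the reported size of roughly 400 MB, so any of the compression options discussed here should be able to shrink the files.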

GDAL's default format, GeoTIFF, supports several compression methods (for example LZW, DEFLATE and JPEG), which are listed here.

An example of how to use this in Python:

from osgeo import gdal
infn = '/path/to/infile.tif'
outfn = '/path/to/outfile.tif'

ds = gdal.Translate(outfn, infn, creationOptions=["COMPRESS=LZW", "TILED=YES"])
ds = None
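Since the question mentions a lot of rasters, the same call can be wrapped in a loop. The following is only a sketch: the helper names plan_outputs and compress_all, the "*.tif" pattern and the directory layout are assumptions made for illustration, not part of GDAL. The output directory must differ from the input directory so that no source file is overwritten.

```python
from pathlib import Path


def plan_outputs(src_dir, dst_dir, pattern="*.tif"):
    """Pair every input raster with an output path of the same name."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    if src_dir.resolve() == dst_dir.resolve():
        raise ValueError("dst_dir must differ from src_dir")
    return [(src, dst_dir / src.name) for src in sorted(src_dir.glob(pattern))]


def compress_all(src_dir, dst_dir, creation_options=("COMPRESS=LZW", "TILED=YES")):
    """Write a compressed copy of every raster in src_dir into dst_dir."""
    # Imported here so that the path planning above works without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()  # raise Python exceptions instead of returning None
    pairs = plan_outputs(src_dir, dst_dir)
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        ds = gdal.Translate(str(dst), str(src), creationOptions=list(creation_options))
        ds = None  # dropping the reference closes the file and flushes it to disk
        print(f"{src.name}: {src.stat().st_size} -> {dst.stat().st_size} bytes")
```

Calling compress_all('/path/to/rasters', '/path/to/compressed') would then print the size before and after for each file, which makes it easy to compare creation options on a small sample first.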

Using PREDICTOR

You can also add the PREDICTOR creation option. In my experience, older versions of many GIS packages do not handle this creation option well, so only use it when you know your data will be read with a more recent GDAL version (>2.0).

from osgeo import gdal

# use PREDICTOR=3 for floating-point data and PREDICTOR=2 for integer data;
# the rasters in the question are Byte, so 2 applies here
predictor = "2"
creation_options = ["COMPRESS=LZW", "TILED=YES", "PREDICTOR=" + predictor]
ds = gdal.Translate(outfn, infn, creationOptions=creation_options)
ds = None
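To see why a predictor can help, here is a small standalone sketch of horizontal differencing, the idea behind PREDICTOR=2. It does not use GDAL: the row is synthetic single-band data, and zlib (DEFLATE) from the standard library stands in for the LZW codec, so the numbers only illustrate the effect and are not what GDAL would produce.

```python
import math
import zlib


def horizontal_difference(row):
    """Replace each byte by its difference from the previous one, modulo 256."""
    out = bytearray(len(row))
    previous = 0
    for i, value in enumerate(row):
        out[i] = (value - previous) % 256
        previous = value
    return bytes(out)


def undo_horizontal_difference(diffs):
    """Invert horizontal_difference with a running sum modulo 256."""
    out = bytearray(len(diffs))
    total = 0
    for i, d in enumerate(diffs):
        total = (total + d) % 256
        out[i] = total
    return bytes(out)


# A smooth, synthetic row of 8-bit samples (hypothetical data, not a real raster).
row = bytes(
    int(128 + 100 * math.sin(i / 50) + 20 * math.sin(i / 7)) for i in range(20000)
)
diffs = horizontal_difference(row)

raw_size = len(zlib.compress(row))
predicted_size = len(zlib.compress(diffs))
print(raw_size, predicted_size)
```

Neighbouring pixels in smooth imagery tend to have similar values, so the differences cluster around zero and compress better than the raw samples. The step is fully reversible, which is why the predictor keeps the compression lossless.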

Data scaling

If the data you want to compress is floating point and you do not need many decimal places, consider scaling it from float to integer. GDAL supports storing scale and offset values in the band metadata. For example, if your raster values range from 0 to 1, you can rescale them to the range 0 to 100 with the scaleParams option:


from osgeo import gdal
scale_params = [[0, 1, 0, 100]]  # src_min, src_max, dst_min, dst_max
output_type = gdal.GDT_Byte
ds = gdal.Translate(outfn, infn, scaleParams=scale_params, outputType=output_type,
    creationOptions=["COMPRESS=LZW", "TILED=YES"])
band = ds.GetRasterBand(1)
# record the scale and offset in the metadata so that you don't have to remember them yourself;
# GDAL's convention is real_value = stored_value * scale + offset, so the scale here is 1/100
band.SetScale(0.01)
band.SetOffset(0)
band.FlushCache()
band = None
ds.FlushCache()
ds = None

See the gdal_translate page for more information. When you open a file that has a scale and offset stored in its metadata in QGIS, they are applied automatically.
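The precision cost of this kind of scaling can be checked without GDAL. The sketch below follows GDAL's convention that real value = stored value * scale + offset, so values stored as "real × 100" need a scale of 0.01. The helper names pack and unpack are made up for illustration, and rounding to the nearest integer is an assumption about the conversion step.

```python
def pack(value, scale=0.01, offset=0.0):
    """Convert a real value to the integer that would be stored on disk."""
    return round((value - offset) / scale)


def unpack(stored, scale=0.01, offset=0.0):
    """Recover the approximate real value from the stored integer."""
    return stored * scale + offset


values = [0.0, 0.123, 0.5, 0.987, 1.0]
stored = [pack(v) for v in values]
restored = [unpack(s) for s in stored]
worst_error = max(abs(v - r) for v, r in zip(values, restored))

print(stored)  # [0, 12, 50, 99, 100]
print(worst_error)
```

With a scale of 0.01 the worst-case error is half a step, that is 0.005, so this approach only makes sense when two decimal places are enough for your analysis.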

Lossy vs Lossless compression

When compressing, you can choose between lossy and lossless compression. Lossy compression allows some "mistakes" to be made while compressing, so that GDAL can compress the data more efficiently. Lossless compression compresses the data so that all stored values can be retrieved exactly when they are uncompressed. For more information on lossy vs lossless, see here. Note that by default your data is compressed losslessly.

To invoke lossy compression, you can use the DISCARD_LSB creation option (e.g. DISCARD_LSB=10), which clears the given number of least-significant bits of each value. The higher the number, the more information is lost. Also note that it is probably only wise to use this option when working with float data.

from osgeo import gdal
ds = gdal.Translate(outfn, infn, creationOptions=["COMPRESS=LZW", "TILED=YES", "DISCARD_LSB=10"])
ds = None

For more information on this creation option, see the gdal GeoTIFF driver page.
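To get a feel for what discarding least-significant bits does, here is a conceptual sketch on plain integers. It simply truncates; GDAL's own implementation may differ in details such as rounding, so treat this as an illustration of the idea and not as the driver's exact behaviour. The function name discard_lsb is made up for this example.

```python
def discard_lsb(value, nbits):
    """Clear the nbits least-significant bits of a non-negative integer."""
    mask = ~((1 << nbits) - 1)
    return value & mask


samples = [183, 184, 185, 190, 191, 192]
coarse = [discard_lsb(v, 3) for v in samples]
print(coarse)  # [176, 184, 184, 184, 184, 192]
```

Nearby values collapse onto the same number, which gives the compressor longer runs to work with, at the cost of an error of up to 2**nbits - 1 per value. In this simple model, discarding 8 or more bits from 8-bit data leaves nothing, so the number of bits has to be chosen with the data type in mind.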
