I have a lot of rasters. Here is the typical gdalinfo output:
Size is 14058, 9940
Coordinate System is `'
Origin = (1521634.513864640844986,403292.969188707182184)
Pixel Size = (0.169386038323864,-0.169386038265657)
Metadata:
TIFFTAG_DATETIME=2019:11:18 08:53:19
TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
TIFFTAG_SOFTWARE=Adobe Photoshop CS6 (Windows)
TIFFTAG_XRESOLUTION=72
TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 1521634.514, 403292.969)
Lower Left ( 1521634.514, 401609.272)
Upper Right ( 1524015.743, 403292.969)
Lower Right ( 1524015.743, 401609.272)
Center ( 1522825.128, 402451.121)
Band 1 Block=14058x1 Type=Byte, ColorInterp=Red
Band 2 Block=14058x1 Type=Byte, ColorInterp=Green
Band 3 Block=14058x1 Type=Byte, ColorInterp=Blue
Each file is around 400 MB, which is too large for me. How can I reduce the file size of each raster?
Best Answer
GDAL supports several compression methods for several raster formats. Check the GDAL raster formats page to see which compression methods are available for your file format.
GDAL's default format, GeoTIFF, supports several compression methods, which are listed here.
An example of how to use this in Python:
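A minimal sketch, assuming the GDAL Python bindings (`osgeo`) are installed. The function names and the file names in the usage comment are placeholders of mine, and DEFLATE with tiling is just one reasonable choice among the GeoTIFF options:

```python
def geotiff_creation_options(compress="DEFLATE", tiled=True):
    """Return GeoTIFF creation options as a list of "KEY=VALUE" strings."""
    options = [f"COMPRESS={compress}"]
    if tiled:
        # Store the image in tiles instead of one-row strips.
        options.append("TILED=YES")
    return options


def compress_raster(src_path, dst_path, options=None):
    """Write a losslessly compressed copy of src_path to dst_path."""
    # Imported lazily so the helper above can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    if options is None:
        options = geotiff_creation_options()
    out_ds = gdal.Translate(dst_path, src_path, creationOptions=options)
    out_ds = None  # dropping the reference flushes and closes the file


# Hypothetical file names, for illustration only:
# compress_raster("input.tif", "output_deflate.tif")
```

How much space this saves depends on the image content, so it is worth trying a couple of `COMPRESS` values on one file before converting the whole set.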
Using PREDICTOR
You can add the PREDICTOR creation option, which can improve the compression ratio of methods such as LZW and DEFLATE. In my experience, older versions of all kinds of GIS software do not handle this creation option well, so only use it when you know your data is going to be read with more recent GDAL versions (>2.0).
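As a rough sketch of how the predictor might be chosen: in the GeoTIFF driver, `PREDICTOR=2` is horizontal differencing (usually used for integer data) and `PREDICTOR=3` is the floating-point predictor. The helper names below are my own, and the type check only covers plain (non-complex) data types:

```python
def predictor_for(data_type_name):
    """Pick a PREDICTOR value from a GDAL data type name such as 'Byte' or 'Float32'."""
    # 2 = horizontal differencing (integer data), 3 = floating-point predictor
    return 3 if data_type_name.startswith("Float") else 2


def compress_with_predictor(src_path, dst_path, compress="LZW"):
    """Write a compressed copy using a predictor suited to band 1's data type."""
    from osgeo import gdal  # lazy import: predictor_for() works without GDAL

    gdal.UseExceptions()
    src_ds = gdal.Open(src_path)
    type_name = gdal.GetDataTypeName(src_ds.GetRasterBand(1).DataType)
    options = [f"COMPRESS={compress}", f"PREDICTOR={predictor_for(type_name)}"]
    out_ds = gdal.Translate(dst_path, src_ds, creationOptions=options)
    out_ds = None  # flush and close the output
    src_ds = None


# For the Byte RGB rasters in the question this would select PREDICTOR=2:
# compress_with_predictor("input.tif", "output_lzw_pred2.tif")
```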
Data scaling
If the data you are trying to compress is floating point and you do not need many decimal places, you could consider scaling your data from float to integer. GDAL supports storing scale and offset values in the metadata. This means that when your raster values range from 0 to 1, you could scale them to 0 to 100 with GDAL, by using the scaleParams option:
See the gdal_translate page for more information. When you open a file that has a scale and offset set in QGIS, they are applied automatically.
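A sketch of what this could look like with `gdal.Translate`, assuming input values between 0 and 1. As far as I know, `scaleParams` only rescales the stored pixel values, so recording the inverse scale and offset on each band (via `SetScale` and `SetOffset`) is an extra step added here; the function names and defaults are my own:

```python
def inverse_scale_offset(src_min, src_max, dst_min, dst_max):
    """Return (scale, offset) such that original = stored * scale + offset."""
    scale = (src_max - src_min) / (dst_max - dst_min)
    offset = src_min - dst_min * scale
    return scale, offset


def scale_to_byte(src_path, dst_path, src_range=(0, 1), dst_range=(0, 100)):
    """Rescale float data to Byte and record how to undo the scaling."""
    from osgeo import gdal  # lazy import: the helper above works without GDAL

    gdal.UseExceptions()
    params = [[src_range[0], src_range[1], dst_range[0], dst_range[1]]]
    out_ds = gdal.Translate(
        dst_path,
        src_path,
        scaleParams=params,          # map src_range onto dst_range
        outputType=gdal.GDT_Byte,    # store the result as 8-bit integers
        creationOptions=["COMPRESS=DEFLATE"],
    )
    out_ds = None  # flush and close before reopening for update

    # Store scale/offset metadata so readers can recover the original units.
    scale, offset = inverse_scale_offset(*src_range, *dst_range)
    ds = gdal.Open(dst_path, gdal.GA_Update)
    for index in range(1, ds.RasterCount + 1):
        band = ds.GetRasterBand(index)
        band.SetScale(scale)
        band.SetOffset(offset)
    ds = None


# Hypothetical file names, for illustration only:
# scale_to_byte("float_input.tif", "scaled_byte.tif")
```

Going from Float32 to Byte cuts the uncompressed size to a quarter before any compression is applied, at the cost of keeping only about two decimal places in this example.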
Lossy vs Lossless compression
When compressing, you can choose between lossy and lossless compression. Lossy compression allows some "mistakes" to be made while compressing, so that GDAL can compress the data more efficiently. Lossless compression compresses the data so that all stored values can be retrieved exactly when the data is uncompressed. For more information on lossy vs lossless, see here. Note that by default your data is compressed losslessly.
To invoke lossy compression, you can use the DISCARD_LSB creation option (e.g. DISCARD_LSB=10). Note that the higher the number, the more information is lost. Also note that it is probably only wise to use this option when working with float data.
For more information on this creation option, see the GDAL GeoTIFF driver page.
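To illustrate why a higher DISCARD_LSB value loses more information, here is a small sketch. `truncate_lsb` is a simplified stand-in of mine that just zeroes the lowest bits of an integer; GDAL's own handling may round rather than truncate, so treat it as an illustration only. The second function shows how the creation option could be passed, combined with a compression method since discarding bits by itself does not shrink the file:

```python
def truncate_lsb(value, nbits):
    """Zero the nbits least significant bits of a non-negative integer."""
    return value & ~((1 << nbits) - 1)


def write_with_discard_lsb(src_path, dst_path, nbits):
    """Write a compressed copy with the DISCARD_LSB creation option set."""
    from osgeo import gdal  # lazy import: truncate_lsb() works without GDAL

    gdal.UseExceptions()
    options = ["COMPRESS=DEFLATE", f"DISCARD_LSB={nbits}"]
    out_ds = gdal.Translate(dst_path, src_path, creationOptions=options)
    out_ds = None  # flush and close the output


# Effect of discarding more bits on the 8-bit value 203 (binary 11001011):
for bits in (0, 3, 4):
    print(bits, truncate_lsb(203, bits))
```

With 8-bit data like the rasters in the question, only a few bits can be discarded before the image degrades visibly, which fits the advice above to reserve larger values for float data.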