QGIS 3 – Optimal Method to Convert Raster from 32bit Float to 8bit Byte

I have a raster file (.tif) that contains integer values from 1 to 31. I have realized that the values are stored as Float32 with NoData Value=-3.39999999999999996e+38 (see raster information below). I think that the optimal method to store the data is as unsigned 8 bit integer (QGIS Byte) as there are no decimal values and the maximum value is below 255. I have used the Translate (convert format) tool available from GDAL to change the data type from Float32 to Byte (following these instructions). However, I have run into two issues:

  1. The resulting file is BIGGER than my original file (from 31 MB to 268 MB). I noticed that the original file had COMPRESSION=LZW, so I tried the various compression options under Profile (Default, No compression, Low compression, High compression, JPEG compression), but I could not reduce the file size while preserving the data (e.g. JPEG with quality 100 changed some values by +/-1 and added some random extra pixels).
  2. I had difficulty setting the NoData value. Using 0 worked in my case, but I may have files where 0 is a meaningful value, and I don't know how to convert the long decimal Float NoData value into an integer value that cannot be confused with real zeros in the dataset.

Metadata of original file:

Size is 21768, 12920
Pixel Size = (250.000000000000000,-250.000000000000000)
Image Structure Metadata:
  COMPRESSION=LZW
  NoData Value=-3.39999999999999996e+38

UPDATE:
According to this related post, the raster size is computed based on the following formula:

The size of a raster is just the product of bit-depth/8, bands, rows,
and columns plus header metadata (statistics, etc.)

32/8 * 1 * 21768 * 12920 = 1124970240 bytes; / (1024*1024) ~ 1072 MB
8/8 * 1 * 21768 * 12920 = 281242560 bytes; / (1024*1024) ~ 268 MB

That gives me the size Windows indicates for my 8-bit raster without compression.
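The formula is easy to check with plain shell arithmetic. This is a minimal POSIX sh sketch; `raster_mb` is a hypothetical helper name, and it assumes a bit depth that is a multiple of 8 and ignores header/metadata bytes, so it reports the uncompressed pixel data only.

```sh
#!/bin/sh
# raster_mb (hypothetical helper): uncompressed size of the pixel data,
# (bit depth / 8) * bands * columns * rows. Assumes bit depth is a multiple
# of 8; header and metadata bytes are ignored.
raster_mb() {
    bits=$1 bands=$2 cols=$3 rows=$4
    bytes=$(( bits / 8 * bands * cols * rows ))
    # "MB" here means 1024*1024 bytes, as in the formula above (rounded down)
    echo "$bytes bytes, $(( bytes / 1048576 )) MB"
}

raster_mb 32 1 21768 12920   # Float32 -> 1124970240 bytes, 1072 MB
raster_mb 8 1 21768 12920    # Byte    -> 281242560 bytes, 268 MB
```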

These are the options in the dialog box:

[Screenshot of the dialog box options]

Best Answer

  1. The resulting file is bigger because it has no compression. JPEG compression degraded the image because it is a lossy method; LZW and DEFLATE are lossless (every output value equals its input value). Lossy methods suit data where file size matters more than exact values, such as a stretched aerial photo used purely for display.

  2. You can set any value within your data type's range as NoData; you don't have to use 0. In the example below, I use gdalbuildvrt to change the NoData value from -3.4e+38 to 255, which is within the Byte range 0-255 and well above your maximum data value of 31. (The -3.39999999999999996e+38 is just gdalinfo printing -3.4e+38 with floating-point representation error. Your NoData value is -3.4e+38, which is close to, but not exactly, the most negative Float32 value of about -3.4028e+38.)
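The representation error can be reproduced with awk, which works in double precision. This is a small sketch, assuming gdalinfo formats the NoData value with 18 significant digits (which matches the output in the question); `shown` and `f32min` are just illustrative variable names.

```sh
#!/bin/sh
# -3.4e+38 has no exact binary (double) representation; printing it with
# 18 significant digits exposes the rounding error seen in gdalinfo.
shown=$(awk 'BEGIN { printf "%.18g", -3.4e38 }')

# The most negative finite Float32 value is -(2^128 - 2^104), about -3.4028e+38,
# so -3.4e+38 lies inside the Float32 range rather than at its bottom.
f32min=$(awk 'BEGIN { printf "%.17g", -(2^128 - 2^104) }')

echo "NoData as printed: $shown"    # -3.39999999999999996e+38
echo "Float32 minimum:   $f32min"   # -3.4028234663852886e+38
```

Either way, the NoData value only has to be a number that never occurs as real data, so any unused Byte value works after conversion.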

The options for GDAL Translate (convert format) in the QGIS GUI are somewhat hidden. To use LZW (or another lossless method), choose Profile: Default, then add a creation option with the big green + and manually enter COMPRESS under Name and LZW under Value.

For full GDAL Translate functionality, use the OSGeo4W Shell on Windows (or a terminal in Linux/macOS). The OSGeo4W Shell is a separate program from QGIS Desktop, available from the Start menu. Before running the commands below, change to the folder containing your files with cd followed by the path (e.g. cd "C:/Users/yourname/gis files").

E.g. (in a Linux/macOS Bash shell):

# Create an LZW compressed version
gdal_translate -co compress=LZW -co tiled=YES test.tif test_lzw.tif

# Convert NoData to 255
# the first file with .vrt is the virtual output, the second is the input
gdalbuildvrt -srcnodata -3.4E+38 -vrtnodata 255 test.vrt test.tif

# Write out a Byte with LZW compression TIFF file
# the first file with .vrt is the input (from buildvrt), the second is the output
gdal_translate -co compress=LZW -co tiled=YES -ot Byte test.vrt test_byte_lzw.tif

# Write out a Byte with no compression TIFF file
gdal_translate -co tiled=YES -ot Byte test.vrt test_byte.tif

# Check file info
gdalinfo test.tif -stats
gdalinfo test_byte_lzw.tif -stats

# Check file size
du -sh test.tif  # Uncompressed
du -h test_lzw.tif
du -h test_byte.tif  # Uncompressed 
du -h test_byte_lzw.tif  # My test file uses random values, so it compresses much worse than "real" data, which usually compresses better.

Output:

Driver: GTiff/GeoTIFF
Files: test.tif
       test.tif.aux.xml
Size is 21768, 12920
<snip>
Band 1 Block=21768x1 Type=Float32, ColorInterp=Gray
  Min=0.000 Max=20.000 
  Minimum=0.000, Maximum=20.000, Mean=9.999, StdDev=8.165
  NoData Value=-3.39999999999999996e+38

Driver: GTiff/GeoTIFF
Files: test_byte_lzw.tif
Size is 21768, 12920
<snip>
Band 1 Block=256x256 Type=Byte, ColorInterp=Gray
  Minimum=0.000, Maximum=20.000, Mean=9.999, StdDev=8.165
  NoData Value=255

1.1G    test.tif
99M test_lzw.tif
275M    test_byte.tif
64M test_byte_lzw.tif
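Before picking a NoData value, you can check that it cannot clash with real data by comparing it with the band statistics, as in the `gdalinfo -stats` output above. This is a minimal POSIX sh sketch; `nodata_is_free` is a hypothetical helper that assumes a single-band file and the `Minimum=..., Maximum=...` line format shown above. A value outside the Minimum..Maximum range is guaranteed unused; a value inside it might still be unused, but the check conservatively rejects it.

```sh
#!/bin/sh
# nodata_is_free (hypothetical helper): read `gdalinfo -stats` text on stdin
# and succeed only if the candidate NoData value lies outside the band's
# Minimum..Maximum range. Exit status 2 means no statistics line was found.
# Assumes one band and the "Minimum=..., Maximum=..." format of gdalinfo.
nodata_is_free() {
    awk -v c="$1" '
    /Minimum=/ {
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/,$/, "", f)                      # drop trailing comma
            if (f ~ /^Minimum=/) { sub(/^Minimum=/, "", f); min = f + 0; found++ }
            else if (f ~ /^Maximum=/) { sub(/^Maximum=/, "", f); max = f + 0; found++ }
        }
    }
    END {
        if (found < 2) exit 2
        if (c + 0 < min || c + 0 > max) exit 0
        exit 1
    }'
}

# Usage (needs GDAL installed):
#   gdalinfo -stats test.tif | nodata_is_free 255 && echo "255 is safe as NoData"
#   gdalinfo -stats test.tif | nodata_is_free 0   || echo "0 may clash with real data"
```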

E.g. (in the OSGeo4W Shell on Windows):

REM Create an LZW compressed version
gdal_translate -co compress=LZW -co tiled=YES test.tif test_lzw.tif

REM Convert NoData to 255
REM the first file with .vrt is the virtual output, the second is the input
gdalbuildvrt -srcnodata -3.4E+38 -vrtnodata 255 test.vrt test.tif

REM Write out a Byte with LZW compression TIFF file
REM the first file with .vrt is the input (from buildvrt), the second is the output
gdal_translate -co compress=LZW -co tiled=YES -ot Byte test.vrt test_byte_lzw.tif

REM Write out a Byte with no compression TIFF file
gdal_translate -co tiled=YES -ot Byte test.vrt test_byte.tif

REM Check file info
gdalinfo test.tif -stats
gdalinfo test_byte_lzw.tif -stats

REM Check file size
dir test.tif 
dir test_lzw.tif
dir test_byte.tif
dir test_byte_lzw.tif
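Whichever shell you use, the effect of the VRT step followed by -ot Byte on each pixel can be pictured with the awk sketch below (POSIX shell, so Linux/macOS). It is an illustration, not GDAL's implementation: it assumes that GDAL rounds to the nearest integer and clamps to 0-255 when writing Float32 values into a Byte band, and `to_byte` is a hypothetical name.

```sh
#!/bin/sh
# to_byte (hypothetical): read one Float32 value per line and print the Byte
# value it would become. Source NoData -> 255; everything else is rounded to
# the nearest integer and clamped to 0..255 (assumed GDAL behaviour).
to_byte() {
    awk -v src_nodata=-3.4e38 -v dst_nodata=255 '
    {
        v = $1 + 0
        if (v == src_nodata + 0) { print dst_nodata; next }
        r = (v < 0) ? -int(-v + 0.5) : int(v + 0.5)   # round half away from zero
        if (r < 0) r = 0
        if (r > 255) r = 255
        print r
    }'
}

# Prints 255, 0, 7, 31, 32, 0, 255 (one per line)
printf '%s\n' -3.4e38 0 7 31 31.6 -2 300 | to_byte
```

Note the last value: 300 would also end up as 255 and be indistinguishable from NoData. With data in the 1-31 range this cannot happen, which is why 255 is a safe choice here.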