Raster conversion from NetCDF to GeoTIFF changes data values

netcdfpythonrasterrasteriorioxarray

I am trying to convert a NetCDF file to a GeoTIFF. I've followed the steps outlined in Converting NetCDF dataset array to GeoTiff using rasterio Python and everything works fine. However, when looking at the underlying data, the values increase by an order of magnitude when the data is converted to a GeoTIFF. Here is an example using the following gridMET precipitation data (https://www.northwestknowledge.net/metdata/data/pr_2016.nc):

import xarray
import rasterio as rio
import numpy as np
import matplotlib.pyplot as plt
# import rioxarray # Not used but package required to run script
# import netCDF4 # Not used but package required to run script

# read in NetCDF
xds = xarray.open_dataset('path/to/pr_2016.nc')

# Write first band to .tif
xds['precipitation_amount'][0].rio.to_raster('./pr_example.tif')

# Read new data with rasterio and convert nodata value
with rio.open('pr_example.tif') as tmp:
    rio_arr = tmp.read(1)
    rio_arr = np.where(rio_arr == 32767, np.nan, rio_arr)

# Look at difference between two arrays. 
# If they were identical, all value should be zero. 
xds_arr = xds['precipitation_amount'][0].values
diff = rio_arr - xds_arr

# Difference of ~500 mm/day between the two arrays
plt.imshow(diff)
plt.colorbar()

Difference between plots

In the resulting plot, all regions of the U.S. that had non-zero precipitation have values that are ~10x greater in the .tiff relative to the .nc file.

Does anyone know why this might be happening?

Best Answer

As @user30184 pointed out, there is a scale factor that is applied to the NetCDF when it is read with xarray. When opening the .tiff with rasterio, this scaling factor must be multiplied with the array to properly scale the values:

# Read new data with rasterio and convert nodata value
with rio.open('pr_example.tif') as tmp:
    rio_arr = tmp.read(1)
    rio_arr = np.where(rio_arr == 32767, np.nan, rio_arr)
    # Multiply by scaling factor
    rio_arr = rio_arr * tmp.scales