Writing GeoTIFF from dask array using rioxarray in python results in “stripes” of no data

daskgeotiff-tiffpythonrasteriorioxarray

I am trying to write a large raster from a dask array to a GeoTIFF file, but am running into an issue where the resulting raster contains "stripes" of no data. The pixels that do contain data have the correct values.

The raster that I am trying to write has dimensions 97838×154221 pixels and the output is >4 GB. I tested on a smaller raster of size 6017×6401 pixels and chunk size of 20×20 and my code seemed to work fine. I have a feeling that this has to do with the "chunking" from the dask array, but I'm not sure. I'm also not sure if the way I am writing the raster is the most memory efficient way, but I have played around with a few different ways and this way seems to not throw a memory error.

This is a subset of what the resulting GeoTIFF raster looks like:

enter image description here

This is the code that I am using to write the GeoTiff file:

# Importing packages
import dask.array as da
import xarray as xr
import rasterio
import rioxarray

# Loading a template raster to use metadata when writing new raster to file
template_grid_filepath = "template.tif"
template_grid = rasterio.open(raw_template_grid_filepath)
template_grid_rx = rioxarray.open_rasterio(raw_template_grid_filepath)

#### Below is code used to create the array that will be written to a GeoTIFF ####
# Specifying paths of rasters to merge together
inputs = "inputs"

# NLCD 2001 to 2016
nlcd_years = [2001, 2006, 2011, 2016]
nlcd_files = [os.path.join(inputs, "nlcd-" + str(year) + ".tif") for year in nlcd_years]

# CCAP 2001 to 2016
ccap_years = [2001, 2006, 2010, 2016]
ccap_files = [os.path.join(inputs, "ccap-" + str(year) + ".tif") for year in ccap_years]

# Create layer_array dask array by merging information from ccap and nlcd rasters
iter = 1
for ccap_filepath, nlcd_filepath in zip(ccap_files, nlcd_files):

    ccap = rioxarray.open_rasterio(ccap_filepath, chunks=10000, lock=True)
    nlcd = rioxarray.open_rasterio(nlcd_filepath, chunks=10000, lock=True)
    
    shape = ccap.x.shape[0]
    ccap = da.Array.flatten(ccap)
    nlcd = da.Array.flatten(nlcd)

    merged = da.where(ccap == 0, nlcd, ccap)
    merged = da.where(merged == 14, 13, merged)
    merged = da.where(merged == 90, 13, merged)
    merged = da.where(merged == 13, 1, 0)
    with dask.config.set(**{'array.slicing.split_large_chunks': False}):
        merged = merged.reshape(-1, shape)

    if iter == 1:
        layer_array = merged           
    else:
        layer_array += merged
        layer_array = da.where(layer_array > 0, 1, 0)
        

    del ccap, nlcd, merged
    iter += 1

##### Above is code used to create array to write to GeoTIFF #####

# Create metadata from template
out_meta = template_grid.meta
out_meta.update({'compress': 'lzw'})

# Convert layer array to xarray so I can write to raster using rioxarray
layer_array_xr = xr.DataArray(layer_array, 
                              coords=template_grid_rx.coords, 
                              dims=['band', 'y', 'x'],
                              attrs=template_grid_rx.attrs)

# Set CRS
layer_array_xr = layer_array_xr.rio.set_crs(template_grid_rx.rio.crs)

# Write layer array xarray to raster using rioxarray
layer_array_xr.rio.to_raster("output.tif", tiled=True, lock=True, **out_meta)

Best Answer

You need to pass in a specific lock such as lock=threading.Lock() or lock=dask.distributed.Lock("rio", client=client)

See: https://corteva.github.io/rioxarray/stable/examples/dask_read_write.html

Tips for using dask locks:

  • Be careful about what lock you use for your process. It is required to have a lock for each worker, so the more fine-grained the better.
  • The reading and writing processes need the same type of lock. They don’t have to share the same lock, but they do nead a lock of the same type.

Also see: https://github.com/corteva/rioxarray/issues/220