I am trying to write a large raster from a dask array to a GeoTIFF file, but am running into an issue where the resulting raster contains "stripes" of no data. The pixels that do contain data have the correct values.
The raster that I am trying to write has dimensions 97838×154221 pixels and the output is >4 GB. I tested on a smaller raster of size 6017×6401 pixels and chunk size of 20×20 and my code seemed to work fine. I have a feeling that this has to do with the "chunking" from the dask array, but I'm not sure. I'm also not sure if the way I am writing the raster is the most memory efficient way, but I have played around with a few different ways and this way seems to not throw a memory error.
This is a subset of what the resulting GeoTIFF raster looks like:
This is the code that I am using to write the GeoTiff file:
# Importing packages
import dask.array as da
import xarray as xr
import rasterio
import rioxarray
# Loading a template raster to use metadata when writing new raster to file
template_grid_filepath = "template.tif"
template_grid = rasterio.open(raw_template_grid_filepath)
template_grid_rx = rioxarray.open_rasterio(raw_template_grid_filepath)
#### Below is code used to create the array that will be written to a GeoTIFF ####
# Specifying paths of rasters to merge together
inputs = "inputs"
# NLCD 2001 to 2016
nlcd_years = [2001, 2006, 2011, 2016]
nlcd_files = [os.path.join(inputs, "nlcd-" + str(year) + ".tif") for year in nlcd_years]
# CCAP 2001 to 2016
ccap_years = [2001, 2006, 2010, 2016]
ccap_files = [os.path.join(inputs, "ccap-" + str(year) + ".tif") for year in ccap_years]
# Create layer_array dask array by merging information from ccap and nlcd rasters
iter = 1
for ccap_filepath, nlcd_filepath in zip(ccap_files, nlcd_files):
ccap = rioxarray.open_rasterio(ccap_filepath, chunks=10000, lock=True)
nlcd = rioxarray.open_rasterio(nlcd_filepath, chunks=10000, lock=True)
shape = ccap.x.shape[0]
ccap = da.Array.flatten(ccap)
nlcd = da.Array.flatten(nlcd)
merged = da.where(ccap == 0, nlcd, ccap)
merged = da.where(merged == 14, 13, merged)
merged = da.where(merged == 90, 13, merged)
merged = da.where(merged == 13, 1, 0)
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
merged = merged.reshape(-1, shape)
if iter == 1:
layer_array = merged
else:
layer_array += merged
layer_array = da.where(layer_array > 0, 1, 0)
del ccap, nlcd, merged
iter += 1
##### Above is code used to create array to write to GeoTIFF #####
# Create metadata from template
out_meta = template_grid.meta
out_meta.update({'compress': 'lzw'})
# Convert layer array to xarray so I can write to raster using rioxarray
layer_array_xr = xr.DataArray(layer_array,
coords=template_grid_rx.coords,
dims=['band', 'y', 'x'],
attrs=template_grid_rx.attrs)
# Set CRS
layer_array_xr = layer_array_xr.rio.set_crs(template_grid_rx.rio.crs)
# Write layer array xarray to raster using rioxarray
layer_array_xr.rio.to_raster("output.tif", tiled=True, lock=True, **out_meta)
Best Answer
You need to pass in a specific lock such as
lock=threading.Lock()
orlock=dask.distributed.Lock("rio", client=client)
See: https://corteva.github.io/rioxarray/stable/examples/dask_read_write.html
Also see: https://github.com/corteva/rioxarray/issues/220