Writing GeoTIFF from dask array using rioxarray in python results in “stripes” of no data

daskgeotiff-tiffpythonrasteriorioxarray

I am trying to write a large raster from a dask array to a GeoTIFF file, but am running into an issue where the resulting raster contains "stripes" of no data. The pixels that do contain data have the correct values.

The raster that I am trying to write has dimensions 97838×154221 pixels and the output is >4 GB. I tested on a smaller raster of size 6017×6401 pixels and chunk size of 20×20 and my code seemed to work fine. I have a feeling that this has to do with the "chunking" from the dask array, but I'm not sure. I'm also not sure if the way I am writing the raster is the most memory efficient way, but I have played around with a few different ways and this way seems to not throw a memory error.

This is a subset of what the resulting GeoTIFF raster looks like:

This is the code that I am using to write the GeoTiff file:

# Importing packages
import dask.array as da
import xarray as xr
import rasterio
import rioxarray

# Loading a template raster to use metadata when writing new raster to file
template_grid_filepath = "template.tif"
template_grid = rasterio.open(raw_template_grid_filepath)
template_grid_rx = rioxarray.open_rasterio(raw_template_grid_filepath)

#### Below is code used to create the array that will be written to a GeoTIFF ####
# Specifying paths of rasters to merge together
inputs = "inputs"

# NLCD 2001 to 2016
nlcd_years = [2001, 2006, 2011, 2016]
nlcd_files = [os.path.join(inputs, "nlcd-" + str(year) + ".tif") for year in nlcd_years]

# CCAP 2001 to 2016
ccap_years = [2001, 2006, 2010, 2016]
ccap_files = [os.path.join(inputs, "ccap-" + str(year) + ".tif") for year in ccap_years]

# Create layer_array dask array by merging information from ccap and nlcd rasters
iter = 1
for ccap_filepath, nlcd_filepath in zip(ccap_files, nlcd_files):

    ccap = rioxarray.open_rasterio(ccap_filepath, chunks=10000, lock=True)
    nlcd = rioxarray.open_rasterio(nlcd_filepath, chunks=10000, lock=True)
    
    shape = ccap.x.shape[0]
    ccap = da.Array.flatten(ccap)
    nlcd = da.Array.flatten(nlcd)

    merged = da.where(ccap == 0, nlcd, ccap)
    merged = da.where(merged == 14, 13, merged)
    merged = da.where(merged == 90, 13, merged)
    merged = da.where(merged == 13, 1, 0)
    with dask.config.set(**{'array.slicing.split_large_chunks': False}):
        merged = merged.reshape(-1, shape)

    if iter == 1:
        layer_array = merged           
    else:
        layer_array += merged
        layer_array = da.where(layer_array > 0, 1, 0)
        

    del ccap, nlcd, merged
    iter += 1

##### Above is code used to create array to write to GeoTIFF #####

# Create metadata from template
out_meta = template_grid.meta
out_meta.update({'compress': 'lzw'})

# Convert layer array to xarray so I can write to raster using rioxarray
layer_array_xr = xr.DataArray(layer_array, 
                              coords=template_grid_rx.coords, 
                              dims=['band', 'y', 'x'],
                              attrs=template_grid_rx.attrs)

# Set CRS
layer_array_xr = layer_array_xr.rio.set_crs(template_grid_rx.rio.crs)

# Write layer array xarray to raster using rioxarray
layer_array_xr.rio.to_raster("output.tif", tiled=True, lock=True, **out_meta)

Best Answer

You need to pass in a specific lock such as lock=threading.Lock() or lock=dask.distributed.Lock("rio", client=client)

See: https://corteva.github.io/rioxarray/stable/examples/dask_read_write.html

Tips for using dask locks:

Be careful about what lock you use for your process. It is required to have a lock for each worker, so the more fine-grained the better.

The reading and writing processes need the same type of lock. They don’t have to share the same lock, but they do nead a lock of the same type.

Also see: https://github.com/corteva/rioxarray/issues/220

Related Solutions

[GIS] Converting NetCDF dataset array to GeoTiff using rasterio Python

I would recommend looking into rioxarray for your dataset.

You can open your dataset like so:

import rioxarray
import xarray

xds = xarray.open_dataset('D:\Weather data\et_01012018.nc')

If your CRS is not discovered, you should be able to add it like so:

xds.rio.write_crs("epsg:4326", inplace=True)

Then, you should be able to create a geotiff from the Evapotranspiration like so:

xds["Evapotranspiration"].rio.to_raster('D:\Weather data\test.tif')

If this does not produce the correct results, I would be interested in learning more about your input file data.

[GIS] python-gdal write a GeoTIFF in colour from a data array

You can create a color ramp for the color table instead of having to specify a color for every single value in the raster. The CreateColorRamp() method takes four arguments:

Start value
Start color
Stop value
Stop color

For example, imagine you want to build a color ramp with three colors (e.g. three different shades of green). The first green will be for the value 0, the second for the value 127 and the third for the value 254. The colors for the rest of the values will be interpolated colors between two of the already mentioned colors. Feel free to toy with this example and add as many stops as you want and change the colors depending on your needs.

colors = gdal.ColorTable()
colors.CreateColorRamp(0, (229, 245, 249), 127, (153, 216, 201))
colors.CreateColorRamp(127, (153, 216, 201), 254, (44, 162, 95))
band.SetRasterColorTable(colors)
band.SetRasterColorInterpretation(gdal.GCI_PaletteIndex)

On another note, it doesn't make sense to set the NoData value to -9999 if you have a TIFF file with gdal.GDT_Byte as its data type. A byte can only store positive integers between 0 and 255. Consider using another value for NoData or changing the data type of the TIFF file so it can actually store -9999.

Best Answer

Related Solutions

[GIS] Converting NetCDF dataset array to GeoTiff using rasterio Python

[GIS] python-gdal write a GeoTIFF in colour from a data array

Related Question