Xarray Groupby Anomalies – Wrong Results Using Xarray Groupby to Calculate Anomalies

pythonrioxarrayxarray

I want to compute ten days anomalies based on MODIS NDSI. The data is stored as a GeoTIFF. For this task I have two folders, one with the reference period of 2001-2020 and one with 2001-2022 for which I want the anomalies.

The problem is that the data is not grouped correctly I suppose. This is the anomaly for NDSI for the 1July which is totally wrong. In this period the only NDSI values will be from clouds from time to time due to cloud masking and only in mountainous regions, and not in such a wide regions.

Anyone got any idea or suggestions? I can provide data if that will be of intereset.
enter image description here

##files path
test_anom='path\to\files'
##list for ndsi geotiffs
snow_anom = []
for filename in glob(os.path.join(test_anom,'*.tif')):
   snow_anom.append(filename)

##empty list for files inside the path to store names the names to create the time dim
times_anom=[]
for filename in glob(os.path.join(test_anom,'*.tif')):
  data = filename[38:51]
  data1=datetime.strptime(str(data[0:10]),'%d-%m-%Y')
  times_anom.append(data1)

#time dimension array from names list and sorted out   
time_da_anom = xr.DataArray(times_anom, [('time', times_anom)])
time_good_anom=time_da_anom.sortby(time_da_anom)

# Load in and concatenate all individual GeoTIFFs with correct time dim
geotiffs_anomaly = xr.concat([rioxarray.open_rasterio(i) for i in snow_anom],
                    dim=time_good_anom)

##transform geotiffs to dataset and rename band 
geotiffs_anom1 = geotiffs_anomaly.to_dataset('band')
geotiffs_anom_ren = geotiffs_anom1.rename({1: 'NDSI'})

##assign year_month coords from where to groupby data 
da_clim= geotiffs_anom_ren.assign_coords(year_month=timp_bun_anom.time.dt.strftime("%d-%m"))
da_total= geotiffs_tot_ren.assign_coords(year_month=timp_bun_tot.time.dt.strftime("%d-%m"))

##groupby year_month to create the anomalies 
gb = da_anomaly.groupby("year_month")
clim = gb.mean(dim="time",skipna=True)
present=da1.groupby("year_month")
anomaly = prezent - clim

###export the anomalies
test=anomaly ["NDSI"]
for i in range(len(test)):
    name=test[i]["time"].dt.strftime("%d_%m_%Y")
    nume=str(name)
    name_good=nume[40:50]
    test[i].rio.to_raster('out\path'+name_good+".tif", encoding={'zlib':True})

Best Answer

The problem here to my approach was that the names of the files wasn't sorted and the time was, so the time dimension was assigned wrong. Also for memory wise processing I use open_mfdataset() instead of rioxarray.open_rasterio().

The correct approach will be:

##empty list for files inside the path to store names the names to create 
the time dim
times_anom=[]
for filename in glob(os.path.join(test_anom,'*.tif')):
    data1=datetime.strptime(str(data[0:10]),'%d-%m-%Y')
    times_anom.append(data1)

#time dimension array from names list and sorted out   
time_good_anom = xr.DataArray(times_anom, [('time', times_anom)])

# Load in and concatenate all individual GeoTIFFs with correct time dim
geotiffs_anomaly = xr.concat([xr.open_mfdataset(i) for i in snow_anom],
                dim=time_good_anom)

## rename band of dataset and skip .to_dataset transormation
geotiffs_anom_ren = geotiffs_anom1.rename({'band_data': 'NDSI'})