I want to compute ten days anomalies based on MODIS NDSI. The data is stored as a GeoTIFF. For this task I have two folders, one with the reference period of 2001-2020 and one with 2001-2022 for which I want the anomalies.
The problem is that the data is not grouped correctly I suppose. This is the anomaly for NDSI for the 1July which is totally wrong. In this period the only NDSI values will be from clouds from time to time due to cloud masking and only in mountainous regions, and not in such a wide regions.
Anyone got any idea or suggestions? I can provide data if that will be of intereset.
##files path
test_anom='path\to\files'
##list for ndsi geotiffs
snow_anom = []
for filename in glob(os.path.join(test_anom,'*.tif')):
snow_anom.append(filename)
##empty list for files inside the path to store names the names to create the time dim
times_anom=[]
for filename in glob(os.path.join(test_anom,'*.tif')):
data = filename[38:51]
data1=datetime.strptime(str(data[0:10]),'%d-%m-%Y')
times_anom.append(data1)
#time dimension array from names list and sorted out
time_da_anom = xr.DataArray(times_anom, [('time', times_anom)])
time_good_anom=time_da_anom.sortby(time_da_anom)
# Load in and concatenate all individual GeoTIFFs with correct time dim
geotiffs_anomaly = xr.concat([rioxarray.open_rasterio(i) for i in snow_anom],
dim=time_good_anom)
##transform geotiffs to dataset and rename band
geotiffs_anom1 = geotiffs_anomaly.to_dataset('band')
geotiffs_anom_ren = geotiffs_anom1.rename({1: 'NDSI'})
##assign year_month coords from where to groupby data
da_clim= geotiffs_anom_ren.assign_coords(year_month=timp_bun_anom.time.dt.strftime("%d-%m"))
da_total= geotiffs_tot_ren.assign_coords(year_month=timp_bun_tot.time.dt.strftime("%d-%m"))
##groupby year_month to create the anomalies
gb = da_anomaly.groupby("year_month")
clim = gb.mean(dim="time",skipna=True)
present=da1.groupby("year_month")
anomaly = prezent - clim
###export the anomalies
test=anomaly ["NDSI"]
for i in range(len(test)):
name=test[i]["time"].dt.strftime("%d_%m_%Y")
nume=str(name)
name_good=nume[40:50]
test[i].rio.to_raster('out\path'+name_good+".tif", encoding={'zlib':True})
Best Answer
The problem here to my approach was that the names of the files wasn't sorted and the time was, so the time dimension was assigned wrong. Also for memory wise processing I use open_mfdataset() instead of
rioxarray.open_rasterio()
.The correct approach will be: