Xarray – Mean for Climatologies Using min_count, as in xarray sum

python xarray

Why doesn't xr.mean() have a parameter for the minimum count of valid data across arrays, as xr.sum() does? I want to compute climatologies, and I have some issues with abnormal data. For example, for the first day of August, the rasters from which I create the xr.Dataset() (derived from MODIS NDSI) have values only for 1-08-2015 across the 20 years of data. It would be easier if I had a min_count to get rid of this observation.

Best Answer

It's actually pretty easy to add this yourself! :)

import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

min_count = 5

# Create some test dataset with 10 dates and 5*5=25 pixels.
data = np.random.random((10,5,5))
data[:5,0,0] = np.nan
data[4:,2,2] = np.nan
data[0,4,4] = np.nan
data[2:9,4,0] = np.nan
data[:,0,4] = np.nan
ds = xr.Dataset({"my_var": (["time", "x", "y"], data)}, 
                coords = {"time": pd.date_range("2022-01-01", "2022-01-10"), 
                          "x": range(5), 
                          "y": range(5)})

# Count how many non-NaN values we have per pixel.
count = ds.count("time")

# Plot the counts.
ax = plt.figure(1).gca()
count.my_var.plot(ax = ax)

# Calculate the mean for pixels where there is sufficient data.
out = ds.where(count >= min_count).mean("time")

# Plot the means.
ax = plt.figure(2).gca()
out.my_var.plot(ax = ax)

Figure: the number of valid values per pixel

Figure: the mean values for pixels with at least min_count valid values
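Since the question is about climatologies, the same counting trick can be applied per group instead of over the whole time axis. Below is a minimal sketch, assuming a day-of-year climatology; the helper name climatology_min_count, the variable name ndsi, and the toy values are made up for illustration and are not part of xarray or of the original data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def climatology_min_count(obj, min_count, group="time.dayofyear"):
    """Hypothetical helper: grouped mean over time, set to NaN for groups
    with fewer than `min_count` valid (non-NaN) values."""
    grouped = obj.groupby(group)
    count = grouped.count("time")  # valid values per group
    return grouped.mean("time").where(count >= min_count)


# Toy series: 1 and 2 August for three non-leap years.
# 1 August has a single valid value; 2 August has three.
times = pd.to_datetime([
    "2001-08-01", "2001-08-02",
    "2002-08-01", "2002-08-02",
    "2003-08-01", "2003-08-02",
])
values = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
da = xr.DataArray(values, coords={"time": times}, dims="time", name="ndsi")

clim = climatology_min_count(da, min_count=2)
print(clim)
```

With min_count=2, day 213 (1 August) is masked out because only one year has data, while day 214 (2 August) keeps its mean. Note that "time.dayofyear" shifts by one after February in leap years, so for real data a month-and-day grouping may be the better choice.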
