Mask time series xarray dataset based on the other overlapping dataset

pythonxarray

I have Dataset1 that have time series of vegetation moisture values (fmc_mean) and Dataset2 that have vegetation cover types LC_Type1 (grass – 1, shrub – 2, forest -3) that overlaps with Dataset1.

Dataset1
  <xarray.Dataset>
Dimensions:       (time: 92, x: 138, y: 192)
Coordinates:
  * x             (x) float64 1.508e+06 1.509e+06 ... 1.576e+06 1.577e+06
  * y             (y) float64 -3.936e+06 -3.936e+06 ... -4.031e+06 -4.031e+06
  * time          (time) datetime64[ns] 2014-01-01 2014-01-05 ... 2014-12-31
    spatial_ref   int32 0
Data variables:
    fmc_mean      (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
    fmc_stdv      (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
    quality_mask  (time, y, x) float32 nan nan nan nan nan ... nan nan nan nan
    


   Dataset2
   xarray.Dataset
Dimensions: band:1 x:123 y:173
Coordinates:
y    (y)    float64    -3.941e+06 ... -4.027e+06
x    (x)    float64    1.513e+06 1.513e+06 ... 1.574e+06
band (band) int32      1
spatial_ref ()         int32   0
Data variables: 
LC_Type1   (band, y, x)   float64   nan nan nan nan ... nan nan nan nan

I want to mask Dataset1 based on the Dataset2, so only forest cover (LC_Type1==3) fmc_mean will be assigned to the new dataset Dataset_forest with following code:

Dataset_forest = Dataset1.where(Datset2.LC_Type1==3) # is this code correct?

But, it results empty dataset.

Dataset_forest
xarray.Dataset
Dimensions: band:1 time:92 x:0 y:0
Coordinates:
x    (x)    float64
y    (y)    float64
time (time) datetime64[ns] 2014-01-01 ... 2014-12-31
spatial_ref ()     int32 0
band        (band) int32 1
Data variables:
fmc_mean     (time, y, x, band) float32
fmc_stdv     (time, y, x, band) float32
quality_mask (time, y, x, band) float32

I re-projected coordinates to make the Datasets have the same coordinate system and they seem to overlap.
enter image description here
I am quite new to Python.

Best Answer

The coordinates have to match exactly for where to work. Since your x and y values are slightly different no matches can be found.

To fix this you can use interp() to interpolate one of the rasters to the same coordinate grid of the other raster:

time, x, y = Dataset1.indexes.values()
ds2_interp = Dataset2.interp(x=x, y=y, method="nearest")

This should work, I haven't tested it myself though. You might also have to add the time dimension for it to work.

Related Question