Google Earth Engine – Forcing fastDistanceTransform to Operate at Specific Scale

google-earth-enginegoogle-earth-engine-javascript-api

I am trying to compute the distance to the nearest waterbodies at a global scale. I am defining waterbodies by thresholding the JRC's Global Surface Water "occurrence" dataset at 80% or greater. This is easy enough to do with fastDistanceTransform(), but the problem is that I need to perform the distance transform at the native scale of the GSW data (30 meters). Ultimately I want to export at a scale of 1 km, but I want the resampling to occur on the distance-transformed image, not the GSW data. This is because coarsening the GSW data can lead to the disappearance of small waterbodies, which greatly alters the final distance transform.

So to be clear, I'd like to:

  1. Threshold GSW at 80% to identify water pixels.
  2. Compute distance transform on thresholded image from (1) at a 30m resolution
  3. Resample (2) at a 1 km resolution for export.

Here's some an example code that fails:

var globe = ee.Geometry.Polygon([-180, 88, 0, 88, 180, 88, 180, -88, 0, -88, -180, -88], null, false)

// Need to perform the distance calculation at native (30m) resolution, then reproject the result. 
var I = JRC.gte(80)
I = I.reproject({crs: 'EPSG:4326',
                 scale: 30})
I = I.fastDistanceTransform({neighborhood:1024}).sqrt().multiply(ee.Image.pixelArea().sqrt())
I = I.reproject({crs: 'EPSG:4326',
                 scale: 1000})

//Map.addLayer(I)
Export.image.toDrive({
  image: I,
  region: globe,
  maxPixels: 1e13,
  folder: 'GEexports',
  description: 'nearestwater_1k',
  scale: 1000
});

You can see that I tried to reproject the image and specify a scale to force the analysis at 30m. I get the error message

Error: Number of pixels requested from Image.reproject exceeds the
maximum allowed (2^31).

which is raised by the first reproject (to a scale of 30).

The problem is that if I don't do this reprojection and only specify scale in the Export command, I'm fairly certain GEE will resample the GSW image before thresholding, which is what I'd like to avoid. Nearest distance calculations in this case are highly sensitive to the resolution.

Here's a link to the script.

(I also note that I was able to do the calculation and export at 90 m scale but it produces ~200 files that sum to ~100 GB and I'd like to avoid downloading them, stitching them together into a VRT, then resampling locally if possible.)


I wanted to demonstrate the difference between (1) resampling the GSW image to 1000 meters, thresholding it, and then applying the FDT() and (2) leaving the GSW image in its 30 m (native) resolution, applying the FDT(), and then resampling that output to a 1000 m resolution.

enter image description here

You can see that there is a very large discrepancy between the methods–this is because resampling the GSW dataset to 1000 m basically hides many of the surface water pixels, so the distance transform doesn't "see" many waterbodies of the coarser GSW image. (Top-right resampling was done via gdal on my machine. Bottom-right is GEE output.)

Best Answer

Exporting at a scale of 1000m means that each output tile has an edge size of 1000m * 1024 pixels = 1024km, but you're using a neighborhood of 1000 pixels in the FDT, so that's a total edge length of 3072km of input to compute each output tile. At the original 30m/pixel, that's an input tile of >10 billion pixels, per output tile.

You probably meant to use a neighborhood of 1024m, not 1024km. Just use a smaller neighborhood.