[GIS] Iterating an Earth Engine calculation (reducer) over every country in the world: a simple example

big-data, google-earth-engine

I'm an EE newbie, and would like to create a detailed 'resource' for the following challenge.

The problem: What is the best way to iterate a simple calculation (e.g. a Reducer, or a simple user-defined function) over a very large feature collection? In this context, 'best' means (1) the computation works (does not throw a 'memory limits exceeded' or a 'computation timed out' error), and (2) it runs fast and in an 'EE way' (i.e. it does not consume server-side resources unnecessarily).

A list of related-but-not-useful questions found to date is appended.

A reproducible example:
The problem can be demonstrated using pre-loaded EE assets. Let's try to iterate over a vector dataset (an EE 'feature collection') of the world's national boundaries in order to sum the area of forest loss recorded by Hansen et al. (2017 update) for each country in the world:

var hansen = ee.Image("UMD/hansen/global_forest_change_2017_v1_5");
var countries = ee.FeatureCollection("USDOS/LSIB/2013");
//this is a detailed map of country boundaries for the whole world
var simpleCountries = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017");
//the same thing, but simplified. Small islands and other complicated vector features do not appear
var fourCountries = countries.filter(ee.Filter.inList('name',['SLOVAKIA','BURKINA FASO','BHUTAN','CONGO (Brazzaville)']));
//this is a test collection of four countries with relatively simple geometry, including Congo (to check against the 'Congo' result [in the ee tutorial][5])

var lossImage = hansen.select(['loss']); //all Hansen 'loss' pixels, 2000-2017
//these are 30m pixels showing 'forest loss', which we will try to count
var lossIn2012 = hansen.select(['lossyear']).eq(12);
//to make things easy, we will work with just one year's loss data at a time
var areaImage = lossIn2012.multiply(ee.Image.pixelArea());
//create a layer where values are pixel area


//the following call (with just 4 small countries) succeeds in less than a minute
//the area reported for Congo (in 2012) accords with the area in the tutorial. Nice!
var smallReduction=areaImage.reduceRegions({
  collection:fourCountries,
  reducer:'sum'
})
print('smallReduction', smallReduction)

//this call fails ("computation timed out")
var hugeReduction=areaImage.reduceRegions({
  collection:countries,
  reducer:'sum'
})
print('hugeReduction', hugeReduction)

//following [the debugging guide][5], we can use map with 
//reduceRegion to break the reduction into smaller tasks.
//this call also fails ("computation timed out")
var mappedReduction=countries.map(function(feature){
  return feature.set(areaImage.reduceRegion({
    reducer:'sum',
    geometry:feature.geometry(),
    maxPixels:1e15 //default is 1e9, which is exceeded by an early country in the list (Brazil?), throwing an error
  }))
})
print('mappedReduction', mappedReduction)

//following the debugging guide [again][5], we can export the result... this doesn't get us
//more memory, but it will stop the computation from timing out
//(I killed this task after 22 hours of runtime...)
Export.table.toDrive({
  collection:mappedReduction,
  description:'mappedReduction',
  fileFormat:'CSV'
})

What gives? Yes, this is a whole lot of pixels, but we're here (maybe learning JavaScript) on the promise of 'planetary scale analysis'. So how can we run planetary-scale analyses in a way that fits within the resources the wonderful folks at Google are able to make available? I (newbie) think answering the following sub-questions might help:

  • Can the calculation task (ee.Reducer.sum()) be broken up or slowed down to allow the computation to succeed? [edit] I think this merits its own question, so I have opened one here. If the script in this question fails because some features are simply too big, the problem can be overcome by using a larger-scale feature collection (but do see that question: EE appears to choke on features of a reasonable size for 'planetary analysis').
  • If we use the bestEffort or tileScale parameters in the call to reduceRegion, how can we make their effect (and any resulting trade-off in accuracy) visible? I've played with both (no luck), but can't see what's going on, or where they fail. A sketch of one way to compare them follows this list.
  • Can Export be told to create a separate output file for each feature, so we can see the progress of a multi-hour operation (and see whether it fails on a specific feature)? A per-country export sketch also follows this list.
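
On the second sub-question, this is roughly the comparison I have in mind. It is only a sketch: the choice of 'BRAZIL' as the test feature, the 30 m scale, and the maxPixels/tileScale values are illustrative assumptions, not recommended settings.

//compare a strict reduction (small tiles via tileScale) with a bestEffort one,
//so that any difference in the reported area is visible side by side
var testCountry = ee.Feature(countries.filter(ee.Filter.eq('name', 'BRAZIL')).first());

var strictSum = areaImage.reduceRegion({
  reducer: ee.Reducer.sum(),
  geometry: testCountry.geometry(),
  scale: 30,        //Hansen native resolution
  maxPixels: 1e13,
  tileScale: 16     //smaller tiles: less memory per tile, more per-tile overhead
});

var bestEffortSum = areaImage.reduceRegion({
  reducer: ee.Reducer.sum(),
  geometry: testCountry.geometry(),
  scale: 30,
  bestEffort: true  //EE may coarsen the scale so the region fits under maxPixels
});

print('strict sum (tileScale 16)', strictSum);
print('bestEffort sum', bestEffortSum);

Printing the two dictionaries side by side at least shows whether bestEffort changed the answer, even if it doesn't reveal what scale was actually used.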
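
And on the third sub-question, a rough sketch of a one-task-per-country workaround (the 'name' property and the 30 m scale are taken from the code above; the batching pattern itself is just my guess at a workaround, not an established recipe):

//pull the distinct country names to the client, then queue one small export per country;
//each task succeeds or fails on its own, so progress (and any problem feature)
//is visible in the Tasks tab
countries.aggregate_array('name').distinct().evaluate(function(names) {
  names.forEach(function(name) {
    var oneCountry = countries.filter(ee.Filter.eq('name', name));
    var oneResult = areaImage.reduceRegions({
      collection: oneCountry,
      reducer: ee.Reducer.sum(),
      scale: 30
    });
    Export.table.toDrive({
      collection: oneResult,
      description: 'loss2012_' + name.replace(/\W+/g, '_'), //task names can't contain spaces or punctuation
      fileFormat: 'CSV'
    });
  });
});

This queues a few hundred export tasks that each still have to be started, so it's clunky, but it would at least make progress (and per-feature failures) observable.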

Related-but-not-useful questions have been asked:

  • here (OP asked about memory limits when mapping over a large feature collection and was directed to seek help in a closed google group)
  • here (OP asked basically the same question; it has been ignored)
  • here (OP wanted to use reduce region on a feature collection
    where memory / time limits were a risk – neither issue arose, and no
    workarounds are provided)
  • here (OP wanted to filter and sort an image collection for an arbitrary (and large) number of vector geometries, and export the
    result. The discussion appears to show how to set up such a script,
    but doesn't address memory or computation time limits)
  • here (OP mapped a reducer over a feature collection, but did not discuss memory or time limits)
  • here (OP is using the python API and iterating with a for() loop; memory or time limits don't come up).
  • and here (OP is iterating over years in the same example dataset used in this question; memory and time limits don't come up)

Best Answer

The primary problem you're encountering is that the Large Scale International Boundary table contains several very complex polygons, and those complex polygons take a long time to reproject (e.g. the Canada polygon contains 193k vertices). That's compounded by the fact that the reprojection is repeated for every tile that's processed, so even though the reduceRegion(s) doesn't fail, it can take a (very) long time to complete because of the extra overhead added to every tile.
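
If you want to see where that complexity lives, here is a quick sketch (using the variables from the question; the 'n_vertices' property name is just illustrative):

//approximate per-feature vertex counts: coordinates() returns nested lists of
//[lon, lat] pairs, so flattening and halving the length gives a rough count
var withVertexCounts = countries.map(function(feature) {
  var nVertices = feature.geometry().coordinates().flatten().length().divide(2);
  return feature.set('n_vertices', nVertices);
});
//sorting the full detailed collection can itself be slow; try simpleCountries first
print('most complex features', withVertexCounts.sort('n_vertices', false).limit(10));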

For extremely long aggregations like this, the mapped reduceRegion is better because each feature that's processed gets cached, so if you end up having to restart, the intermediate values aren't lost.
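
In code, that pattern might look something like this (a sketch only, reusing the variables from the question; the explicit scale, the maxPixels value, and the output property name are my choices, not requirements):

//map reduceRegion over the collection and export rather than print: per-feature
//results are cached, so a restarted export does not redo countries that already finished
var lossByCountry = countries.map(function(feature) {
  var stats = areaImage.reduceRegion({
    reducer: ee.Reducer.sum(),
    geometry: feature.geometry(),
    scale: 30,       //Hansen native resolution
    maxPixels: 1e13
  });
  return feature.set('loss_m2_2012', stats.get('lossyear')); //areaImage keeps the 'lossyear' band name
});

Export.table.toDrive({
  collection: lossByCountry,
  description: 'lossByCountry2012',
  fileFormat: 'CSV',
  selectors: ['name', 'loss_m2_2012'] //skip the (huge) geometries in the output
});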