Google Earth Engine – How Is User Memory Allocated in GEE?

google-earth-engine, google-earth-engine-python-api

Can someone explain how user memory is allocated?

I am trying to perform a random forest classification. I have approximately 150 training points and initially an image with around 50 bands. I used the following code to make my training sample.

import ee

## Make GEE polygons from the training areas (gj is a GeoJSON of polygons).
polygons = [
    ee.Feature(ee.Geometry.Polygon(x['geometry']['coordinates']), x['properties'])
    for x in gj['features']
]

## Add the polygons to a feature collection.
fc = ee.FeatureCollection(polygons)

## Paint the "class" property of each polygon onto a byte image.
classes = ee.Image().byte().paint(fc, 'class').rename('class')

## Get the values of all pixels in each polygon of the training set
## (img is the multi-band training image built earlier in the notebook).
sample = classes.addBands(img.select(train_bands)).sampleRegions(
    collection=fc,
    properties=['class'],
    projection='EPSG:4326',
    scale=30
)

sample.getInfo()

I was getting the "User memory limit exceeded" error, so I iteratively reduced the number of bands until the sampling succeeded. The number of bands 'allowed' was 28, which, with 150 points, seems very low given that the documentation mentions a capability of ~1 million training points.

Since successfully sampling the image and re-running my processing, I am getting the "User memory limit exceeded" error once more, even after further reducing the number of bands in the training image. The limit now seems to be 24 bands.

Is there a 'cooldown' on when memory is allocated? Is it allocated per user, per session, or per request?

I do some fairly lengthy processing to create my training features, but it succeeded before and now suddenly does not.

I cannot very well increase the sampling scale because many of my polygons are 1 pixel wide (indicating water in a river, for example). If I increase the scale, these samples are lost altogether.

The full (very extensive) code is available here as a Colab notebook: the first cell contains the GeoJSONs for the region and the training areas, the second cell a number of functions used for data handling and feature engineering, and the third cell the main script.

Best Answer

Your code is incomplete, so it isn't possible to check everything, but:

  1. You're manually creating a collection from what looks like a set of GeoJSON features. If that collection is large, manual creation means the entire collection gets passed around with every request instead of being passed as a reference. Try uploading it as a table asset instead (see the sketch after this list).

  2. Those GeoJSON features appear to be polygons, not points. sampleRegions takes every pixel inside every polygon, so if your "150 training points" are really 150 polygons of any size, you could be ending up with a collection of 500,000 or more points.
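On point 1, once the polygons are uploaded as a table asset, the collection can be referenced by its asset ID, so only the ID travels with each request rather than the full geometry payload. A minimal sketch, assuming a hypothetical asset path (substitute your own):

import ee

ee.Initialize()

## Reference the uploaded table by its asset ID instead of building the
## collection client-side from GeoJSON; the path below is a placeholder.
fc = ee.FeatureCollection('users/your_username/training_polygons')

## Only the count comes back to the client; the geometries stay server-side.
print(fc.size().getInfo())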

You should:

a) include a complete working demo that reproduces the problem, including sharing any assets you're using, and

b) check the size of the collection generated by sampleRegions; it is almost certainly much larger than you think it is (see the sketch below).
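For (b), you can ask the server for just the collection's size rather than materializing the whole thing with getInfo(). A sketch reusing the fc and sample variables from the question's code:

## Count the sampled pixels server-side; only the count is returned.
n = sample.size().getInfo()
print('sampleRegions produced', n, 'training points')

## Rough sanity check: total polygon area divided by the area of one
## 30 m pixel approximates the expected number of samples.
expected = fc.geometry().area(maxError=1).divide(30 * 30)
print('approx. pixels covered by the polygons:', expected.getInfo())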
