I've produced a very large multiband image in EE with the goal of classifying it using the classifiers implemented in sklearn (the native ones implemented in EE don't provide enough flexibility for my purposes). sklearn uses 2-D arrays, so at a minimum I would need to convert each band to a 2-D array and feed them in separately as explanatory variables. That's all fine.
Here's my problem: with a raster covering >150k km², it is beyond tedious and cumbersome to run Export.image.toDrive for each band, only to then re-import them into a Python environment using rasterio. Ideally there would be some way to convert EE image objects to sklearn-readable NumPy arrays directly using the EE Python API (Google seems to tease as much with their documentation touting the advantages of using EE in Colab: "Seamless integration with Python data science libraries").
Is there a straightforward way to do this that I'm missing?
Best Answer
ee.Image.sampleRectangle() does this. However, there is a limit of 262144 pixels that can be transferred. The interactive data transfer limit is in place to protect your system from hanging (it is easy to request terabytes of data without realizing it).
So in the case of a large area, your options are to export images to Google Drive or Google Cloud Storage and then read them into your Python environment. Using Google Colab makes this easy: EE is installed by default and there is integration with GDrive and GCS. The Earth Engine batch task export methods are better equipped for dealing with large data (they break up large exports into manageable-sized GeoTIFFs).
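As a rough sketch of that export-then-read route: a single Export.image.toDrive task writes all bands of the image, so there is no need to export band by band. The helper names, the description string, and the 30 m scale below are illustrative assumptions, not part of the question; ee and rasterio are imported inside the functions that need them, and ee must already be authenticated and initialized before the export is started.

```python
import numpy as np


def start_drive_export(image, region, description="classification_stack", scale=30):
    """Queue a batch export of a (multiband) ee.Image to Google Drive."""
    import ee  # requires ee.Authenticate() / ee.Initialize() beforehand

    task = ee.batch.Export.image.toDrive(
        image=image,
        description=description,  # hypothetical task and file name
        region=region,
        scale=scale,              # assumed pixel size in metres
        maxPixels=1e13,           # raise the default cap for large areas
    )
    task.start()
    return task


def read_geotiff(path):
    """Read all bands of a GeoTIFF as an array shaped (bands, rows, cols)."""
    import rasterio

    with rasterio.open(path) as src:
        return src.read()


def to_feature_matrix(stack):
    """Reshape (bands, rows, cols) to (rows * cols, bands).

    Each pixel becomes one sample and each band one feature, which is
    the 2-D layout sklearn estimators expect for X.
    """
    bands, rows, cols = stack.shape
    return stack.reshape(bands, rows * cols).T
```

For one exported tile, X = to_feature_matrix(read_geotiff(path)) can be passed to a fitted classifier's predict method, and the predictions reshaped back to (rows, cols) to recover the map.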
Even though ee.Image.sampleRectangle() may not be useful for your application, here is a demo in case it helps others. The following Python script transfers three Landsat 8 bands for a rectangular region to the Python client, converts the EE arrays to NumPy arrays, stacks them, and displays the resulting 3-D array as an RGB image of the region.
IPython Notebook
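A minimal sketch of the workflow described above, assuming an ee.Image and an ee.Geometry have already been defined after ee.Initialize(). The helper names, the band names passed in, and the stretch bounds lo and hi are hypothetical choices for illustration. sampleRectangle() returns a feature whose properties hold one 2-D array per band, and the region must stay under the pixel limit mentioned earlier.

```python
import numpy as np


def sample_bands(image, region, bands):
    """Fetch each band of `image` over `region` as a nested list.

    `image` is an ee.Image and `region` an ee.Geometry. Each getInfo()
    call transfers one band's 2-D array to the Python client.
    """
    feature = image.select(bands).sampleRectangle(region=region)
    return {band: feature.get(band).getInfo() for band in bands}


def stack_bands(band_lists, order):
    """Stack 2-D band arrays into one array shaped (rows, cols, bands)."""
    arrays = [np.array(band_lists[band]) for band in order]
    return np.dstack(arrays)


def to_rgb_uint8(stack, lo, hi):
    """Linearly stretch values from [lo, hi] to 0..255 for display."""
    scaled = (stack.astype("float64") - lo) / (hi - lo)
    return (255 * np.clip(scaled, 0, 1)).astype("uint8")
```

With three band names chosen for red, green, and blue, the result of to_rgb_uint8(stack_bands(sample_bands(img, aoi, bands), bands), lo, hi) can be passed to matplotlib's plt.imshow() to display the region.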