[GIS] Read files with Python GDAL using VSIGS

gdal, google-cloud, python

This question is somewhat similar to this one: How to efficiently access files with GDAL from an S3 bucket using VSIS3?, except that I am trying to access bucket files from Google Cloud Storage, with Python. I am using GDAL 2.3.1, so I should be able to use the /vsigs/ virtual file system.

According to the example I came across, it looks like this simple piece of code should work:

from osgeo import gdal

ds = gdal.Open('/vsigs/my_bucket/image.tif') # doesn't work
ds = gdal.Open('gs://my_bucket/image.tif') # doesn't work either

But I keep getting a "file not found" error, so it looks like GDAL does not understand that I'm trying to open a GCS file. What am I missing?

Notes:

  • gcloud is properly installed and configured on my computer (the
    command gsutil ls gs://earthengine-public/ works properly).
  • I'm working inside a Python virtual environment, so that might be the
    issue there.

Best Answer

Rasterio added support for GCS URLs (gs://...) in version 1.0.15. If you're working with a recent version, you can now use the following:

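import os

import rasterio

# These can also be set as normal environment variables outside of Python.
os.environ['GS_SECRET_ACCESS_KEY'] = ''  # fill in your secret
os.environ['GS_ACCESS_KEY_ID'] = ''      # fill in your access key ID

# The URL should look like gs://...
url = 'gs://my_bucket/image.tif'  # placeholder path, taken from the question
with rasterio.open(url) as src:
    print(src.width, src.height)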

This is convenient, as you no longer need to wrap rasterio calls in a rasterio.Env context.
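For contrast, here is a rough sketch of the Env-wrapped pattern that the environment variables replace. The helper names gcs_env_options and read_size_with_env are made up for illustration; the only rasterio calls used are rasterio.Env, which accepts GDAL configuration options as keyword arguments, and rasterio.open. Treat it as one possible arrangement, not the required one.

```python
import os


def gcs_env_options(environ=None):
    """Collect the GCS key settings that are actually set (hypothetical helper).

    Empty or missing values are skipped so they are not passed on to GDAL.
    """
    if environ is None:
        environ = os.environ
    keys = ("GS_ACCESS_KEY_ID", "GS_SECRET_ACCESS_KEY")
    return {key: environ[key] for key in keys if environ.get(key)}


def read_size_with_env(url, options):
    """Open a gs:// URL inside an explicit rasterio.Env (hypothetical helper)."""
    # Imported lazily so gcs_env_options can be used without rasterio installed.
    import rasterio

    with rasterio.Env(**options):
        with rasterio.open(url) as src:
            return src.width, src.height
```

Calling read_size_with_env('gs://my_bucket/image.tif', gcs_env_options()) should behave like the shorter snippet above, only with the credentials scoped to the Env block instead of the whole process.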

You can also authenticate through other GCS methods (e.g. set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a service account credentials JSON file). However, these methods currently don't work unless you've installed rasterio from a source distribution (pip install rasterio --no-binary rasterio), due to GDAL version incompatibilities.
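If you would rather stay with plain GDAL, as in the question, the same credentials can probably be handed to GDAL directly. The sketch below is untested against a real bucket and rests on two assumptions: that your GDAL build includes /vsigs/ support, and that, as far as I know, GDAL itself only understands the /vsigs/ prefix and not gs:// (rasterio does that translation for you). The function names gs_url_to_vsigs and open_with_gdal are hypothetical; gdal.SetConfigOption and gdal.Open are the real GDAL calls.

```python
def gs_url_to_vsigs(url):
    """Translate 'gs://bucket/key' into GDAL's '/vsigs/bucket/key' form."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    return "/vsigs/" + url[len(prefix):]


def open_with_gdal(url, access_key_id, secret_access_key):
    """Open a GCS raster with plain GDAL (hypothetical helper, not verified)."""
    # Imported lazily so the path helper works without GDAL installed.
    from osgeo import gdal

    # Same two settings as in the rasterio example, set as GDAL config options.
    gdal.SetConfigOption("GS_ACCESS_KEY_ID", access_key_id)
    gdal.SetConfigOption("GS_SECRET_ACCESS_KEY", secret_access_key)

    path = gs_url_to_vsigs(url)
    ds = gdal.Open(path)
    if ds is None:
        # Without gdal.UseExceptions(), a failed open returns None.
        raise RuntimeError(f"GDAL could not open {path}")
    return ds
```

For example, gs_url_to_vsigs('gs://my_bucket/image.tif') gives '/vsigs/my_bucket/image.tif', which is the form the first gdal.Open call in the question used. If that path still reports "file not found", missing credentials are a likely cause, since being logged in with gcloud does not by itself configure GDAL.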