Xarray S3 Bucket Error – Error Trying to Open NetCDF File with Xarray from S3 Bucket

amazon-web-services netcdf python xarray

I'm trying to open a .nc file from an S3 bucket using xarray, but I'm getting an error. Here's the method I'm using:

import xarray as xr

aws_url = 's3://nasanex/NEX-GDDP/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2100.nc'
ds = xr.open_dataset(aws_url, engine='netcdf4')

The error being thrown is:

OSError: [Errno -128] NetCDF: Attempt to use feature that was not turned on when netCDF was built.: b's3://nasanex/NEX-GDDP/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2100.nc'

I have netCDF 4.8.1 on my Mac. Additionally, the xarray docs say that data should be accessible through an S3 URL of this type.

Best Answer

The problem is that your netCDF engine cannot read this file. As the error message says, opening this URL needs a feature that was not turned on when your netCDF library was built.

To solve that, install the h5netcdf engine using pip or conda.

Also, files on S3 generally cannot be read directly unless they are in Zarr format. Instead, use the s3fs library to open the file, then read it with xarray using h5netcdf as the engine.

!pip install s3fs h5netcdf --quiet

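import s3fs
import xarray as xr

# anon=True connects without AWS credentials
fs = s3fs.S3FileSystem(anon=True)
aws_url = 's3://nasanex/NEX-GDDP/BCSD/rcp85/day/atmos/tasmax/r1i1p1/v1.0/tasmax_day_BCSD_rcp85_r1i1p1_inmcm4_2100.nc'

# Open the object as a file-like handle and pass it to xarray
with fs.open(aws_url) as fileObj:
    ds = xr.open_dataset(fileObj, engine='h5netcdf')
    # xarray reads data lazily: work with ds inside this block,
    # or call ds.load() here, before the file is closed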
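Because xarray reads data lazily, a dataset opened this way may no longer be readable once the with block has closed the file. One way around that, sketched here as an assumption rather than the only approach, is to load the data into memory before the file closes. The helper name load_netcdf is hypothetical, and .load() reads the whole file, so this only suits files that fit in memory.

```python
def load_netcdf(fs, url, engine="h5netcdf"):
    """Open `url` through the filesystem object `fs` and read it into memory.

    Hypothetical helper, not part of xarray or s3fs. Example:

        import s3fs
        fs = s3fs.S3FileSystem(anon=True)
        ds = load_netcdf(fs, "s3://bucket/path/file.nc")
    """
    # Imported here so the helper can be defined before xarray is installed
    import xarray as xr

    with fs.open(url, "rb") as file_obj:
        # .load() pulls all variables into memory while the file is still open
        return xr.open_dataset(file_obj, engine=engine).load()
```

The returned dataset no longer depends on the open file handle, so it can be used after the function returns.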

Note: Restart the runtime after installing those packages.
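After restarting, it can help to confirm that the packages are actually visible to the interpreter before retrying. The following is a minimal sketch using only the standard library; missing_packages is a hypothetical helper name, and the package list is simply the one used in this answer.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages used in this answer
missing = missing_packages(["xarray", "s3fs", "h5netcdf"])
if missing:
    print("Install first:", " ".join(missing))
else:
    print("All required packages are importable")
```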
