Python – Open Sentinel 2 Imagery from AWS Servers Using Rasterio Module

gdalpythonrasterrasteriosentinel-2

I am trying to access this file from S3 using Python's rasterio:
"s3://sentinel-cogs/sentinel-s2-l2a-cogs/36/N/YF/2023/6/S2B_36NYF_20230605_0_L2A/"
I will be using these two bands : B08.tif(NIR band) and B04.tif(Red Band).
This is the Python code I have tried so far:

import rasterio
import geopandas as gpd
from shapely.geometry import mapping
from rasterio.mask import mask
from rasterio.transform import from_origin

# Load Sentinel-2 bands
nir_path = 's3://sentinel-cogs/sentinel-s2-l2a-cogs/36/N/YF/2023/6/S2B_36NYF_20230605_0_L2A/B08.tif'
red_path = 's3://sentinel-cogs/sentinel-s2-l2a-cogs/36/N/YF/2023/6/S2B_36NYF_20230605_0_L2A/B04.tif'

with rasterio.open(nir_path) as nir_src, rasterio.open(red_path) as red_src:
    nir = nir_src.read(1).astype('float32')
    red = red_src.read(1).astype('float32')

# Calculate NDVI
ndvi = (nir - red) / (nir + red)

# Load and transform the polygon
polygon_path = 'path_to_sample_polygon.geojson'
polygon_gdf = gpd.read_file(polygon_path)
polygon_gdf = polygon_gdf.to_crs("EPSG:32636")

# Convert polygon to a GeoJSON-like format
polygon_geom = polygon_gdf.geometry.iloc[0]
geometry = [mapping(polygon_geom)]

# Mask NDVI with the polygon
masked_ndvi, out_transform = mask(dataset=ndvi, shapes=geometry, crop=True, transform=nir_src.transform)

# Saving the masked NDVI
masked_ndvi_path = 'masked_ndvi.tif'
meta = nir_src.meta
meta.update({"driver": "GTiff", "height": masked_ndvi.shape[1], "width": masked_ndvi.shape[2], "transform": out_transform})
with rasterio.open(masked_ndvi_path, 'w', **meta) as dst:
    dst.write(masked_ndvi)

print("NDVI calculation and masking complete.")

Its taking lot of time to read the files. What can I do access a file which is on AWS server so as to do my task. Also the mean-nvdi value is coming above 2500+ which is too much

My task is to do the following:

  1. Open Sentinel 2 imagery from AWS servers using python rasterio
    module.
  2. Read Polygon in python using geopandas/shapely module
    whichever you find suitable. (I have the polygon.geoson file already with me.)
  3. Read only subset of the imagery array
    which aligns to the polygon region.
  4. Remember Projection system of
    both polygon and imagery are different. To perform any action
    between the 2 objects(polygon and imagery) they need to be in the
    same projection system.
  5. After subset of the Imagery has been
    extracted perform the NDVI calculation.

Best Answer

You are reading and thus downloading the entire image for both of those bands (B08.tif and B04.tif).

It is much quicker to only read the subset of those images that overlap with the polygon instead of clipping it out later on after downloading the lot. This is one of the primary use cases for the Cloud-Optimized Geotiff (COG).

The below example uses rasterio windowed reading and took 6.5 seconds total for all 3 polygons in my test dataset.

import rasterio
import geopandas as gpd
from rasterio.windows import transform
from rasterio.features import geometry_window, geometry_mask

# Sentinel-2 bands
nir_path = 's3://sentinel-cogs/sentinel-s2-l2a-cogs/36/N/YF/2023/6/S2B_36NYF_20230605_0_L2A/B08.tif'
red_path = 's3://sentinel-cogs/sentinel-s2-l2a-cogs/36/N/YF/2023/6/S2B_36NYF_20230605_0_L2A/B04.tif'

# Polygon path (the file contains 3 polygons with a unique ID field)
polygon_path = 'path_to_sample_polygon.geojson'
polygon_id_field = "ID"

# Load and transform the polygon
polygon_gdf = gpd.read_file(polygon_path)
polygon_gdf = polygon_gdf.to_crs("EPSG:32636")

# Open the AWS COG datasets
with rasterio.Env(AWS_NO_SIGN_REQUEST="YES"):
    with rasterio.open(nir_path) as nir_src, rasterio.open(red_path) as red_src:
        
        # loop through each feature in the GDF
        for feature in polygon_gdf.iterfeatures(show_bbox=True):

            # Get a window
            window = geometry_window(nir_src, [feature["geometry"]])
            window_transform = transform(window, nir_src.transform)
            window_shape = (window.height, window.width)

            # Read all the data in the window, masking out any NoData
            nir = nir_src.read(window=window, masked=True).astype('float32')
            red = red_src.read(window=window, masked=True).astype('float32')

            # Update the NoData mask to exclude anything outside the polygon
            mask = geometry_mask([feature["geometry"]], window_shape, window_transform)
            nir.mask += mask
            red.mask += mask

            # Calculate NDVI
            ndvi = (nir - red) / (nir + red)

            # Save the masked NDVI to one tif per polygon
            ndvi_path = f'ndvi_{feature["properties"][polygon_id_field]}.tif'

            meta = nir_src.meta
            # Don't forget to update the dtype! Your original code didn't, so the Int16 output truncated NDVI values to all 0's
            meta.update({"driver": "GTiff", "dtype": ndvi.dtype, "height": window.height, "width":window.width, "transform": window_transform})
            with rasterio.open(ndvi_path, 'w', **meta) as dst:
                dst.write(ndvi)

enter image description here

enter image description here