[GIS] Read Sentinel-2 zip-file into memory using gdal in python

gdalpythonremote sensingsentinel-2

I was wondering if there's a way to read a Sentinel-2 zipfile directly into memory through python without extracting the whole thing or selected contents.

I figured since GDAL does have a driver for the S2 format, it should be somehow possible to open the zipfile in python and then read the bands into arrays separately.

Another approach I was thinking about is the zipfile module in python. I came as far as accessing the single files in the archive, but I haven't managed yet to read them into memory or pass them over to GDAL.

Edit:

Sorry, I forgot to mention I'm generally looking for a platform-independent solution

Edit2:

Also, I intend to stay away from ESA's snappy


Solution:

Alright, after the helpful tips I got here I finally implemented the following solution. I ended up installing gdal 2.1.3 from conda-forge inside a new environment to bypass any dependency issues. There's still an issue with the UTF encoding using vsizip which can be fixed setting the corresponding environment variable as discribed in this issue on github.

#!/usr/bin/env python

import gdal
from zipfile import ZipFile
import re
import os

## vsizip bugfix
os.environ['CPL_ZIP_ENCODING'] = 'UTF-8'

#choose tile 
tile = 'T36PXV'

tileREXP = re.compile(r'.*(%s).*B(02|03|04|05|06|07|08|8A|11|12).jp2$' %tile)

# sort bands 2 - 12 with 8A after 8
bndIX = [1,2,3,4,5,6,7,9,10,8]

with open('S2_FILELIST.txt') as flist:
    for filename in flist:
        with ZipFile(filename,'r') as zfile:
            bands = [x for x in zfile.namelist() if re.match(tileREXP,x)]
            bands.sort()
            bands =  [x for (y,x) in zip(bndIX,bands)]
            for b in bands:
                ds = gdal.Open('/vsizip/%s/%s' %(filename,b))
                arr = ds.ReadAsArray()

                # DO STUFF

                ds = None

Best Answer

GDAL >= 2.1 is able to open the Sentinel-2 .zip file directly. For instance (from the SENTINEL2 driver documentation page):

$ gdalinfo S2A_OPER_PRD_MSIL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.zip

Then you could convert subdataset into a VRT file (without extracting any .jp2 file). This is an useful example from the driver documentation:

$ python -c "import sys; from osgeo import gdal; ds = gdal.Open(sys.argv[1]); open(sys.argv[2], 'wb').write(ds.GetMetadata('xml:VRT')[0].encode('utf-8'))" \
SENTINEL2_L1C:S2A_OPER_MTD_SAFL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.xml:10m:EPSG_32632 10m.vrt

Note: The .xml metadata file can be read directly from the .zip file thanks to the vsizip virtual file system: /vsizip/S2XXXXX.zip/metadata.xml

Related Question