I'm trying to download Sentinel-2 data from Google Cloud Storage and have basically adapted the FeLS module (great work, by the way!). As is done in the module, I also download the index.csv first (latest version here, but be careful… it's huge!) to search for scenes that match my requirements. But after several successful tests, I noticed that there are scenes available within the bucket that are not listed in the csv.
I don't know why and have already contacted Google about this, so now I'm looking for another solution: is there a way to establish a connection to the tile I'm looking for and then list all available subfolders? For example, I would like to do something like this:
import urllib2
from bs4 import BeautifulSoup

url = 'https://console.cloud.google.com/storage/browser/gcp-public-data-sentinel-2/tiles/39/R/YH/'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # the page content has to be passed to BeautifulSoup

# the following has been working for another link (https://landsatonaws.com/L8/001/003/), ...
# ... this is just to make clear what I want to do:
table = soup('table')[0].find_all('td')
scenes = [table[i].string for i in range(0, len(table), 3)]
For the given Google Cloud Storage URL, I never get anything back except an HTTPError: HTTP Error 404: Not Found.
If I could avoid searching the index.csv altogether, I would really get all scenes that are available. Is this possible somehow?
Best Answer
I was able to achieve this using the google-cloud-bigquery module. You need a Google Cloud BigQuery key file for this, which you can create by following these instructions. You also need a project whose project ID you know; then you can do something like this:
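A minimal sketch of such a query; the public index table bigquery-public-data.cloud_storage_geo_index.sentinel_2_index and the key-file name are assumptions here, so adapt them to your own setup:

from google.cloud import bigquery

# NOTE: the index table and the key-file name below are assumptions,
# not confirmed details -- adjust them to your own setup.
client = bigquery.Client.from_service_account_json('my-bigquery-key.json')

# query all scenes of tile 39/R/YH (MGRS tile '39RYH') from the public index
query = """
    SELECT product_id, sensing_time, cloud_cover, base_url
    FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
    WHERE mgrs_tile = '39RYH'
    ORDER BY sensing_time
"""

for row in client.query(query).result():
    # base_url is the gs:// path of the .SAFE folder inside the public bucket
    print('%s %s' % (row.product_id, row.base_url))

Unlike the index.csv, this queries the live index table, so every scene currently in the bucket should come back.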
Afterwards, you can download the manifest.safe to get the typical SAFE structure and fetch all the necessary files, for example like this:
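A minimal sketch of that step, assuming one of the gs:// base_urls returned by the query above (the product name below is only a placeholder) and that the public bucket is also reachable over plain HTTPS:

import os
import urllib2
import xml.etree.ElementTree as ET

# NOTE: base_url is one of the gs:// URLs returned by the BigQuery query
# above; the concrete product name here is only a placeholder.
base_url = ('gs://gcp-public-data-sentinel-2/tiles/39/R/YH/'
            'S2A_MSIL1C_20170805T071621_N0205_R063_T39RYH_20170805T072730.SAFE')

# the public bucket is also served over plain HTTPS
http_base = base_url.replace('gs://', 'https://storage.googleapis.com/')

# download and parse manifest.safe, which lists every file of the SAFE product
manifest = urllib2.urlopen(http_base + '/manifest.safe').read()
root = ET.fromstring(manifest)

out_dir = os.path.basename(base_url)  # recreate the .SAFE folder locally
for location in root.iter('fileLocation'):
    rel_path = location.get('href').lstrip('./')  # e.g. ./GRANULE/...
    target = os.path.join(out_dir, rel_path)
    if not os.path.isdir(os.path.dirname(target)):
        os.makedirs(os.path.dirname(target))
    data = urllib2.urlopen(http_base + '/' + rel_path).read()
    with open(target, 'wb') as f:
        f.write(data)

The fileLocation entries in manifest.safe hold the relative paths of all files belonging to the product, so walking them rebuilds the complete SAFE structure on disk.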