Get lan&lon and values from GeoTIFF using Python

coordinatesgeotiff-tiffland-coverpython

I have GeoTIFF images from https://esa-worldcover.org/en

My goal is to get a list of latitude / longitude / value from that image. The value is a numeric attribute which represents a class as explained in https://worldcover2020.esa.int/data/docs/WorldCover_PUM_V1.1.pdf

I could get the value from each pixel by doing:

import rasterio
img=r'ESA_WorldCover_10m_2020_v100_N39E000_Map.tif'
ds = rasterio.open(img)
data=ds.read()

However, when I'm trying to link those values from lat and lon (after transforming x and y values to lat and lon), the shape doesn't fit.

The code I've used for transforming x and y into coordinates is:

import rasterio
import rasterio.features
import rasterio.warp

with rasterio.open('ESA_WorldCover_10m_2020_v100_N39E000_Map.tif') as dataset:

    # Read the dataset's valid data mask as a ndarray.
    mask = dataset.dataset_mask()
    
    # Extract feature shapes and values from the array.
    for geom, val in rasterio.features.shapes(
            mask, transform=dataset.transform):
    
        # Transform shapes from the dataset's own coordinate
        # reference system to CRS84 (EPSG:4326).
        geom = rasterio.warp.transform_geom(
            dataset.crs, 'EPSG:4326', geom, precision=8) 
    

    

I only get 13213 values. How is it possible? The map can be downloaded in here

Best Answer

This should do the job...

(There's no need to obtain all coordinates of the raster first (which would be more than 1 billion in this case...)

import rasterio
dat = rasterio.open(r"ESA_WorldCover_10m_2020_v100_N45W120_Map.tif")
# read all the data from the first band
z = dat.read()[0]

# check the crs of the data
dat.crs
# >>> CRS.from_epsg(4326)

# check the bounding-box of the data
dat.bounds
# >>> Out[49]: BoundingBox(left=-120.0, bottom=45.0, right=-117.0, top=48.0)

# since the raster is in regular lon/lat grid (4326) we can use 
# `dat.index()` to identify the index of a given lon/lat pair
# (e.g. it expects coordinates in the native crs of the data)

def getval(lon, lat):
    idx = dat.index(lon, lat, precision=1E-6)    
    return dat.xy(*idx), z[idx]

getval(-118, 46)
# >>> ((-117.99995833333334, 46.00004166666667), 10)

... and so we found that (-118, 46) seems to be covered by Trees :-)

enter image description here