[GIS] GeoTIFF (ESA Copernicus data) georeferencing coordinates are mirrored and projection slightly off when checking on Google Maps

coordinate systemcopernicusgdalgdal-translatenetcdf

I worked on downloading a Sentinel 5P data in my question at Converting a netCDF4 file to (georeferenced) GeoTIFF, problems with georeferencing and also transforming the netCDF4 (.nc) file into a geoTIFF image/raster data set.

As noted in the answer to my question, it appears the georeferencing method I use doesn't work correctly.

  1. Two of the corner points are flipped and for a reason or another I don't seem to get them correctly.
  2. I use Google Maps to check I have transformed the geoTIFF file correctly. In addition. The corner points are also slightly off of what Copernicus map shows. I assume this is a difference in the way they project the maps, but as the ultimate goal is to get the tiff on a common web map, I think I need to get this part right too.

Question: Can someone shed light on what's wrong with my projection parameters? The parameters are given in the following, first a bit of setup to make it easier to undersand what I was thinking.

I replicate the previous question here as appropriate for this one. In case it helps, the Copernicus portal is at https://s5phub.copernicus.eu/dhus/#/home. The guest username and password are given in a prompt when one presses login. It has a lot more tooling to fiddle with ideas, shapefiles amongst other things.

Let's assume a netCDF4 (.nc) file downloaded from https://s5phub.copernicus.eu/dhus/odata/v1/Products('e6e91b26-ca43-43d4-9c08-c9c14dd6737e')/$value . This is sulphur dioxide measurement dataset over Vietnam, China, Taiwan and Philippines, given in the linked image. Regarding the portal, searching with S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719 yields the same data/file as downloading directly.

Sulphur dioxide data over Vietnam, China and Filippines

From the downloaded .nc file I can see in the metadata

geolocation_grid_from_band=3
geospatial_lat_max=25.532017
geospatial_lat_min=2.1418159
geospatial_lon_max=99.929924
geospatial_lon_min=128.72141

so I proceed the create a geoTIFF raster data file as follows: gdal_translate -ot float32 -unscale -CO COMPRESS=deflate -of GTiff -a_srs EPSG:4326 -a_ullr 25.532017 99.929924 2.1418159 128.72141 HDF5:"S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719.nc"://PRODUCT/sulfurdioxide_total_vertical_column so2.tif

Applying these produces a geotiff file with the following georeferecing information

Corner Coordinates:
Upper Left  (      25.532,      99.930) ( 25d31'55.26"E, 99d55'47.73"N)
Lower Left  (      25.532,     128.721) ( 25d31'55.26"E,128d43'17.08"N)
Upper Right (       2.142,      99.930) (  2d 8'30.54"E, 99d55'47.73"N)
Lower Right (       2.142,     128.721) (  2d 8'30.54"E,128d43'17.08"N)
Center      (      13.837,     114.326) ( 13d50'12.90"E,114d19'32.40"N)

Where the Upper Left and Lower Right are approximately correct but otherwise Lower Left and Upper Right have switched places. Also comparing the ESA satellite image (linked) and then inserting the corner coordinates to Google Maps yield slightly different places on the map. I tried to project with EPSG:3857 already and I tried placing the corder coordinates in different order, no dice (as an aside, it'd be nice to know if this can be done without dumping metadata about the corner points first and applying the information specifically).

Here are the Google Maps links for the corner points for convenience:

<edit: With the proposed parameters -a_ullr 99.930 25.532017 128.72141 2.142 gdalinfo so2.tif gives

Corner Coordinates:
Upper Left  (  99.9300000,  25.5320170)
Lower Left  (  99.9300000,   2.1420000)
Upper Right ( 128.7214100,  25.5320170)
Lower Right ( 128.7214100,   2.1420000)
Center      ( 114.3257050,  13.8370085)

which doesn't seem be right. Looking at the coordinates, it gives a feeling the corners are on straight lines, like a square on a flat map. I'll be spelunking something with "spherical Mercator" soon, I suppose. Basically waving hands to every direction and seeing if something sticks. 😛

<edit 5: Maybe it's something about snapping values to pixel colors, i.e. a drawing artifact. Here's another way of seeing the image after removing sentinel values:

SO2 image, without the black band

<edit 4:

On QGIS looking at metadata about sentinel values and setting their opacity to 100% crops of the edges and reveals a black image that looks like the original. Then adjusting the histogram so that the maximum value is the highest measurement point gives the grey scale image in the picture. The black band on the left side seems to be all minimum values. I don't know the cause of this. It could be a projection artefact but it could also be invalid QA values that should be cropped off. Here's the image:

sentinel value opacity set to 100% and histogram adjusted

<edit 3:

I proceed with the instructions hinted by @AndreJ and run the following commands

gdal_translate -of VRT HDF5:"S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719.nc"://PRODUCT/longitude lon.vrt
gdal_translate -of VRT HDF5:"S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719.nc"://PRODUCT/latitude lat.vrt
gdal_translate -of VRT HDF5:"S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719.nc"://PRODUCT/sulfurdioxide_total_vertical_column so2.vrt

And modify the resulting so2.vrt to read as follows

<VRTDataset rasterXSize="450" rasterYSize="278">
   <Metadata>
   <!-- Snipped off for brevity. -->
   </Metadata>
   <!-- Added this geolocation domain information. -->
   <Metadata domain="GEOLOCATION">
       <mdi key="X_DATASET">lon.vrt</mdi>  
       <mdi key="X_BAND">1</mdi>  
       <mdi key="Y_DATASET">lat.vrt</mdi>  
       <mdi key="Y_BAND">1</mdi>  
       <mdi key="PIXEL_OFFSET">0</mdi>  
       <mdi key="LINE_OFFSET">0</mdi>  
       <mdi key="PIXEL_STEP">1</mdi>  
       <mdi key="LINE_STEP">1</mdi>
   </Metadata>    
   <VRTRasterBand dataType="Float32" band="1">
   <Metadata>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_coordinates">/PRODUCT/longitude /PRODUCT/latitude</MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_long_name">total vertical column of sulfur dioxide for the polluted scenario derived from the total slant column</MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_multiplication_factor_to_convert_to_DU">2241.1499 </MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_multiplication_factor_to_convert_to_molecules_percm2">6.02214e+19 </MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_standard_name">atmosphere_mole_content_of_sulfur_dioxide</MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column_units">mol m-2</MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column__FillValue">9.96921e+36 </MDI>
       <MDI key="PRODUCT_sulfurdioxide_total_vertical_column__Netcdf4Dimid">2 </MDI>
     </Metadata>    
     <SimpleSource> 
         <SourceFilename relativeToVRT="1">HDF5:S5P_NRTI_L2__SO2____20181030T054006_20181030T054506_05418_01_010102_20181030T062719.nc://PRODUCT/sulfurdioxide_total_vertical_column</SourceFilename>
         <SourceBand>1</SourceBand>
         <SourceProperties RasterXSize="450" RasterYSize="278" DataType="Float32" BlockXSize="450" BlockYSize="278" />
         <SrcRect xOff="0" yOff="0" xSize="450" ySize="278" />
         <DstRect xOff="0" yOff="0" xSize="450" ySize="278" />
      </SimpleSource>
 </VRTRasterBand>
 </VRTDataset>

Then running gdalwarp -geoloc -t_srs EPSG:4326 -overwrite so2.vrt so2.tif and finally gdalinfo so2.tif yields

Corner Coordinates:
Upper Left  ( 100.1267700,  25.0621347) (100d 7'36.37"E, 25d 3'43.69"N)
Lower Left  ( 100.1267700,   2.5234540) (100d 7'36.37"E,  2d31'24.43"N)
Upper Right ( 128.5601826,  25.0621347) (128d33'36.66"E, 25d 3'43.69"N)
Lower Right ( 128.5601826,   2.5234540) (128d33'36.66"E,  2d31'24.43"N)
Center      ( 114.3434763,  13.7927944) (114d20'36.51"E, 13d47'34.06"N)
Band 1 Block=493x4 Type=Float32, ColorInterp=Gray

which isn't correct. Interestingly the coordinates are "inverted" in that changing 100.1267700, 25.0621347 -> 25.0621347, 100.1267700 is approximately correct, albeit slightly offset when compared to the Copernicus image (looking the coordinates via Google Maps now). It doesn't appear the reason is solely about longitude and latitude information changed but also the projection is wrong.

Opened in QGIS and projected on Google Maps gives SO2 geotiff file which looks like being in the right place. The data is unscaled so nearly black (I assume), the white strand is an interesting looking thing (looks like being the edge of the original satellite picture, i.e. no data, but why black on the other side then?).

<edit 2: There's interesting discussion at https://github.com/stcorp/harp/issues/189 about possible issues. Even if the discussed issue isn't directly this, one should note it's about Sentinel 5P, georeferencing and invalid values in certain situations. For that I wonder if that's the case here and how to find those values and filter out. As a tangential educative issue, pay attention to vocabulary about level 1/2/3 products too, see more at https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-5p/products-algorithms.

<edit 1: In the metadata I see (I don't know if this matters):

PRODUCT_corner_CLASS=DIMENSION_SCALE
PRODUCT_corner_comment=This coordinate variable defines the indices for the pixel corners; index starts a 0 (counter-clockwise, starting from south-western corner of the pixel in ascending part of the orbit).
PRODUCT_corner_long_name=pixel corner index
PRODUCT_corner_NAME=corner
PRODUCT_corner_REFERENCE_LIST=
PRODUCT_corner_units=1
PRODUCT_corner__FillValue=-2147483647
PRODUCT_corner__Netcdf4Dimid=3
PRODUCT_delta_time_long_name=offset from reference start time of measurement
PRODUCT_delta_time_units=milliseconds
PRODUCT_delta_time__FillValue=-2147483647
PRODUCT_delta_time__Netcdf4Dimid=2
PRODUCT_ground_pixel_axis=X
PRODUCT_ground_pixel_CLASS=DIMENSION_SCALE
PRODUCT_ground_pixel_comment=This coordinate variable defines the indices across track, from west to east; index starts at 0
PRODUCT_ground_pixel_long_name=across-track dimension index
PRODUCT_ground_pixel_NAME=ground_pixel
PRODUCT_ground_pixel_REFERENCE_LIST=
PRODUCT_ground_pixel_units=1
PRODUCT_ground_pixel__FillValue=-2147483647
PRODUCT_ground_pixel__Netcdf4Dimid=1
PRODUCT_latitude_bounds=/PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
PRODUCT_latitude_long_name=pixel center latitude
PRODUCT_latitude_standard_name=latitude
PRODUCT_latitude_units=degrees_north
PRODUCT_latitude_valid_max=90
PRODUCT_latitude_valid_min=-90
PRODUCT_latitude__FillValue=9.96921e+36
PRODUCT_latitude__Netcdf4Dimid=2
PRODUCT_layer_CLASS=DIMENSION_SCALE
PRODUCT_layer_long_name=layer dimension index
PRODUCT_layer_NAME=layer
PRODUCT_layer_REFERENCE_LIST=
PRODUCT_layer_units=1
PRODUCT_layer__Netcdf4Dimid=4
PRODUCT_longitude_bounds=/PRODUCT/SUPPORT_DATA/GEOLOCATIONS/longitude_bounds
PRODUCT_longitude_long_name=pixel center longitude
PRODUCT_longitude_standard_name=longitude
PRODUCT_longitude_units=degrees_east
PRODUCT_longitude_valid_max=180
PRODUCT_longitude_valid_min=-180
PRODUCT_longitude__FillValue=9.96921e+36
PRODUCT_longitude__Netcdf4Dimid=2

Best Answer

In case this is useful to anyone, I created a Python script that replicates the accepted answer and that can be used on a number of S5p files for a list of variables, without having to manually edit the vrt file. The code is available here, but I also copy-paste below. EDIT: I updated the code to automatically mask invalid pixels based on the qa_value subdataset stored in the NETCDF file.

import os
import subprocess
from osgeo import gdal
import numpy as np


def write_s5p_tif(in_filepath, variables, output_folder, EPSG_code="4326", spatial_res=[]):
    """
    Write specified variables of S5p NETCDF file to a georeferenced GTiff file 
    in Pseudo-Mercator with spatial resolution 3500m x 7000m.

    Parameters
    ----------
    in_filepath: str
        full filepath to S5p NETCDF file (.nc)
    variables: list of str
        list of variables in the S5p NETCDF file that will be written to GTiff files
    EPSG_code: str
        EPSG code of desired projection, default is 3857 (Pseudo-Mercator)
    spatial_res: list of float
        X and Y spatial resolution (sampling) in degrees or meters according to the chosen projection
        Defaults to 3500m x 7000m (or degree equivalent scaled by mean latitude)

    """

    kwargs = {}
    kwargs['EPSG_code'] = EPSG_code
    kwargs['spatial_res'] = spatial_res

    # Create vrt files for latitude and longitude variables
    geo_params = {}
    geo_params['outputSRS'] = f"EPSG:{EPSG_code}"
    gdal.Translate("lat.vrt", f'HDF5:"{in_filepath}"://PRODUCT/latitude', **geo_params)
    lat_ds = gdal.Open("lat.vrt")
    lats = lat_ds.ReadAsArray()
    gdal.Translate("lon.vrt", gdal.Open(f'HDF5:"{in_filepath}"://PRODUCT/longitude'), **geo_params)

    ds = gdal.Open(in_filepath)
    md = ds.GetMetadata()

    data_var = "temp_s5p.tif"
    mask_var = "qa_value.tif"
    output_file = generate_out_filepath(in_filepath, output_folder, ".tif")

    for variable in variables:
        # Georeference variable datset
        write_var_to_tif(data_var, in_filepath, variable, "lon.vrt", "lat.vrt", lat_ds,
                      **kwargs)
        # Georeference quality_value datset
        write_var_to_tif(mask_var, in_filepath, "qa_value", "lon.vrt", "lat.vrt", lat_ds,
                      **kwargs)
        # Apply quality mask to variable dataset
        write_masked_data(output_file, data_var, mask_var, mask_threshold=75)

        # Clean
        os.remove(data_var)
        os.remove(data_var.split('.')[0] + '_.vrt')
        os.remove(mask_var)
        os.remove(mask_var.split('.')[0] + '_.vrt')

    # Remove vrt files
    os.remove('lon.vrt')
    os.remove('lat.vrt')


def write_var_to_tif(out_filepath, in_filepath, variable, lon_file, lat_file, ds, **kwargs):
    """

    """

    vrt_filepath = out_filepath.split('.')[0] + '_.vrt'

    with open(vrt_filepath, "w") as text_file:
        text_file.write(f"""
<VRTDataset rasterXSize="{ds.RasterXSize}" rasterYSize="{ds.RasterYSize}">
    <metadata domain="GEOLOCATION">
        <mdi key="X_DATASET">{lon_file}</mdi>
        <mdi key="X_BAND">1</mdi>
        <mdi key="Y_DATASET">{lat_file}</mdi>
        <mdi key="Y_BAND">1</mdi>
        <mdi key="PIXEL_OFFSET">0</mdi>
        <mdi key="LINE_OFFSET">0</mdi>
        <mdi key="PIXEL_STEP">1</mdi>
        <mdi key="LINE_STEP">1</mdi>
    </metadata> 
    <VRTRasterBand band="1" datatype="Float32">
        <SimpleSource>
            <SourceFilename relativeToVRT="0">HDF5:{in_filepath}://PRODUCT/{variable}</SourceFilename>
            <SourceBand>1</SourceBand>
            <SourceProperties RasterXSize="{ds.RasterXSize}" RasterYSize="{ds.RasterYSize}" DataType="Float32" BlockXSize="{ds.RasterXSize}" BlockYSize="{ds.RasterYSize}" />
            <SrcRect xOff="0" yOff="0" xSize="{ds.RasterXSize}" ySize="{ds.RasterYSize}" />
            <DstRect xOff="0" yOff="0" xSize="{ds.RasterXSize}" ySize="{ds.RasterYSize}" />
        </SimpleSource>
    </VRTRasterBand>
</VRTDataset>
""")

    # Add georeferencing to vrt file
    georef_data(out_filepath, vrt_filepath, vrt=False, **kwargs)


def write_masked_data(out_filepath, data_file, mask_file, mask_threshold=75):
    """

    """

    data_ds = gdal.Open(data_file)
    data = data_ds.ReadAsArray()
    mask = gdal.Open(mask_file).ReadAsArray()
    data[mask <= mask_threshold] = np.nan

    driver = gdal.GetDriverByName('GTiff')
    dataset = driver.Create(
        out_filepath,
        data_ds.RasterXSize,
        data_ds.RasterYSize,
        1,
        gdal.GDT_Float32, )

    dataset.SetGeoTransform(data_ds.GetGeoTransform())  
    dataset.SetProjection(data_ds.GetProjectionRef())
    dataset.GetRasterBand(1).WriteArray(data)
    dataset.FlushCache()  # Write to disk.


def georef_data(out_filepath, in_filepath, vrt, EPSG_code, spatial_res):
    """

    """

    params = {}
    #params['geoloc'] = True
    #params['srcNodata'] = float(md[f"PRODUCT_{variable}__FillValue"])
    params['dstNodata'] = -9999
    params['dstSRS'] = f"EPSG:{EPSG_code}"
    if vrt:
        params["format"] = "VRT"
        #ext = ".vrt"
        out_filepath = out_filepath.replace('.tif', '.vrt')
    else:
        params["format"] = "Gtiff"
        #ext = ".tif"
        out_filepath = out_filepath.replace('.vrt', '.tif')
    if not spatial_res:
        if EPSG_code == "4326":
            params['xRes'] = 0.06288  # equivalent to 7000 meters
            params['yRes'] = 0.06288  # equivalent to 7000 meters
        else:
            params['xRes'] = 7000  # meters
            params['xRes'] = 7000  # meters
    else:
        params['xRes'] = spatial_res[0]
        params['xRes'] = spatial_res[1]

    #out_filepath = generate_out_filepath(in_filepath, output_folder, ext)
    gdal.Warp(out_filepath, in_filepath, **params)



def generate_out_filepath(in_filepath, output_folder, ext):
    """


    ext: '.tif', '.vrt'

    """

    filename = in_filepath.split(os.path.sep)[-1]
    var_name_short = filename[13:20].replace('_','')
    timestamp = filename[20:35]
    return output_folder + os.path.sep + var_name_short + '_' + timestamp + ext


def merge_rasters(in_filenames, output_filename):
    """

    """

    cmd_gdal_merge = " ".join([
                                'gdal_merge.py', '-init 255 -o', output_filename, \
                                '-n 9999' \
                                ] + \
                                ['"%s"' % in_filename for in_filename in in_filenames])
    subprocess.check_output(cmd_gdal_merge, shell=True)

Once saved as a file, one can import the function 'write_s5p_tif' and use it this way:

write_s5p_tif('S5P_OFFL_L2__NO2____20191222T123026_20191222T141156_11352_01_010302_20191224T055628.nc', ['nitrogendioxide_tropospheric_column'], '/path/to/folder')
Related Question