[GIS] How to have GDAL print layers of GeoPDF AND say which are raster vs vector

Tags: gdal, gdal-translate, geopdf, geospatial-pdf, ogr2ogr

My Objective: I would like to use GDAL to convert a GeoPDF. I want the vector layers as shp files and the raster layers as tif files. I want to do this in a programmatic way.

Edit: In reality, I want to do this with many geospatial PDFs. I'm prototyping the workflow using Python, but it will probably end up being C++. (End Edit)

The Problem: Naturally, the command to convert a vector layer differs from a raster layer. And I don't know (again in a programmatic way) which layers are vector and which are raster.

What I've Tried: First, here is my sample data https://www.terragotech.com/images/pdf/webmap_urbansample.pdf.

gdalinfo webmap_urbansample.pdf -mdd LAYERS

gives the layer names:

...
Metadata (LAYERS):                           
  LAYER_00_NAME=Layers                       
  LAYER_01_NAME=Layers.BPS_-_Water_Sources   
  LAYER_02_NAME=Layers.BPS_-_Facilities      
  LAYER_03_NAME=Layers.BPS_-_Buildings       
  LAYER_04_NAME=Layers.Sewerage_Man_Holes    
  LAYER_05_NAME=Layers.Sewerage_Pump_Stations
  LAYER_06_NAME=Layers.Water_Points          
  LAYER_07_NAME=Layers.Roads                 
  LAYER_08_NAME=Layers.Sewerage_Jump-Ups     
  LAYER_09_NAME=Layers.Sewerage_Lines        
  LAYER_10_NAME=Layers.Water_Lines           
  LAYER_11_NAME=Layers.Cadastral_Boundaries  
  LAYER_12_NAME=Layers.Raster_Images         
...

By looking at the data I can tell which layers are vector and which are raster, but I don't know how to extract that information programmatically, so I can't decide in code whether to use ogr2ogr or gdal_translate for a given layer.
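To make the difference between the two conversions concrete, here is a minimal sketch of the two command shapes, built as argument lists in Python. It rests on several assumptions that are not confirmed for this file: that the vector layer name passed to ogr2ogr is the name OGR itself reports (the value "Water Points" below is assumed), that a single PDF layer can be rendered on its own through the PDF driver's LAYERS open option (GDAL 2.0 or later, with a PDF backend that honours layer visibility), and that the output directory already exists. The helper names safe_name, vector_command and raster_command are hypothetical.

```python
import re


def safe_name(layer_name):
    """Turn a layer name into a filename-friendly stem (hypothetical helper)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", layer_name)


def vector_command(pdf_path, ogr_layer_name, out_dir="out"):
    """Build an ogr2ogr call that writes one vector layer to a shapefile."""
    dst = f"{out_dir}/{safe_name(ogr_layer_name)}.shp"
    # ogr2ogr argument order: destination first, then source, then layer name
    return ["ogr2ogr", "-f", "ESRI Shapefile", dst, pdf_path, ogr_layer_name]


def raster_command(pdf_path, pdf_layer_name, out_dir="out"):
    """Build a gdal_translate call that renders one PDF layer to GeoTIFF."""
    dst = f"{out_dir}/{safe_name(pdf_layer_name)}.tif"
    # Assumption: the LAYERS open option turns on only the named layer
    return ["gdal_translate", "-of", "GTiff",
            "-oo", f"LAYERS={pdf_layer_name}", pdf_path, dst]


if __name__ == "__main__":
    pdf = "webmap_urbansample.pdf"
    print(vector_command(pdf, "Water Points"))          # assumed OGR layer name
    print(raster_command(pdf, "Layers.Raster_Images"))  # name from the LAYERS metadata
```

Each list could be handed to subprocess.run(cmd, check=True). The open question remains which of the two functions to call for a given layer.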

Then I thought I could use ogrinfo and just diff all the layers to deduce which ones are raster, but ogrinfo gives me:

...
1: Cadastral Boundaries (Polygon)
2: Water Lines (Line String)
3: Sewerage Lines (Line String)
4: Sewerage Jump-Ups (Line String)
5: Roads
6: Water Points (Point)
7: Sewerage Pump Stations (Point)
8: Sewerage Man Holes (Point)
9: BPS - Buildings (Polygon)
10: BPS - Facilities (Polygon)
11: BPS - Water Sources (Point)

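So there's not a one-to-one correspondence between the two outputs: the names are formatted differently (spaces instead of underscores, no "Layers." prefix), the order is reversed, and gdalinfo lists two entries (Layers and Layers.Raster_Images) that ogrinfo does not.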

Does anyone know how to have gdal print the GeoPDF layers and indicate which are raster vs. vector?

Best Answer

This is not really the answer, but something I've been using as a workaround.

The script fuzzy-matches the layer names reported by gdalinfo against those reported by ogrinfo; any gdalinfo layer left without a vector match is assumed to be raster. This is a heuristic rather than a definitive test, so it can misclassify layers. Even in this example, LAYER_00_NAME=Layers is reported as raster although it isn't really a raster layer.

from difflib import SequenceMatcher

from osgeo import gdal
from osgeo import ogr


def GetRasterVectorLayers(filename):
    """Return [raster_layers, vector_layers], named as in the LAYERS metadata."""
    # get vector layer names with ogr (ogr.Open returns None on failure)
    data_ogr = ogr.Open(filename)
    if data_ogr:
        vector_layers = [data_ogr.GetLayer(i).GetName() for i in range(data_ogr.GetLayerCount())]
    else:
        vector_layers = []

    # get all layer names with gdal
    data_gdal = gdal.Open(filename, gdal.GA_ReadOnly)
    if data_gdal is None:
        raise RuntimeError("GDAL could not open " + filename)
    # guard against a missing LAYERS domain
    layers = data_gdal.GetMetadata_List("LAYERS") or []
    # peel off label, e.g., LAYER_00_NAME=Layers -> Layers
    layers = [layer.split('=', 1)[1] for layer in layers]

    # for each vector layer, pick the gdalinfo name with the highest similarity
    layers_vector = []
    if layers:
        for vector_layer in vector_layers:
            best_match = max(
                layers,
                key=lambda layer: (SequenceMatcher(None, vector_layer, layer).ratio(), layer),
            )
            layers_vector.append(best_match)

    # anything not matched to a vector layer is assumed to be raster
    layers_raster = [layer for layer in layers if layer not in layers_vector]
    return [layers_raster, layers_vector]

layers_raster, layers_vector = GetRasterVectorLayers('webmap_urbansample.pdf')

layers_raster
# ['Layers', 'Layers.Raster_Images']
layers_vector
# ['Layers.Cadastral_Boundaries', 'Layers.Water_Lines', 'Layers.Sewerage_Lines', 'Layers.Sewerage_Jump-Ups', 'Layers.Roads', 'Layers.Water_Points', 'Layers.Sewerage_Pump_Stations', 'Layers.Sewerage_Man_Holes', 'Layers.BPS_-_Buildings', 'Layers.BPS_-_Facilities', 'Layers.BPS_-_Water_Sources']
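
As a possible refinement, and not something taken from GDAL itself, the fuzzy matching could be replaced by a deterministic comparison. The sketch below assumes the naming pattern seen in this one sample holds generally: the LAYERS metadata name is the OGR layer name with spaces replaced by underscores, optionally preceded by a dotted parent path. It also treats any name that is a dotted prefix of another name as a group node rather than a raster layer, which keeps "Layers" out of the raster list. The function name classify_pdf_layers is hypothetical, and it takes plain lists of names so that it runs without GDAL installed.

```python
def classify_pdf_layers(gdal_names, ogr_names):
    """Split LAYERS metadata names into (vector, groups, other).

    'other' holds the remaining names, which are only *candidate* raster
    layers. Assumption: OGR names use spaces where LAYERS names use underscores.
    """
    ogr_keys = {name.replace(" ", "_") for name in ogr_names}

    vector, groups, other = [], [], []
    for name in gdal_names:
        if any(name == key or name.endswith("." + key) for key in ogr_keys):
            vector.append(name)   # matches a layer OGR can read
        elif any(n.startswith(name + ".") for n in gdal_names):
            groups.append(name)   # parent node of other layers
        else:
            other.append(name)    # no vector match: possibly raster
    return vector, groups, other


# Names copied from the gdalinfo and ogrinfo output in the question
gdal_names = [
    "Layers", "Layers.BPS_-_Water_Sources", "Layers.BPS_-_Facilities",
    "Layers.BPS_-_Buildings", "Layers.Sewerage_Man_Holes",
    "Layers.Sewerage_Pump_Stations", "Layers.Water_Points", "Layers.Roads",
    "Layers.Sewerage_Jump-Ups", "Layers.Sewerage_Lines", "Layers.Water_Lines",
    "Layers.Cadastral_Boundaries", "Layers.Raster_Images",
]
ogr_names = [
    "Cadastral Boundaries", "Water Lines", "Sewerage Lines",
    "Sewerage Jump-Ups", "Roads", "Water Points", "Sewerage Pump Stations",
    "Sewerage Man Holes", "BPS - Buildings", "BPS - Facilities",
    "BPS - Water Sources",
]

vector, groups, other = classify_pdf_layers(gdal_names, ogr_names)
print(groups)  # ['Layers']
print(other)   # ['Layers.Raster_Images']
```

This is still inference from names: a layer that ends up in "other" might hold only annotations or be empty, so it would need to be rendered and inspected before being treated as raster.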