My Objective: I would like to use GDAL to convert a GeoPDF, writing the vector layers as shp files and the raster layers as tif files. I want to do this programmatically.
Edit: In reality, I want to do this with many geospatial PDFs. I'm prototyping the workflow using Python, but it will probably end up being C++. (End Edit)
The Problem: Naturally, the command to convert a vector layer differs from the command to convert a raster layer, and I don't know how to determine (again, programmatically) which layers are vector and which are raster.
What I've Tried: First, here is my sample data: https://www.terragotech.com/images/pdf/webmap_urbansample.pdf.
gdalinfo webmap_urbansample.pdf -mdd LAYERS
gives the layer names:
...
Metadata (LAYERS):
LAYER_00_NAME=Layers
LAYER_01_NAME=Layers.BPS_-_Water_Sources
LAYER_02_NAME=Layers.BPS_-_Facilities
LAYER_03_NAME=Layers.BPS_-_Buildings
LAYER_04_NAME=Layers.Sewerage_Man_Holes
LAYER_05_NAME=Layers.Sewerage_Pump_Stations
LAYER_06_NAME=Layers.Water_Points
LAYER_07_NAME=Layers.Roads
LAYER_08_NAME=Layers.Sewerage_Jump-Ups
LAYER_09_NAME=Layers.Sewerage_Lines
LAYER_10_NAME=Layers.Water_Lines
LAYER_11_NAME=Layers.Cadastral_Boundaries
LAYER_12_NAME=Layers.Raster_Images
...
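If you are prototyping in Python, the LAYER_xx_NAME lines can be pulled out of the gdalinfo text with a regular expression. This is a minimal sketch that assumes the output format shown above; parse_pdf_layers is a hypothetical name. With the GDAL Python bindings, gdal.Open(path).GetMetadata("LAYERS") should return the same key/value pairs as a dict, which avoids parsing text at all.

```python
import re
from typing import List

# Matches lines such as "  LAYER_07_NAME=Layers.Roads" in `gdalinfo <pdf> -mdd LAYERS` output.
_LAYER_LINE = re.compile(r"^\s*LAYER_(\d+)_NAME=(.*)$")


def parse_pdf_layers(gdalinfo_text: str) -> List[str]:
    """Return the PDF layer names from gdalinfo's LAYERS metadata, in index order."""
    found = []
    for line in gdalinfo_text.splitlines():
        m = _LAYER_LINE.match(line)
        if m:
            found.append((int(m.group(1)), m.group(2).strip()))
    # Sort on the numeric index so LAYER_12 comes after LAYER_07, not before it.
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    sample = """Metadata (LAYERS):
  LAYER_00_NAME=Layers
  LAYER_07_NAME=Layers.Roads
  LAYER_12_NAME=Layers.Raster_Images
"""
    print(parse_pdf_layers(sample))
```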
I can tell which layers are vector and which are raster by looking at the data, but I don't know how to extract that information programmatically so I can decide whether to use ogr2ogr or gdal_translate for each layer.
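Assuming you already have a raster/vector label for each layer, choosing the tool is a simple branch. The sketch below only builds argument lists, which you could run with subprocess.run(cmd, check=True). Two assumptions should be checked against the GDAL docs for your version: that the PDF driver's LAYERS open option (passed with gdal_translate -oo) limits rendering to the named layer, and that ogr2ogr expects the ogrinfo spelling of the layer name ("Water Lines") rather than the gdalinfo one. safe_filename and build_command are hypothetical helpers.

```python
import re
import shlex
from typing import List


def safe_filename(layer_name: str) -> str:
    """Hypothetical helper: turn 'BPS - Water Sources' into 'BPS_-_Water_Sources'."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", layer_name).strip("_")


def build_command(pdf_path: str, layer_name: str, is_raster: bool) -> List[str]:
    """Return an argument list for gdal_translate (raster) or ogr2ogr (vector)."""
    stem = safe_filename(layer_name)
    if is_raster:
        # Assumption: the PDF driver's LAYERS open option turns on only the listed layer(s).
        return ["gdal_translate", "-of", "GTiff", "-oo", f"LAYERS={layer_name}",
                pdf_path, f"{stem}.tif"]
    # ogr2ogr takes the destination before the source, then the layer name(s) to copy.
    return ["ogr2ogr", "-f", "ESRI Shapefile", f"{stem}.shp", pdf_path, layer_name]


if __name__ == "__main__":
    for name, raster in [("Water Lines", False), ("Layers.Raster_Images", True)]:
        print(shlex.join(build_command("webmap_urbansample.pdf", name, raster)))
```

Returning argument lists rather than shell strings avoids quoting problems with names that contain spaces, and the same branch ports directly to C++ later.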
Then I thought I could use ogrinfo and just diff all the layers to deduce which ones are raster, but ogrinfo gives me:
...
1: Cadastral Boundaries (Polygon)
2: Water Lines (Line String)
3: Sewerage Lines (Line String)
4: Sewerage Jump-Ups (Line String)
5: Roads
6: Water Points (Point)
7: Sewerage Pump Stations (Point)
8: Sewerage Man Holes (Point)
9: BPS - Buildings (Polygon)
10: BPS - Facilities (Polygon)
11: BPS - Water Sources (Point)
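So the two outputs don't correspond one-to-one: ogrinfo drops the "Layers." prefix, shows spaces instead of underscores, lists the layers in reverse order, and has no entries for Layers or Layers.Raster_Images.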
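Most of the mismatch is in the naming, so a normalizing key can line the two lists up. This sketch is based only on this one sample: it assumes ogrinfo reports the last dot-separated component of the gdalinfo name with underscores shown as spaces, and that a trailing parenthesized part in ogrinfo's listing is the geometry type (a layer name that itself ends in parentheses would be misread). parse_ogrinfo_layers and match_key are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches ogrinfo summary lines such as "2: Water Lines (Line String)" or "5: Roads".
_OGR_LINE = re.compile(r"^\s*(\d+):\s+(.*?)(?:\s+\(([^()]*)\))?\s*$")


def parse_ogrinfo_layers(ogrinfo_text: str) -> Dict[str, Optional[str]]:
    """Map each ogrinfo layer name to its geometry type (None if none is printed)."""
    layers = {}
    for line in ogrinfo_text.splitlines():
        m = _OGR_LINE.match(line)
        if m:
            layers[m.group(2)] = m.group(3)
    return layers


def match_key(name: str) -> str:
    """Comparable key for 'Layers.Water_Lines' (gdalinfo) and 'Water Lines' (ogrinfo)."""
    leaf = name.rsplit(".", 1)[-1]  # drop parent groups such as 'Layers.'
    return leaf.replace(" ", "_")


if __name__ == "__main__":
    ogr = parse_ogrinfo_layers("1: Cadastral Boundaries (Polygon)\n5: Roads\n")
    by_key = {match_key(name): name for name in ogr}
    for gdal_name in ["Layers.Cadastral_Boundaries", "Layers.Roads", "Layers.Raster_Images"]:
        print(gdal_name, "->", by_key.get(match_key(gdal_name)))
```

With the sample outputs above, every gdalinfo name except Layers and Layers.Raster_Images finds a counterpart this way.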
Does anyone know how to have GDAL print the GeoPDF layers and indicate which are raster and which are vector?
Best Answer
This is not really the answer, but something I've been using as a workaround.
The script compares the layer names reported by gdalinfo with those reported by ogrinfo and infers that the ones ogrinfo doesn't list are raster. This approach isn't definitive, though, so I imagine it could be wrong from time to time. Even in this example,
LAYER_00_NAME=Layers
isn't really a raster layer; it's the parent group the other layers are nested under.