Exporting GeoPDF to GeoTIFF and Resulting TIFF Alteration in ArcGIS Pro and GDAL


I have a number of 1:24,000-scale USGS topo maps downloaded from the National Map. I want to convert these GeoPDFs to GeoTIFFs and clip them to remove the collars. When I run the conversion with the ArcGIS Pro PDF to TIFF geoprocessing tool, I lose the green color. When I use gdal.Translate from the osgeo Python bindings, white turns to black.

Here is a screen shot of the original GeoPDF.

enter image description here

If I run the conversion with the ArcGIS Pro v3 PDF to TIFF geoprocessing tool, the resulting TIFF has no green color, as seen below.

enter image description here

enter image description here

After discovering that the ArcGIS Pro export was a 4-band TIFF, I thought the 4th band might be causing the problem. In response, I tried gdal.Translate with this Python code to do the conversion and remove the alpha band.

from osgeo import gdal

in_pdf = r"I:\USGS_Quads\7_5minUSGSTopoLocal\WA_Acme_20230818_TM_geo.pdf"
out_tiff = r"I:\USGS_Quads\7_5minUSGSTopoLocal\WA_Acme_20230818_TM_geo.tif"
ds = gdal.Open(in_pdf)
# Keep only bands 1-3 (RGB), which drops the 4th (alpha) band
ds = gdal.Translate(out_tiff, ds, bandList=[1, 2, 3])

The GeoTIFF produced by GDAL has 3 bands, but white is turned to black and some artifacts (circled in red) are added.

enter image description here

Here is a link to a sample GeoPDF.

I am unsure why ArcGIS Pro and GDAL would export two different GeoTIFFs, but my question is: how do I convert a GeoPDF to a GeoTIFF and retain the same information?

Best Answer

If you open the GeoPDF with a capable PDF viewer such as Adobe Acrobat Reader, you will see that it is not just a printed map but almost an application. It has many layers that can be turned on or off. The PDF even contains an aerial image as one raster layer.

enter image description here

You can see the full list of map layers with gdalinfo:

gdalinfo -mdd layers WA_Acme_20230818_TM_geo.pdf
...
Metadata (layers):
  LAYER_00_NAME=Labels
  LAYER_01_NAME=Labels
  LAYER_02_NAME=Map_Collar
  LAYER_03_NAME=Map_Collar.Map_Elements
  LAYER_04_NAME=Map_Frame
  LAYER_05_NAME=Map_Frame.Projection_and_Grids
  LAYER_06_NAME=Map_Frame.Geographic_Names
  LAYER_07_NAME=Map_Frame.Structures
  LAYER_08_NAME=Map_Frame.Transportation
  LAYER_09_NAME=Map_Frame.Transportation.Road_Names_and_Shields
  LAYER_10_NAME=Map_Frame.Transportation.Road_Features
  LAYER_11_NAME=Map_Frame.Transportation.Trails
  LAYER_12_NAME=Map_Frame.Transportation.Railroads
  LAYER_13_NAME=Map_Frame.Transportation.Airports
  LAYER_14_NAME=Map_Frame.PLSS
  LAYER_15_NAME=Map_Frame.Wetlands
  LAYER_16_NAME=Map_Frame.Hydrography
  LAYER_17_NAME=Map_Frame.Terrain
  LAYER_18_NAME=Map_Frame.Terrain.Contours
  LAYER_19_NAME=Map_Frame.Terrain.Shaded_Relief
  LAYER_20_NAME=Map_Frame.Woodland
  LAYER_21_NAME=Map_Frame.Boundaries
  LAYER_22_NAME=Map_Frame.Boundaries.Jurisdictional_Boundaries
  LAYER_23_NAME=Map_Frame.Boundaries.Jurisdictional_Boundaries.State_or_Territory
  LAYER_24_NAME=Map_Frame.Boundaries.Jurisdictional_Boundaries.County_or_Equivalent
  LAYER_25_NAME=Images
  LAYER_26_NAME=Images.Orthoimage
  LAYER_27_NAME=Barcode
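
Since the question uses the Python bindings, the same layer list can probably be read from Python as well. The following is a minimal sketch, assuming the PDF driver exposes the list through the LAYERS metadata domain (the one gdalinfo prints above). The helper names `layer_names` and `read_layer_names` are hypothetical and only serve the illustration.

```python
import re


def layer_names(metadata):
    """Return layer names ordered by the index in their LAYER_nn_NAME key."""
    pattern = re.compile(r"^LAYER_(\d+)_NAME$")
    found = []
    for key, value in metadata.items():
        match = pattern.match(key)
        if match:
            found.append((int(match.group(1)), value))
    # Sort by the numeric index so the order matches the gdalinfo listing
    return [name for _, name in sorted(found)]


def read_layer_names(pdf_path):
    """Open a GeoPDF and list its layers (hypothetical helper, needs GDAL)."""
    from osgeo import gdal  # imported here so layer_names works without GDAL

    gdal.UseExceptions()
    ds = gdal.Open(pdf_path)
    # Assumption: the layer list lives in the "LAYERS" metadata domain
    return layer_names(ds.GetMetadata("LAYERS") or {})


# Example call (requires GDAL built with a PDF backend):
# print(read_layer_names("WA_Acme_20230818_TM_geo.pdf"))
```

The names returned this way are the ones to pass to the layer on/off options of the PDF driver.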

Converting a combination of raster and vector layers into a GeoTIFF with GDAL will probably take some experimenting and reading of the driver manual pages, mainly the raster driver https://gdal.org/drivers/raster/pdf.html but maybe also the vector driver https://gdal.org/drivers/vector/pdf.html. ArcGIS does not seem to offer many alternatives: it prints the layers that are set to be visible by default in the PDF, and it appears to mishandle the Woodland layer. GDAL gives you more options. I would start with something like:

gdal_translate WA_Acme_20230818_TM_geo.pdf test.tif --config GDAL_PDF_LAYERS_OFF layer_name1,layer_name2
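
A rough Python equivalent of that command, as a sketch rather than a tested recipe: it passes the PDF driver's LAYERS_OFF, DPI and BANDS open options to gdal.OpenEx instead of setting the config option. The driver documentation describes BANDS=3 as rendering onto a white background, so requesting 3 bands at open time may avoid the black background that appears when the alpha band is dropped afterwards with bandList. The function names, the chosen layers, the DPI value and the DEFLATE compression are all assumptions made for illustration.

```python
def build_pdf_open_options(layers_off=(), dpi=None, bands=None):
    """Build PDF-driver open options as a list of 'KEY=VALUE' strings."""
    options = []
    if layers_off:
        # Layer names as reported by `gdalinfo -mdd layers`
        options.append("LAYERS_OFF=" + ",".join(layers_off))
    if dpi is not None:
        options.append(f"DPI={dpi}")
    if bands is not None:
        # 3 = RGB, 4 = RGBA
        options.append(f"BANDS={bands}")
    return options


def pdf_to_geotiff(in_pdf, out_tiff, layers_off=(), dpi=None, bands=None):
    """Render a GeoPDF to GeoTIFF (hypothetical helper, needs GDAL)."""
    from osgeo import gdal  # imported here so the option builder works alone

    gdal.UseExceptions()
    open_options = build_pdf_open_options(layers_off, dpi, bands)
    src = gdal.OpenEx(in_pdf, gdal.OF_RASTER, open_options=open_options)
    # Compression is an arbitrary choice for this sketch
    out = gdal.Translate(out_tiff, src, creationOptions=["COMPRESS=DEFLATE"])
    out = None  # dereference to flush and close the output file
    src = None


# Example call (layer choice and DPI are guesses to experiment with):
# pdf_to_geotiff(
#     "WA_Acme_20230818_TM_geo.pdf",
#     "test.tif",
#     layers_off=["Map_Collar", "Images.Orthoimage"],
#     dpi=300,
#     bands=3,
# )
```

Turning off Map_Collar hides the collar content but does not change the raster extent, so a separate clip would still be needed to remove the collar area itself.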