I have a number of 24k USGS topo maps downloaded from the National Map. I want to convert these GeoPDFs to GeoTIFFs and clip them to remove the collars. When I run the conversion with the ArcGIS Pro PDF to TIFF geoprocessing tool, I lose the green color. When I use gdal.Translate (from the osgeo Python bindings), white turns to black.
Here is a screen shot of the original GeoPDF.
If I execute the conversion using the ArcGIS Pro v3 geoprocessing tool PDF to TIFF the resulting TIFF has no green color as seen below.
After discovering that the ArcGIS Pro export was a 4-band TIFF, I thought the 4th band might be causing the problem. In response, I tried gdal.Translate (from the osgeo Python bindings) with this Python code to do the conversion and drop the alpha band.
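from osgeo import gdal
in_pdf = r"I:\USGS_Quads\7_5minUSGSTopoLocal\WA_Acme_20230818_TM_geo.pdf"
out_tiff = r"I:\USGS_Quads\7_5minUSGSTopoLocal\WA_Acme_20230818_TM_geo.tif"
# Open the GeoPDF as a raster dataset
ds = gdal.Open(in_pdf)
# Keep only bands 1-3 (RGB), dropping the 4th (alpha) band
ds = gdal.Translate(out_tiff, ds, bandList=[1, 2, 3])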
In the resulting 3-band GeoTIFF from GDAL, white is turned to black and some artifacts (circled in red) are added.
Here is a link to a sample geoPDF.
I am unsure why ArcGIS Pro and GDAL would export two different GeoTIFFs but my question is: How do I convert a GeoPDF to a GeoTIFF and retain the same information?
Best Answer
If you open the GeoPDF in a capable PDF viewer such as Adobe Acrobat Reader, you will see that it is not just a printed map but almost an application. There are many layers that can be turned on or off. The PDF even contains an aerial image as one raster layer.
You can see the full list of map layers with gdalinfo
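On the command line, `gdalinfo -mdd LAYERS` reports the layer list from the PDF driver's LAYERS metadata domain. The same information should be reachable from Python; the sketch below assumes the metadata items have the form `LAYER_00_NAME=<name>` and that GDAL was built with a PDF backend that exposes layers. The helper names `parse_layer_names` and `list_pdf_layers` are made up for this illustration.

```python
def parse_layer_names(metadata_items):
    """Pull layer names out of 'LAYER_xx_NAME=value' metadata strings.

    Assumes the key layout used by the GDAL PDF driver's LAYERS domain.
    """
    names = []
    for item in metadata_items or []:
        key, sep, value = item.partition("=")
        if sep and key.startswith("LAYER_") and key.endswith("_NAME"):
            names.append(value)
    return names


def list_pdf_layers(pdf_path):
    """Return the layer names GDAL reports for a GeoPDF."""
    # Imported here so the parsing helper above works without GDAL installed.
    from osgeo import gdal

    ds = gdal.Open(pdf_path)
    if ds is None:
        raise RuntimeError("GDAL could not open %s" % pdf_path)
    # GetMetadata_List returns None when the domain is empty.
    return parse_layer_names(ds.GetMetadata_List("LAYERS"))
```

Calling `list_pdf_layers(in_pdf)` on the quad from the question should give the names to use when switching layers on or off.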
Converting a combination of raster and vector layers into a GeoTIFF with GDAL will probably take quite a lot of experimenting and reading of the driver manual pages: mainly the raster driver https://gdal.org/drivers/raster/pdf.html, but maybe also the vector driver https://gdal.org/drivers/vector/pdf.html. ArcGIS does not seem to offer many alternatives: it prints the layers that are set to be visible by default in the PDF, and it makes some error with the Woodland layer. GDAL gives you more options. I would start with something like
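A minimal Python sketch of one possible starting point, assuming the PDF raster driver's documented open options `LAYERS_OFF` (comma-separated layers to hide) and `DPI` (rendering resolution). The helper names and the layer names in the usage note are placeholders, not values read from this file; take the real names from the layer list that gdalinfo reports.

```python
def build_pdf_open_options(layers_off=(), dpi=None):
    """Build GDAL PDF open options as 'KEY=VALUE' strings."""
    options = []
    if layers_off:
        # LAYERS_OFF takes a comma-separated list of layer names to hide.
        options.append("LAYERS_OFF=" + ",".join(layers_off))
    if dpi is not None:
        options.append("DPI=%d" % dpi)
    return options


def pdf_to_geotiff(pdf_path, tiff_path, layers_off=(), dpi=None):
    """Render a GeoPDF to GeoTIFF with selected layers switched off."""
    # Imported here so the option builder above works without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    src = gdal.OpenEx(
        pdf_path,
        gdal.OF_RASTER,
        open_options=build_pdf_open_options(layers_off, dpi),
    )
    out = gdal.Translate(tiff_path, src, format="GTiff")
    out = None  # release the dataset so the file is flushed to disk
```

For example, `pdf_to_geotiff(in_pdf, out_tiff, layers_off=["Images", "Map_Collar"], dpi=300)` would hide two layers, where `Images` and `Map_Collar` are hypothetical names. Rendering with the aerial image layer switched off may be enough to avoid both the black background and the missing green, but that is untested here and will likely need some experimenting.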