I am having a problem importing a PDF as a graphic.
The PDF, which originally had some text beside the figure, was first cropped to remove that text and then imported using \includegraphics.
However, as shown in the figure, the original text is still there. It can be 'seen' when highlighted in a PDF reader, and copying and pasting it into a Word file makes the original text visible again.
Does anyone know what causes this, and how to fix it (other than converting the PDF to an image)?
Best Answer
What (likely) happened
Most likely, your PDF was imported with its complete content, but part of that content was merely hidden by a 'CropBox' that is smaller than the 'MediaBox' of the original PDF. Setting such a CropBox requires changing only a very small part of the PDF code.
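A PDF rectangle is written as four numbers [llx lly urx ury] in points. A viewer shows only the part of the MediaBox that lies inside the CropBox. The Python sketch below computes how much of the page a given CropBox hides on each side. The function names are hypothetical and the box values in the example are made up. The content in those margins is still in the file.

```python
def normalize(box):
    """PDF rectangles may list any two opposite corners; return (llx, lly, urx, ury)."""
    x0, y0, x1, y1 = box
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def hidden_margins(mediabox, cropbox):
    """Points of the MediaBox that the CropBox leaves undisplayed on each side.

    Nothing in these margins is deleted; the viewer simply does not show it.
    """
    mx0, my0, mx1, my1 = normalize(mediabox)
    cx0, cy0, cx1, cy1 = normalize(cropbox)
    return {
        "left": max(0.0, cx0 - mx0),
        "bottom": max(0.0, cy0 - my0),
        "right": max(0.0, mx1 - cx1),
        "top": max(0.0, my1 - cy1),
    }


# Hypothetical US Letter page "cropped" to a window in its upper half.
print(hidden_margins([0, 0, 612, 792], [72, 400, 540, 720]))
```

With these made-up values, the bottom 400 points of the page, which is where the "removed" text would sit, are still part of the file.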
How it typically happens
It typically happens when you use macOS' 'Preview' app to "crop" parts of the pages. Users are often fooled by this. They believe they have removed everything that is now invisible, and so protected their company secrets, when they send such a page "extract" to a customer. However, changing a few numbers in a text editor makes all the page content visible again!
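The following Python sketch illustrates how little it takes to undo such a crop. It overwrites the CropBox values with the MediaBox values in a made-up, uncompressed page dictionary, and the function name `uncrop` is hypothetical. It assumes the dictionary is stored as plain text. Real files may keep page dictionaries inside compressed object streams, and changing an object's byte length invalidates the xref offsets. Treat it as an illustration, not a tool.

```python
import re

MEDIABOX = re.compile(rb"/MediaBox\s*\[([^\]]*)\]")
CROPBOX = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def uncrop(page_dict: bytes) -> bytes:
    """Reset the CropBox of an uncompressed page dictionary to its MediaBox."""
    media = MEDIABOX.search(page_dict)
    if media is None:
        return page_dict  # no MediaBox here (it may be inherited from a parent node)
    # A callable replacement avoids re's backslash-escape processing.
    return CROPBOX.sub(lambda _m: b"/CropBox [" + media.group(1) + b"]", page_dict)


page = b"<< /Type /Page /MediaBox [0 0 612 792] /CropBox [72 400 540 720] /Contents 4 0 R >>"
print(uncrop(page).decode())
```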
How to better understand what happened
To understand how this type of "cropping" works, imagine a page full of content covered by a white sheet of paper. Now cut a little window into that sheet that lets you see just the "cropped" area. The rest of the content is not gone. It is still there, merely invisible, and you can make it visible again simply by removing the white sheet with the little window in it. PDF viewers do not show you the white areas around the visible part, so your mind is easily fooled into believing that the remaining content is gone for good. To really crop, you would have to cut all the unwanted content away from the original paper and burn it...
How to solve the problem
To make the content outside such cropped areas really go away (and be permanently removed from the page's /Contents stream), you have to re-process the cropped page once more. Running it through Ghostscript will achieve this result.
How to check if the solution worked
To verify, run pdfinfo with its -box option on both the originally cropped file and the re-processed one, and compare the MediaBox and CropBox values it reports. Only a PDF whose MediaBox and CropBox values are identical can be guaranteed not to carry this kind of hidden content outside the area displayed by PDF viewers. You can safely import such files into your documents. Any PDF whose CropBox is smaller than its MediaBox may carry extra invisible content along with it...
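The re-process-and-verify workflow described above can be scripted. The Python sketch below assumes that Ghostscript (`gs`) and poppler's `pdfinfo` are on the PATH. The `-dUseCropBox` flag, which makes the CropBox the new page size, is my choice of Ghostscript option and not necessarily the exact command from the original answer. The function names and file names are hypothetical. Only page 1 is checked, because \includegraphics imports page 1 by default. Equal boxes rule out the CropBox trick. They do not by themselves prove that the content stream holds nothing beyond the page edge.

```python
import re
import shutil
import subprocess

# Matches the MediaBox/CropBox lines of `pdfinfo -box`, with or without the
# "Page N" prefix that pdfinfo adds when a page range (-f/-l) is given.
BOX_LINE = re.compile(
    r"^(?:Page\s+(\d+)\s+)?(MediaBox|CropBox):\s+"
    r"([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)$"
)


def build_gs_command(src, dst, gs="gs"):
    """Ghostscript call that rewrites src using its CropBox as the page size."""
    return [
        gs,
        "-o", dst,              # output file; -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",    # write a new PDF instead of rendering an image
        "-dUseCropBox",         # take the page size from the CropBox
        src,
    ]


def pages_with_hidden_area(pdfinfo_output, tol=0.01):
    """Return the page numbers whose CropBox differs from their MediaBox."""
    boxes = {}
    for line in pdfinfo_output.splitlines():
        m = BOX_LINE.match(line.strip())
        if m:
            page = int(m.group(1)) if m.group(1) else 1  # no prefix: first page
            coords = tuple(float(v) for v in m.groups()[2:])
            boxes.setdefault(page, {})[m.group(2)] = coords
    return [
        page
        for page, b in sorted(boxes.items())
        if "MediaBox" in b and "CropBox" in b
        and any(abs(x - y) > tol for x, y in zip(b["MediaBox"], b["CropBox"]))
    ]


def recrop_and_check(src, dst, gs="gs"):
    """Re-process src into dst, then list pages of dst whose boxes still differ."""
    for tool in (gs, "pdfinfo"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool!r} not found on PATH")
    subprocess.run(build_gs_command(src, dst, gs), check=True)
    report = subprocess.run(
        ["pdfinfo", "-box", "-f", "1", "-l", "1", dst],
        capture_output=True, text=True, check=True,
    ).stdout
    return pages_with_hidden_area(report)
```

For example, `recrop_and_check("cropped.pdf", "really-cropped.pdf")` should return an empty list if the re-processed file's boxes agree. On Windows the Ghostscript executable is usually named differently (e.g. `gswin64c`), which is why `gs` is a parameter.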