[TeX/LaTeX] Detecting all pages that contain color

In a larger LaTeX document there are often only a few pages with color content (mainly figures), while the remaining pages are black and white.
Because color pages cost much more to print than black-and-white pages, it would be useful to extract all color pages and print them separately. The first step is to detect whether a page contains color. The result could take the form of a plain-text list of page numbers that a PDF page extraction script (using, e.g., pdftk) can read.
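Once such a list of color pages exists, the black-and-white pages are simply its complement. The following POSIX shell sketch computes that complement; the function name complement_pages is hypothetical, and it assumes that the total page count and a space-separated list of color pages are already known.

```shell
#!/bin/sh
# complement_pages TOTAL "LIST"
# Hypothetical helper: print, space-separated, every page number from 1 to
# TOTAL that does NOT appear in the space-separated LIST of color pages.
# (Plain POSIX sh has no "local", so the variables below are global.)
complement_pages() {
  total=$1
  list=" $2 "            # pad with spaces so " N " only matches whole numbers
  out=""
  i=1
  while [ "$i" -le "$total" ]; do
    case "$list" in
      *" $i "*) ;;                  # color page: skip it
      *) out="$out${out:+ }$i" ;;   # black-and-white page: append it
    esac
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Example: a 4-page document whose pages 2 and 3 contain color.
complement_pages 4 "2 3"
```

Both lists can then be handed to pdftk's cat operation, for example pdftk your.pdf cat 2 3 output color.pdf and pdftk your.pdf cat 1 4 output bw.pdf. The total page count is reported on the NumberOfPages line of pdftk your.pdf dump_data.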

A simple solution, sufficient for many people, would be to detect all pages that contain a figure and assume that only these have color. However, a general solution would be preferable. Only color elements that are actually printed should be taken into account; for example, the colored frames that hyperref draws around links should not. It is acceptable for the solution to disable these frames during detection.

Best Answer

Newer versions of Ghostscript (9.05 and later) include an output "device" called inkcov. It calculates the ink coverage of each page (not of each image) as Cyan (C), Magenta (M), Yellow (Y) and Black (K) values, where 0.00000 means 0% and 1.00000 means 100%.

Example command line:

gs -o - -sDEVICE=inkcov /path/to/your.pdf

Example output:

Page 1
0.00000  0.00000  0.00000  0.02230 CMYK OK
Page 2
0.02360  0.02360  0.02360  0.02360 CMYK OK
Page 3
0.02525  0.02525  0.02525  0.00000 CMYK OK
Page 4
0.00000  0.00000  0.00000  0.01982 CMYK OK

Here pages 1 and 4 use no color, while pages 2 and 3 do. This case is particularly 'nasty' for people who want to save on color ink: because the C, M and Y values (and, on page 2, the K value as well) are identical within each of pages 2 and 3, these pages could appear to the human eye not as color pages but as ("rich") grayscale anyway, if every single pixel mixes the inks in these same proportions.
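Turning this output into the page list asked for in the question only requires reading the per-page values in order. The following is a minimal POSIX shell/awk sketch, assuming the inkcov line format shown above (four numbers followed by CMYK OK); the function name color_pages and its optional threshold argument are hypothetical. A page counts as color if its C, M or Y coverage exceeds the threshold (0 by default), so pages 2 and 3 above are reported as color even though they might look gray: inkcov only reports page averages, not per-pixel mixes.

```shell
#!/bin/sh
# color_pages [THRESHOLD]
# Hypothetical helper: read Ghostscript inkcov output on stdin and print, on
# one line, the numbers of pages whose C, M or Y coverage exceeds THRESHOLD
# (default 0). Works with and without -q, because only the coverage lines
# (fifth field "CMYK") are counted, one per page, in page order.
color_pages() {
  awk -v t="${1:-0}" '
    $5 == "CMYK" {
      page++
      if ($1 + 0 > t + 0 || $2 + 0 > t + 0 || $3 + 0 > t + 0)
        printf "%s%d", (n++ ? " " : ""), page   # space-separate the numbers
    }
    END { if (n) print "" }                     # final newline if any page
  '
}

# Example with the output shown above (Page N lines included):
printf '%s\n' 'Page 1' ' 0.00000  0.00000  0.00000  0.02230 CMYK OK' \
              'Page 2' ' 0.02360  0.02360  0.02360  0.02360 CMYK OK' \
              'Page 3' ' 0.02525  0.02525  0.02525  0.00000 CMYK OK' \
              'Page 4' ' 0.00000  0.00000  0.00000  0.01982 CMYK OK' | color_pages
```

In practice the input would come from gs -q -o - -sDEVICE=inkcov your.pdf | color_pages, and the resulting list can be passed to pdftk's cat operation after checking that it is non-empty. A small threshold such as color_pages 0.001 is one way to ignore pages with only tiny traces of color; whether that is appropriate depends on the document.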

Ghostscript can also convert color to grayscale. Example command line:

gs                                \
  -o grayscale.pdf                \
  -sDEVICE=pdfwrite               \
  -sColorConversionStrategy=Gray  \
  -sProcessColorModel=/DeviceGray \
   /path/to/your.pdf

Checking the ink coverage distribution again (note how adding -q to the parameters slightly changes the output format: the "Page N" lines are no longer printed):

gs -q  -o - -sDEVICE=inkcov grayscale.pdf
 0.00000  0.00000  0.00000  0.02230 CMYK OK
 0.00000  0.00000  0.00000  0.02360 CMYK OK
 0.00000  0.00000  0.00000  0.02525 CMYK OK
 0.00000  0.00000  0.00000  0.01982 CMYK OK
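In a script, this check can be automated by testing whether any page still has non-zero C, M or Y coverage. A minimal sketch under the same assumption about the inkcov line format; the function name all_gray is hypothetical.

```shell
#!/bin/sh
# all_gray
# Hypothetical helper: read Ghostscript inkcov output on stdin and exit with
# status 0 if every page has zero C, M and Y coverage, or 1 if any page
# still uses colored ink.
all_gray() {
  awk '
    $5 == "CMYK" && ($1 + 0 > 0 || $2 + 0 > 0 || $3 + 0 > 0) { color = 1 }
    END { exit (color ? 1 : 0) }
  '
}

# Example with two lines of the grayscale.pdf output shown above:
if printf '%s\n' ' 0.00000  0.00000  0.00000  0.02230 CMYK OK' \
                 ' 0.00000  0.00000  0.00000  0.02360 CMYK OK' | all_gray
then
  echo "no color left"
else
  echo "color remains"
fi
```

Used on a real file, this would read gs -q -o - -sDEVICE=inkcov grayscale.pdf | all_gray, and the exit status can drive further steps in a build script.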