[TeX/LaTeX] How to install PDFBox on Windows so it works with pax

pax pdfpages windows

I am trying to use pdfpages and maintain the internal bookmarks within the included PDF files. The pdfpages README suggests using the pax package for this purpose, so I have installed that from CTAN and refreshed my database (MiKTeX 2.9).

I am running Windows 7 (64-bit) and have installed JRE and JDK (in that order) and Strawberry Perl (to folder C:\StrawberryPerl\).

I downloaded PDFBox version 0.7.3 (which is supposed to be compatible with pax) from http://sourceforge.net/projects/pdfbox/files/ and installed it to C:\PDFBox.

Then I added C:\PDFBox\ and C:\MiKTeX\scripts\pax\ to my system Path variable and rebooted.

Then I installed pdfannotextractor.pl using the command line:

perl C:\MiKTeX\scripts\pax\pdfannotextractor.pl --install

with the following result:

C:\>perl C:\MiKTeX\scripts\pax\pdfannotextractor.pl --install
PDFAnnotExtractor 0.1l, 2012/04/18 - Copyright (c) 2008, 2011, 2012 by Heiko Oberdiek.
* Nothing to do, because PDFBox is already found:
  C:\PDFBox

C:\>

So PDFBox seems to be installed satisfactorily. However, when I try to run the pax script using the following command:

java -jar C:\MiKTeX\scripts\pax\pax.jar FileWithBookmarks.pdf

I get this result:

Exception in thread "main" java.lang.NoClassDefFoundError: org/pdfbox/cos/ICOSVisitor
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
    at java.lang.Class.getMethod0(Unknown Source)
    at java.lang.Class.getMethod(Unknown Source)
    at sun.launcher.LauncherHelper.getMainMethod(Unknown Source)
    at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.pdfbox.cos.ICOSVisitor
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 6 more

If instead I use the following command:

perl C:\MiKTeX\scripts\pax\pdfannotextractor.pl FileWithBookmarks.pdf

I get the same Java exception as above.

Can anyone help?

UPDATE: After adding C:\PDFBox\ to my CLASSPATH, here is my command and the debugging results:

C:\>perl C:\MiKTeX\scripts\pax\pdfannotextractor.pl --debug FileWithBookmarks.pdf
PDFAnnotExtractor 0.1l, 2012/04/18 - Copyright (c) 2008, 2011, 2012 by Heiko Oberdiek.
* CLASSPATH: [.;C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;C:\PDFBox\]
* is_win: [1]
* Which kpsewhich: [C:\MiKTeX\miktex\bin\kpsewhich.EXE]
* Backticks: [kpsewhich --progname pdfannotextractor --format texmfscripts pax.jar]
* Exit code: [0/success]
* pax.jar: [C:/MiKTeX/scripts/pax/pax.jar]
* PDFBox in CLASSPATH: [yes]
* Which java: [C:\Windows\system32\java.EXE]
* System: [java -cp C:/MiKTeX/scripts/pax/pax.jar;C:\PDFBox;.;C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;C:\PDFBox\ pax.PDFAnnotExtractor FileWithBookmarks.pdf]
Usage: java [-options] class [args...]
       (to execute a class)
or  java [-options] -jar jarfile [args...]
       (to execute a jar file)
where options include:
-d32          use a 32-bit data model if available
-d64          use a 64-bit data model if available
-server       to select the "server" VM
-hotspot      is a synonym for the "server" VM  [deprecated]
              The default VM is server.

-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
              A ; separated list of directories, JAR archives,
              and ZIP archives to search for class files.
-D<name>=<value>
              set a system property
-verbose[:class|gc|jni]
              enable verbose output
-version      print product version and exit
-version:<value>
              require the specified version to run
-showversion  print product version and continue
-jre-restrict-search | -no-jre-restrict-search
              include/exclude user private JREs in the version search
-? -help      print this help message
-X            print help on non-standard options
-ea[:<packagename>...|:<classname>]
-enableassertions[:<packagename>...|:<classname>]
              enable assertions with specified granularity
-da[:<packagename>...|:<classname>]
-disableassertions[:<packagename>...|:<classname>]
              disable assertions with specified granularity
-esa | -enablesystemassertions
              enable system assertions
-dsa | -disablesystemassertions
              disable system assertions
-agentlib:<libname>[=<options>]
              load native agent library <libname>, e.g. -agentlib:hprof
              see also, -agentlib:jdwp=help and -agentlib:hprof=help
-agentpath:<pathname>[=<options>]
              load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
              load Java programming language agent, see java.lang.instrument
-splash:<imagepath>
              show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.
* Exit code: [1]

C:\>

Best Answer

In addition to Heiko’s answer and just for convenience (Windows only):

Create a file pax.bat (or pax.cmd, or whatever name you prefer instead of pax) in the bin subfolder of your local texmf tree. Under MiKTeX you may first need to create such a tree; see "Create a local texmf tree in MiKTeX".

The preferred variant is to run the Perl script (this requires an installed Perl distribution):

Edit pax.bat and adjust the paths to your setup:

@echo off
SETLOCAL

set CLASSPATH=C:\PDFBox\lib\PDFBox-0.7.3.jar;%CLASSPATH%

perl C:\MiKTeX\scripts\pax\pdfannotextractor.pl %*

You can even leave out the set CLASSPATH line if you create the folder <localtexmf>\scripts\pax\lib, put PDFBox-0.7.3.jar in it, and refresh the file name database (FNDB).

Then, at the Command Prompt, you can call pax FileWithBookmarks.pdf or pax --debug FileWithBookmarks.pdf > paxdebug.log. This assumes that there is no other pax.exe or similar on the system path; otherwise always make your call explicitly with pax.bat ...
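A note on why the CLASSPATH line in pax.bat names the jar file itself rather than a folder such as C:\PDFBox\: Java treats a directory on the class path as a place to look for .class files only. Jar files lying inside that directory are not searched unless each jar is listed itself (or, since Java 6, a wildcard entry such as C:\PDFBox\lib\* is used). The following Java sketch is only my own illustration of that rule. ClassPathJars and its method names are hypothetical helpers, not part of pax or PDFBox, and the search depth of 2 is an arbitrary assumption. It prints jar files that sit below directory entries of the current class path.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of pax or PDFBox.
class ClassPathJars {

    // Lists jar files found in the directory entries of classPath,
    // descending maxDepth directory levels in total (the entry itself
    // counts as level 1). Entries that are jar files themselves are
    // skipped, because Java already searches those.
    static List<String> jarsBelowDirectories(String classPath, int maxDepth) {
        List<String> found = new ArrayList<String>();
        for (String entry : classPath.split(File.pathSeparator)) {
            File dir = new File(entry);
            if (entry.length() > 0 && dir.isDirectory()) {
                collect(dir, maxDepth, found);
            }
        }
        return found;
    }

    private static void collect(File dir, int depth, List<String> found) {
        File[] children = dir.listFiles();
        if (children == null || depth <= 0) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, depth - 1, found);
            } else if (child.getName().toLowerCase().endsWith(".jar")) {
                found.add(child.getPath());
            }
        }
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path");
        for (String jar : jarsBelowDirectories(classPath, 2)) {
            System.out.println("Not searched unless listed itself: " + jar);
        }
    }
}
```

If this reports C:\PDFBox\lib\PDFBox-0.7.3.jar, the class path contains only the folder, and the jar still has to be added as in pax.bat.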

Executing java directly is a bit more complicated:

Again, edit pax.bat and adjust the paths to your setup:

@echo off
SETLOCAL

set CLASSPATH=C:\PDFBox\lib\PDFBox-0.7.3.jar;C:\MiKTeX\scripts\pax\pax.jar;%CLASSPATH%

java pax.PDFAnnotExtractor %*

Note that pax.jar was added to the class path. I prefer to set the environment variable CLASSPATH, but the command-line option -classpath, or -cp for short, works as well, as shown by Heiko.
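To check the class path before running pax, a tiny test class can help. This is only a sketch under my own assumptions: ClassCheck is a hypothetical helper (not part of pax or PDFBox), and the default class name is simply the one reported as missing in the stack trace of the question. It tries to load a class by name without initializing it and prints the class path that Java actually sees.

```java
// Hypothetical helper for illustration; not part of pax or PDFBox.
class ClassCheck {

    // Returns true if the named class can be loaded from the current class path.
    static boolean isLoadable(String className) {
        try {
            // 'false' means: load the class but do not run its static initializers.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // For example NoClassDefFoundError when a dependency is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default: the class reported as missing in the question's stack trace.
        String name = args.length > 0 ? args[0] : "org.pdfbox.cos.ICOSVisitor";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Compile it with javac ClassCheck.java and run java ClassCheck in a Command Prompt where CLASSPATH is set as in pax.bat. The class path must also contain the folder holding ClassCheck.class, for example the current directory (.). Passing pax.PDFAnnotExtractor as an argument checks pax.jar in the same way.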