Use pax to Extract and Include Links from External PDF Files in LaTeX on Windows

Published Sun 13 April 2014 in personal

by Bryan Weber

I want to include the documentation for one of my programming projects in my thesis. The documentation is generated by Sphinx, which can produce LaTeX output. I compiled that output to PDF, and a PDF is relatively easy to include in a separate LaTeX file with the pdfpages package. Unfortunately, including a PDF this way drops all of its clickable links, both internal and external. The solution is an experimental Java program called pax. Before the PDF is included, pax reads it and writes the link information to a .pax file. When pdfTeX includes the PDF, the pax package reads that .pax file and re-creates all of the links.

Installing pax on Windows, however, is kind of a bear. The first step is to download and install Java and Perl. Java is easy to install on Windows. For Perl, I used Strawberry Perl, which installs into a directory at the root of the C:\ drive to avoid permissions issues with the Program Files directory. Next, download PDFBox, a Java library for working with PDF files. Only versions 0.7.2 and 0.7.3 work; both are available from SourceForge. Extract the zip file into a directory where you have write permissions, such as C:\PDFBox.

The pax package itself can now be installed with the MiKTeX package manager. If you use the admin version, the scripts you need are installed in ...\MiKTeX 2.9\scripts\pax. Otherwise, note where MiKTeX installed the pax scripts, because pax.bat needs that path. Next, create a local texmf directory. I followed the instructions here and created the directories C:\localtexmf\latex, C:\localtexmf\tex, and C:\localtexmf\bin. Then, in the MiKTeX Settings window, add the root directory (i.e., C:\localtexmf) on the Roots tab, and click Update FNDB on the General tab.

Now create a file called pax.bat in the bin directory (C:\localtexmf\bin) with the following contents:

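@echo off
SETLOCAL

REM Add the PDFBox jar to the Java class path (SETLOCAL keeps this change local to the script)
set CLASSPATH=C:\PDFBox\lib\PDFBox-0.7.3.jar;%CLASSPATH%

REM Run the pax extractor, forwarding all arguments; change this path if MiKTeX put the pax scripts elsewhere
perl "C:\Program Files\MiKTeX 2.9\scripts\pax\pdfannotextractor.pl" %*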

Finally, make sure C:\StrawberryPerl, C:\localtexmf\bin, and the Java bin directory are all on your PATH. Then you can run

pax FileWithLinksToBeIncluded.pdf

This generates FileWithLinksToBeIncluded.pax. Credit for this solution goes to the wonderful folks at TeX.SX. Then, in the main TeX document, write

...
\usepackage{pdfpages}
\usepackage{pax}
...
\begin{document}
\includepdf[pages={-}]{FileWithLinksToBeIncluded}
\end{document}

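and compile with pdf(La)TeX. Note that pax works only with pdf(La)TeX; according to this post, it cannot work with XeTeX (or XeLaTeX). However, LuaLaTeX supports nearly all of the fontspec features that XeLaTeX does, and it can be hacked to work with pax. The hack works because pax uses the pdfTeX primitives \pdfescapename and \pdfstrcmp, which LuaLaTeX does not provide under those names; the pdftexcmds package supplies LuaTeX-compatible equivalents, \pdf@escapename and \pdf@strcmp, and the hack simply points the old names at them. If you would like to use LuaLaTeX, replace \usepackage{pax} in your preamble with the following (credit):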

\usepackage{pdftexcmds}
\makeatletter
% Map the pdfTeX primitive names that pax expects onto the pdftexcmds versions
\let\pdfescapename=\pdf@escapename
\let\pdfstrcmp=\pdf@strcmp
\makeatother
\usepackage{pax}
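To put the pieces together, here is a minimal sketch of a complete main document that should compile with either pdfLaTeX or LuaLaTeX, applying the hack only when needed. It assumes the article class, the iftex package (not part of the recipe above; it is used here only for its \ifLuaTeX switch), and that FileWithLinksToBeIncluded.pdf and its .pax file sit next to the main .tex file. The \IfFileExists check is optional and only prints a reminder in the log if the .pax file is missing. Treat this as a starting point rather than a tested template; package behavior may differ across TeX distribution versions.

```latex
\documentclass{article}   % assumption: any document class should work
\usepackage{iftex}        % assumption: used only for the \ifLuaTeX switch
\ifLuaTeX
  % LuaLaTeX lacks the pdfTeX primitive names pax uses, so borrow pdftexcmds' versions
  \usepackage{pdftexcmds}
  \makeatletter
  \let\pdfescapename=\pdf@escapename
  \let\pdfstrcmp=\pdf@strcmp
  \makeatother
\fi
\usepackage{pdfpages}     % provides \includepdf
\usepackage{pax}          % reads the .pax file written by the pax script
% Optional: print a reminder if the pax script has not been run yet
\IfFileExists{FileWithLinksToBeIncluded.pax}{}{%
  \typeout{Warning: FileWithLinksToBeIncluded.pax not found; run pax on the PDF first}}
\begin{document}
\includepdf[pages={-}]{FileWithLinksToBeIncluded}
\end{document}
```

Compile it with pdflatex or lualatex; XeLaTeX still will not work with pax.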