[Tex/LaTex] Creating Kindle-friendly versions of existing LaTeX documents

ebook, formatting, pdf

So let's say I have a large number of documents (from arXiv.org, suppose) that I have the LaTeX source for, and for which I would like to compile particularly Kindle-friendly versions. In particular, I would like to accomplish some combination of the following things:

Generate at specifically 1200×824 and/or 600×800 resolution
Make margins as small as possible since screen real estate is precious on the Kindle
Generate diagrams directly to 4-bit grayscale (the Kindle will do this conversion anyway, but I have a feeling it will be easier if the diagrams are just generated that way)
Do not draw boxes around links/references in the paper. These are useful on a computer but they still show up on the Kindle where it's not possible to click or select them anyway.

Does anybody know of the easiest way to apply settings such as these to a large number of document source files?

Best Answer

This will only be possible with a lot of manual intervention. (Many arXiv documents use plain tex or context or amstex instead of latex. But your question specified latex, so I'll assume that much.)

The main problem is that tex isn't very good at doing fully-automated typesetting. In particular, it will not rebreak displayed equations to fit with your new margins. It also has a pretty perfectionist view of regular text. It will try to set what it thinks is a beautiful paragraph, but if it fails then rather than making a compromise paragraph it will just give up and make an underfull/overfull box for the author to deal with (by rewording the paragraph, by adding explicit \linebreaks, by telling tex about additional hyphenations). This isn't a problem when you're editing your own paper, but it's very bad when you want to batch convert a large number of documents in an automatic way. Some tricks are discussed in this FAQ answer.
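As a rough illustration of how tex can be told to be less perfectionist, the following preamble fragment loosens the paragraph builder. The specific values are assumptions chosen for a narrow page, not recommendations from the FAQ, and they trade looser word spacing for fewer overfull boxes.

```latex
% Sketch only: the numbers below are assumed starting points, tune per document.
\tolerance=2000                        % accept much looser lines before giving up
\setlength{\emergencystretch}{1.5em}   % extra stretch allowed in a final pass
\hfuzz=1pt                             % do not warn about overfull boxes up to 1pt
```

This does not fix displayed equations that are too wide; those still have to be rebroken by hand.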

However, here are partial solutions to the problems you pointed out.

Resolution

Tex generates DVI and pdftex generates PDF. Both of these are resolution independent. If you want to convert to a bitmap format, then you can use something like pdf2png (slow) or dvipng (fast).

Margins

The geometry package gives a lot of control over margins.
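A minimal sketch of a page matched to the screen, assuming a 6-inch diagonal display at 600×800 pixels: the diagonal is then 1000 pixels, so the density is about 167 pixels per inch and the page is 3.6in by 4.8in. The 0.1in margin is an arbitrary small value.

```latex
% Assumes a 6-inch, 600x800 screen: 600/166.7 = 3.6in, 800/166.7 = 4.8in.
\usepackage[papersize={3.6in,4.8in},margin=0.1in]{geometry}

% For a 1200x824 screen with an assumed 9.7-inch diagonal, the same
% arithmetic gives a page of roughly 5.5in by 8.0in:
% \usepackage[papersize={5.5in,8.0in},margin=0.1in]{geometry}
```

If the document class already loads geometry with its own options, these settings may have to be passed with \geometry{...} after the class is loaded instead.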

Figures in grayscale

Figures on the arXiv will be in a huge variety of formats: EPS, PNG, PDF, not to mention metapost code, PSTricks, PSFrag, and the latex picture environment. The standard formats can all be rasterized to a 4-bit PNG at a decent resolution using something like GraphicsMagick, but it will likely need manual intervention or some really clever scripting. Once you've got the bitmap, you need to feed it to latex, dealing with the fact that some submissions use latex (which doesn't understand png) and others use pdflatex (which doesn't understand eps). Submissions also use a combination of packages, such as graphics, graphicx, epsfig and various journal classes with their own twists, so there will need to be some intelligent editing of the latex file. For metapost, PSTricks and PSFrag you will need more extensive editing.
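For the simplest case, here is a sketch of how the converted bitmaps could be picked up without touching each \includegraphics call. It assumes the document is compiled with pdflatex, that the figures are included without a file extension, and that a grayscale PNG with the same base name has been placed next to each original; figure1 is a hypothetical file name.

```latex
\usepackage{graphicx}
% Search for .png first, so the pre-converted grayscale bitmap wins
% over a .pdf or .jpg with the same base name.
\DeclareGraphicsExtensions{.png,.pdf,.jpg}

% Later, in the document body:
% \includegraphics[width=\linewidth]{figure1}  % finds figure1.png first
```

Documents that name the extension explicitly, or that use epsfig-style commands, would still need their source edited.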

Removing boxes around hyperlinks

This one is relatively easy. In many cases, simply compiling the arXiv tarball will do this already, because the arXiv uses hypertex to add these links, and if you compile with regular tex this won't happen. If the authors used the hyperref package, you can in many cases remove the \usepackage line or, more safely, since the author may have used some commands from hyperref, add the colorlinks option to colour the links instead of drawing boxes (you can set the colour to black if you don't want to see them at all).
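A short sketch of the colorlinks approach, assuming the document loads hyperref itself; the \hypersetup call would go somewhere after that \usepackage line.

```latex
\usepackage{hyperref}
\hypersetup{
  colorlinks=true,   % colour the link text instead of drawing a box
  linkcolor=black,   % internal links (sections, equations, figures)
  citecolor=black,   % citations
  urlcolor=black,    % URLs
  filecolor=black    % links to local files
}
```

Recent versions of hyperref also offer a hidelinks option that removes both boxes and colours in one step.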

Here's a plan:

Create a kindle-friendly preamble (or a document class)

Redefine your page geometry to match your screen ratio (geometry package)
Remove most of the margins (also with geometry)
Remove or resize headers and footers (perhaps with fancyhdr)
Enlarge text until it is readable to you (depends on which TeX friend you are using)
Choose appropriate (legible) fonts (depends on which TeX friend you are using)
Redefine your section titles with smaller spacing (titlesec package)
Reduce other spacing as well (various lengths like \abovecaptionskip)

Don't use absolute sizes in your documents:

Rescale your figures to a fraction of the text width (width=factor\textwidth)
Mathematics should automatically break at the end of the line (breqn package might help)

If you didn't create a document class:

Extract your document content (textbody) to a content-file
\include{content-file} after the \begin{document}

If you created a document class, you will just have to change your class before compiling.
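The content-file variant could look like the sketch below. The file names kindle-wrapper.tex and kindle-preamble.tex are hypothetical, and \input is used here instead of \include as a deliberate substitution, because \include forces a page break before and after the included file.

```latex
% kindle-wrapper.tex (hypothetical file name)
\documentclass[10pt]{article}
\input{kindle-preamble}   % hypothetical file holding the Kindle settings
\begin{document}
\input{content-file}      % body text only, no preamble and no \begin{document}
\end{document}
```

The same content-file can then be pulled into a second wrapper with the original preamble when a print version is needed.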

Possible result

[Figure: Kindle pdf result]

The figure is the result of the following code:

\documentclass[10pt]{article}

\usepackage{fontspec}      % font selection (requires XeLaTeX or LuaLaTeX)
\setmainfont{Cambria}
\usepackage{breqn}         % automatic equation breaking
\usepackage{microtype}     % microtypography, reduces hyphenation
\usepackage{polyglossia}   % language selection
\setmainlanguage{english}

\usepackage{graphicx}      % graphics support

\usepackage[font=small,labelformat=simple]{caption}   % customizing captions

\usepackage{titlesec}      % customizing section titles
\titleformat{\section}{\itshape\large}{}{0em}{}
\titlespacing{\section}{0pt}{8pt}{4pt}
\titleformat{\subsection}{\itshape}{}{0em}{}
\titlespacing{\subsection}{0pt}{4pt}{2pt}
\titleformat{\subsubsection}[runin]{\bfseries\scshape}{}{0em}{}
\titlespacing{\subsubsection}{0pt}{5pt}{5pt}

\usepackage[papersize={3.6in,4.8in},hmargin=0.1in,vmargin={0.1in,0.1in}]{geometry}  % page geometry

\usepackage{fancyhdr}   % headers and footers
\pagestyle{fancy}
\fancyhead{}            % clear page header
\fancyfoot{}            % clear page footer

\setlength{\abovecaptionskip}{2pt} % space above captions 
\setlength{\belowcaptionskip}{0pt} % space below captions
\setlength{\textfloatsep}{2pt}     % space between last top float or first bottom float and the text
\setlength{\floatsep}{2pt}         % space left between floats
\setlength{\intextsep}{2pt}        % space left on top and bottom of an in-text float

\begin{document}

In another moment down went Alice after it, never once considering how in the world she was to get out again.

\section{Wonderful section title}

Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look 
\begin{figure}[htb]
\includegraphics[width=\textwidth]{alice}
\caption{Quite wide picture, resized to fit}
\end{figure}
about her and to wonder what was going to happen next.

\subsection{Tufte-style subsection}

then she looked at the sides of the well

\subsubsection{Saving space running in} and noticed that they were filled with cupboards and book\-shelves.

\begin{dmath}[label={sna74}]
\frac{1}{6} \left(\sigma(k,h,0) +\frac{3(h-1)}{h}\right)
+\frac{1}{6} \left(\sigma(h,k,0) +\frac{3(k-1)}{k}\right)
=\frac{1}{6} \left(\frac{h}{k} +\frac{k}{h} +\frac{1}{hk}\right)
+\frac{1}{2} -\frac{1}{2h} -\frac{1}{2k},
\end{dmath}

\end{document}

Different font sizes can be easily selected using fontspec.
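As a hedged sketch of two ways to change the size: the class option sets the base size, and the fontspec Scale option scales the glyphs of the chosen font. The factor 1.1 is an arbitrary assumption, and Cambria must be installed as in the example above.

```latex
\documentclass[12pt]{article}      % base size: 10pt, 11pt or 12pt
\usepackage{fontspec}              % requires XeLaTeX or LuaLaTeX
\setmainfont[Scale=1.1]{Cambria}   % enlarge the main font by 10%
```

Scale changes the glyph size only, so the line spacing still follows the class option.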
