[TeX/LaTeX] Creating Kindle-friendly versions of existing LaTeX documents

ebook formatting pdf

So let's say I have a large number of documents (from arXiv.org, suppose) that I have the LaTeX source for, and for which I would like to compile particularly Kindle-friendly versions. In particular, I would like to accomplish some combination of the following things:

  • Generate at specifically 1200×824 and/or 600×800 resolution
  • Make margins as small as possible since screen real estate is precious on the Kindle
  • Generate diagrams directly to 4-bit grayscale (the Kindle will do this conversion anyway, but I have a feeling it will be easier if the diagrams are just generated that way)
  • Do not draw boxes around links/references in the paper. These are useful on a computer but they still show up on the Kindle where it's not possible to click or select them anyway.

Does anybody know the easiest way to apply settings such as these to a large number of document source files?

Best Answer

This will only be possible with a lot of manual intervention. (Many arXiv documents use plain TeX, ConTeXt, or AMS-TeX instead of LaTeX. But your question specified LaTeX, so I'll assume that much.)

The main problem is that TeX isn't very good at fully automated typesetting. In particular, it will not rebreak displayed equations to fit your new margins. It also takes a rather perfectionist view of regular text: it tries to set what it considers a beautiful paragraph, and if it fails it does not fall back to a compromise paragraph but gives up and leaves an underfull or overfull box for the author to deal with (by rewording the paragraph, by adding explicit \linebreak commands, or by telling TeX about additional hyphenations). This isn't a problem when you're editing your own paper, but it's very bad when you want to batch-convert a large number of documents automatically. Some tricks are discussed in this FAQ answer.
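One common way to make TeX more willing to compromise on paragraphs is to loosen its line-breaking parameters in the preamble. The following is only a sketch: the specific values are guesses that would need tuning per document, and none of this helps with displayed equations that are too wide.

```latex
% Preamble additions; all values are illustrative, not recommendations.
\tolerance=2000          % accept much looser lines (LaTeX's \fussy default is 200)
\emergencystretch=3em    % extra stretch TeX may assume in a final line-breaking pass
\hbadness=10000          % stop reporting underfull hboxes in the log
\hfuzz=2pt               % stop reporting overfull hboxes of 2pt or less
```

If the document itself calls \sloppy or \fussy later on, those commands reset \tolerance and \emergencystretch, so the settings above may be overridden.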

However, here are partial solutions to the problems you pointed out.

Resolution

TeX generates DVI and pdfTeX generates PDF. Both of these are resolution independent. If you want to convert to a bitmap format, then you can use something like pdf2png (slow) or dvipng (fast).
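Because the output is resolution independent, the pixel dimensions of a bitmap are fixed only by the resolution chosen when rasterizing. As a worked example, assuming a page size of 9 cm × 12 cm (an illustrative choice, not something from the question), the resolution needed to get 600×800 pixels would be:

```latex
\[
  \text{dpi}
    = \frac{600\ \text{px}}{9\ \text{cm} \,/\, 2.54\ \text{cm/in}}
    = \frac{600 \times 2.54}{9}
    \approx 169.3,
  \qquad
  \frac{800 \times 2.54}{12} \approx 169.3 .
\]
```

dvipng takes the resolution through its -D option, so a value near 169 would be the one to try. Rounding may leave the result a pixel off in either direction.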

Margins

The geometry package gives a lot of control of margins.
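As a rough sketch, the page can be shrunk to something close to the Kindle screen with very small margins. The 9 cm × 12 cm paper size below is an assumption chosen to match the 3:4 aspect ratio of a 600×800 screen, and the 2 mm margin is arbitrary.

```latex
% Preamble: a small 3:4 page with minimal margins (all dimensions are assumptions).
\usepackage[paperwidth=9cm, paperheight=12cm, margin=2mm]{geometry}
```

If the document class already loads geometry, a second \usepackage with different options will cause an option clash. In that case \geometry{paperwidth=9cm, paperheight=12cm, margin=2mm} placed after the class has loaded the package should have the same effect.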

Figures in grayscale

Figures on the arXiv come in a huge variety of formats: EPS, PNG, PDF, not to mention MetaPost code, PSTricks, PSfrag, and the LaTeX picture environment. The standard formats can all be rasterized to a 4-bit PNG at a decent resolution using something like GraphicsMagick, but this will likely need manual intervention or some really clever scripting. Once you've got the bitmap, you need to feed it to LaTeX, dealing with the fact that some submissions use latex (which doesn't understand PNG) and others use pdflatex (which doesn't understand EPS). Submissions also use a combination of packages, such as graphics, graphicx, and epsfig, and various journal classes with their own twists, so the LaTeX file will need some intelligent editing. For MetaPost, PSTricks, and PSfrag you will need more extensive editing.
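In the simplest case the editing can be kept small. The sketch below assumes the document is compiled with pdflatex, uses graphicx, includes its figures without file extensions, and that each figure has already been converted to a PNG with the same base name (fig1 is a hypothetical name). Documents that use epsfig or spell out .eps in the file name would still need their include commands rewritten.

```latex
% Preamble: prefer the converted PNG when several versions of a figure exist.
\usepackage{graphicx}
\DeclareGraphicsExtensions{.png,.pdf,.jpg}  % extensions are tried in this order

% In the body, an extensionless include now resolves to fig1.png:
% \includegraphics[width=\linewidth]{fig1}
```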

Removing boxes around hyperlinks

This one is relatively easy. In many cases, simply compiling the arXiv tarball will do this already, because the arXiv uses HyperTeX to add these links, and if you compile with regular TeX this won't happen. If the authors used the hyperref package, you can in many cases remove the \usepackage line or, more safely (since the author may have used some commands from hyperref), add the colorlinks option to colour the links instead of drawing boxes. You can set the colour to black if you don't want to see them that way either.
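A minimal sketch of the colorlinks approach, assuming the author already loads hyperref somewhere in the preamble:

```latex
% Place after the author's \usepackage{hyperref} line.
\hypersetup{
  colorlinks=true,   % colour the link text instead of drawing a box around it
  linkcolor=black,   % internal references (sections, equations, figures)
  citecolor=black,   % citations
  urlcolor=black,    % URLs
  filecolor=black    % links to local files
}
```

Using \hypersetup rather than editing the \usepackage options leaves the author's original line untouched, which is easier to automate across many files.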