[TeX/LaTeX] How to convert a scientific manuscript from LaTeX to Word using Pandoc


I have a typical scientific manuscript in a LaTeX .tex file, and I need to convert it to an MS Word .doc file. I have to convert it because the academic journal I am submitting to only accepts MS Word (I know…)

The manuscript includes a title page, figures, tables, equations (inline and in their own align environment), footnotes, a bibliography, and an annex. The tables are in a separate tables.tex file, which I include using the \include{tables} command. Most tables take up a whole landscape page and were generated using the pdflscape package. I am using Windows 7 Professional.

My plan is to use pandoc to go from .tex to .odt, open the latter in LibreOffice, and convert it to .doc. I have read a related question, but it is too general. Similarly, the examples on the Pandoc website are too simple. I have played around but have been unable to accomplish what I want. This is surprising, since converting a scientific manuscript is probably the most common use case for Pandoc. Here are some sample failures:

Example 1

I open a command line in the project folder, and execute the following:

pandoc -s document.tex -o document.odt

I get this error message:

pandoc: figure1: openFile: does not exist <no such file or directory>

where figure1 is the name of a figure file (e.g. figure1.png) in the project folder, referenced in a line such as \includegraphics[width=5.8in]{figure1}. I suspect pandoc expects a .png extension, but I am not sure how to provide it.

Example 2

Next I try .html, and execute the following:

pandoc -s document.tex -o document.html

The program executes fine. I open the HTML file. Footnotes are there, but figures are missing, tables are displayed as raw LaTeX, and the bibliography is missing. Inline math displays well, but math in the align environment does not. Section labels are displayed, and there are some other minor issues.

So, given that mine is probably a typical use case, my question is this: what commands should I use to get the .odt file I want? I could not find a fully worked-out example on the web.

Here is a specific list of errors. I'll update it with how I corrected them based on community suggestions:

  1. Figures not rendering. Solved by adding the .png extension to the file name in each \includegraphics command in the .tex file. Now the figures are included, but they are huge, with half of each figure outside the page.
  2. No bibliography. Solved. I keep all my citations in one huge consolidated LaTeX .bib file, which I manage using JabRef. This was giving me problems, as I do not keep the cleanest .bib file in town. So I reduced the problem by using a neat trick in JabRef that lets you subset your master .bib file using the .aux file generated by LaTeX when compiling your manuscript. In JabRef, click on Tools > New Subdatabase based on AUX file. This way I generated a much smaller biblio.bib file with only the articles referenced in my manuscript. Running pandoc -s document.tex -o document.odt --bibliography=biblio.bib did the trick.
  3. Display math. Math in the \begin{align} environment is displayed as verbatim LaTeX. (A partial solution is to use the TexMaths LibreOffice extension: copy and paste the LaTeX math code from the .odt file created by Pandoc into the equation editor, and so on. Surely this could be built into a macro that post-processes all remaining math.) UPDATE: Display math works very well using the --mathjax option.
  4. Inline math. Inline equations do not always render properly. Bold math is a problem. E.g. $\Sigma=\sigma^2\bm{I}$ displays verbatim as $\Sigma=\sigma^2\bm{I}$.
  5. Labels are displayed (e.g. section labels show as [sec:empirical] blah blah).
  6. All tables display as raw LaTeX.
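
The fixes in items 1 and 2 can be sketched as a small Python preprocessing step. This is only a rough illustration, not a tested recipe for the whole conversion: the function names add_image_extensions and pandoc_command are hypothetical, and the sketch assumes every figure referenced without an extension is a .png, as in the figure1 example above. It only builds the pandoc command from item 2; it does not run it.

```python
import re
from pathlib import Path

# Matches \includegraphics, an optional [options] argument, and the {path} argument.
INCLUDEGRAPHICS = re.compile(
    r"(\\includegraphics\s*(?:\[[^\]]*\])?\s*\{)([^}]+)(\})"
)


def add_image_extensions(tex_source: str, default_ext: str = ".png") -> str:
    """Append default_ext to each includegraphics path that has no extension.

    Assumption: a path with no suffix refers to a file named <path><default_ext>.
    Paths that already carry a suffix (e.g. plot.pdf) are left untouched.
    """

    def fix(match: re.Match) -> str:
        prefix, path, suffix = match.groups()
        path = path.strip()
        if Path(path).suffix == "":
            path += default_ext
        return prefix + path + suffix

    return INCLUDEGRAPHICS.sub(fix, tex_source)


def pandoc_command(source: str, output: str, bibliography: str = "") -> list:
    """Build (but do not run) the pandoc invocation described in item 2."""
    cmd = ["pandoc", "-s", source, "-o", output]
    if bibliography:
        cmd.append(f"--bibliography={bibliography}")
    return cmd


if __name__ == "__main__":
    line = r"\includegraphics[width=5.8in]{figure1}"
    print(add_image_extensions(line))
    print(" ".join(pandoc_command("document.tex", "document.odt", "biblio.bib")))
```

To use it, one would read document.tex, write the result of add_image_extensions to a copy, and pass that copy to pandoc, for example via subprocess.run(pandoc_command(...)). Working on a copy keeps the original .tex file compilable with LaTeX as before.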

Best Answer

I tried nearly all the methods mentioned in the other answers.

Eventually, and surprisingly, I found that the most satisfactory way to convert is simply to open the PDF file in MS Word (2013 or newer), which retains most of the layout. You will lose the hyperlinks on cross-references, though.