How to convert a TeX file to Microsoft Word or LibreOffice format

conversion latex-to-word pandoc pdftex tex4ht

I’m trying to convert a report from LaTeX to OpenDocument (.odt) or Office Open XML (.docx) format. But the programs give errors and stop, so in the end neither format is written.

First I use Pandoc:

pandoc -f latex -t docx -o file.docx file.tex

But then it outputs the message:

Error at "file.tex" (line 124, column 20):
expecting \end{figure}
\textsf{\large{}\includegraphics{pasted1}}{\large\par}

I checked the code and the corresponding \par is there, so I tried deleting these tags, but it still does not compile. It says:

unexpected {
\par\end{centering}

I checked it again, but the centering environments are written as \begin{centering} and \par\end{centering} and all of them are paired, so I can't locate the mismatch.

  • Option 2: mk4ht
    I used the following command:

    mk4ht oolatex file.tex

It starts compiling, shows a list of libraries and files, but it stops and outputs:

?
! Emergency stop.
<inserted text>
                \par
l.25 \babel@aux{spanish}{}

Output written on file.dvi (53 pages, 15708 bytes).
Transcript written on file.log.
--- error --- failed to execute command

And the dvi file can't be opened.

At first, Pandoc complained that the file was not in UTF-8, so I changed the encoding manually. The TeX file is large, so it is probably not suitable for posting here, and I don't know which parts would be relevant to post.

So how can it be tweaked to get something converted to these formats?

MWE (this file is a portion of the main TeX file)

\documentclass[12pt,spanish]{article}
\usepackage[sc]{mathpazo}
\usepackage{helvet}
\usepackage{courier}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage[letterpaper]{geometry}
\geometry{verbose,tmargin=1.5cm,bmargin=1.5cm,lmargin=1.5cm,rmargin=1.5cm}
\setcounter{tocdepth}{2}
\usepackage{color}
\usepackage{babel}
\addto\shorthandsspanish{\spanishdeactivate{~<>}}

\usepackage{float}
\usepackage{graphicx}
\usepackage{setspace}
\usepackage[unicode=true,
 bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2,
 breaklinks=true,pdfborder={0 0 1},backref=false,colorlinks=true]
 {hyperref}
\hypersetup{pdftitle={Title},
 pdfauthor={Some_name},
 pdfsubject={opamp, feedback},
 pdfkeywords={opamp, feedback},
 linkcolor=black, citecolor=black, urlcolor=blue, filecolor=blue, pdfpagelayout=OneColumn, pdfnewwindow=true, pdfstartview=XYZ, plainpages=false}

\makeatletter
\pagenumbering{roman}
\let\myTOC\tableofcontents
\renewcommand\tableofcontents{%
  \pdfbookmark[1]{\contentsname}{}
  \myTOC
  \cleardoublepage
  \pagenumbering{arabic} }

\makeatother

\begin{document}
\listoffigures

\title{\textsf{\large{} Title}}
\author{\textsf{Bowie}\thanks{\textsf{\large{}\protect\href{mailto:[email protected]}{[email protected]}}}}

\maketitle
\textsf{\large{}\tableofcontents{}}{\large\par}

\section{\textsf{\large{}El amplificador operacional ideal}}

\subsection{\textsf{El amplificador operacional}}

\textsf{\large{}}\footnote{\textsf{\large{}footnote}}\textsf{\large{}.}{\large\par}

\textsf{\large{}Some text}{\large\par}

\textsf{\large{}More text }{\large\par}

\textsf{\large{}and more text}{\large\par}

\textsf{\large{}Again text}{\large\par}

\textsf{\large{}Bla... }{\large\par}

\subsubsection{\textsf{\large{}Notación}}

\textsf{\large{}
text added}{\large\par}
% the problems start here
% the output is
%Error at "file.tex" (line 107, column 20):
%expecting \end{figure}
%\textsf{\large{}\includegraphics{pasted1}}{\large\par}
%
\begin{center}
\textsf{\large{}}
\begin{figure}[H]
{\centering}
\textsf{\large{}\includegraphics{pasted1}}{\large\par}
\par\end{centering}
\textsf{\large{}\caption{Caption of the figure}
\par}
\end{figure}
{\large\par}
\par\end{center}

\end{document}

UPDATE

The source compiles with LaTeX and a PDF file is generated, but reviewing the log I found the following information (the parts I think are relevant):

...
 restricted \write18 enabled.
 %&-line parsing enabled.
...
... no UTF-8 mapping file for font encoding PU
...
Package geometry Warning: The marginal notes overrun the paper.
     Add 11.32088pt and more to the right margin.
...
Output written on Intro.pdf (45 pages, 731901 bytes).
PDF statistics:
 565 PDF objects out of 1000 (max. 8388607)
...

During compilation the terminal also shows messages about missing $ signs, unwanted } or missing $, and absent glyphs (these appear only in the terminal; I can’t save them and they are not in the logs).

I have tested with other TeX files and the conversion went well.

UPDATE 2

I have reviewed the figures and I have the following code (of the part of the figure)

\begin{center}
\textsf{\large{}}
\begin{figure}[H]
\begin{centering}
\textsf{\large{}\includegraphics{pasted1}}{\large\par}
\par\end{centering}
\textsf{\large{}\caption{Caption for the figure.}
}{\large\par}
\end{figure}
{\large\par}
\par\end{center}

The centering and center environments are paired. I don’t know why the MWE is different, but even with this version the output is the same.

UPDATE 3

I have reworked the code and changed the figure setup, and now the document converts correctly to .odt and .docx. The only remaining problem is that the figures lost their numbering and alignment.

Best Answer

Converting to ODT with make4ht

Today make4ht is the best tool for this purpose. Write your LaTeX file as usual, with the proviso that you avoid exotic syntax and remove all packages that are not strictly necessary just to process the file (i.e., to run latex without errors). Forget about fancy formatting; keep it simple.

Then you do as follows:

latexmk -pdf file
make4ht -f odt file

You can use LibreOffice to save the resulting ODT file in DOCX format. (You can actually do it on the command line in some setups.)
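
For the command-line route mentioned above, here is a minimal sketch. It assumes LibreOffice is installed and that its `soffice` binary is on your PATH (on some systems the binary is called or located differently). The function name `odt_to_docx` is just a hypothetical helper for illustration, not part of LibreOffice.

```shell
# Hypothetical helper: convert an ODT file to DOCX without opening the GUI.
# Assumes LibreOffice's `soffice` binary is available on PATH.
odt_to_docx() {
    input="$1"
    if ! command -v soffice >/dev/null 2>&1; then
        echo "soffice not found on PATH" >&2
        return 127
    fi
    # --headless: run without a window; --convert-to: target format;
    # --outdir: write the result next to the input file.
    soffice --headless --convert-to docx --outdir "$(dirname "$input")" "$input"
}
```

After `make4ht -f odt file`, running `odt_to_docx file.odt` should leave a `file.docx` next to the ODT file, provided the headless conversion works in your setup.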

Alternative (better for HTML): lwarp

Another option is lwarp, which works wonderfully to generate HTML (see its documentation for how to do it; it's pretty simple). After opening the HTML file in a browser, you can copy from the browser window and paste into LibreOffice, and the results are passable. The unusual handling of footnotes in this setup was a dealbreaker for me, though.

But it's harder in Spanish

Even once I rewrote your sample document in correct and reasonable LaTeX code, I did run into problems because using Spanish in babel interacts badly with a lot of other packages.

  • I found that adding the es-sloppy option cleared up most of the problems by disabling some of the "bonus" features enabled in this Babel configuration. But still, when converting with make4ht, the table of contents included unwanted junk code written into the ODT output.
  • Removing the contents lists solved the problems, and perhaps you can do without them in the ODT output. (LibreOffice does have its own ways of generating contents lists.)

A separate question about babel Spanish and make4ht is warranted.

I think the following document captures most of what you were trying to do; it both compiles to PDF and converts to ODT with no problems.

\documentclass[letterpaper, 12pt]{article} 
\usepackage[T1]{fontenc}

% Because it seems you want a sans-serif typeface
\renewcommand*{\familydefault}{\sfdefault}

% Please note that the last option is necessary for conversion to work in
% Spanish
\usepackage[spanish,es-sloppy]{babel}

\usepackage{graphicx}
\usepackage{hyperref}

\title{Artículo}
\author{Bowie\thanks{\href{mailto:[email protected]}{[email protected]}}}

\begin{document}
\maketitle
% I can't get the contents lists to work in Spanish with make4ht
%\tableofcontents
%\listoffigures
\section{El amplificador operacional ideal}
\subsection{El amplificador operacional}
Algunas palabras.\footnote{Una nota.}
Más palabras.
Y más.
\subsubsection{Notación}
Añado algo más (véase figura \ref{fig:pasted1}).
\begin{figure}
%    \includegraphics{pasted1} % I don't have the image to include
    \caption{Descripción de la imagen}
    % \label must come after \caption so that \ref picks up the figure number
    \label{fig:pasted1}
\end{figure}
\end{document}