[Tex/LaTex] Converting TEX to TXT using Pandoc

pandoc

I am trying to convert a .tex file into a .txt file so that it can be pasted directly into environments that support only MathJax, for example blogs, Mathematics Stack Exchange, stackedit.io, etc.

But I am having a problem with user-defined environments like theorem, definition, etc.

```
\begin{proof}
Example
\end{proof}
```

In a LaTeX editor, the PDF would be rendered as

Proof. Example

But when I convert it to .txt with the command pandoc -o output.txt input.tex, the output is rendered as

Example

The heading is missing. Similarly, other user-defined environments also lose their respective headings.

Is there some way to make Pandoc add the word "Proof" or heading corresponding to an environment at the beginning?

Best Answer

Short answer: no.

Long answer:

The only ways an automated script can know that the output should contain the word "Proof" are:

1) The knowledge is hardcoded in the script: it knows the meaning of some LaTeX commands and environments (the route taken by pandoc).

2) It can run the TeX code and capture the output (the route taken by t4ht, for example).

The first approach is not flexible enough: you can load packages that the script does not know about, and the commands they define will be ignored. In addition, your document can define its own commands too.

The second approach can be done via pdflatex followed by some "pdf to text" converter, or via latex followed by dvi2tty, or via tex4ht. In every case the original TeX markup is lost, so this approach is not appropriate if you want to keep the "code" of the math formulae.
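
To make the limitation of the first approach concrete, here is a minimal sketch in Python of a preprocessor with hardcoded knowledge. It is not how pandoc works internally; the `HEADINGS` table and the `add_headings` function are hypothetical names introduced for illustration. It rewrites only the environments it has been told about, so anything defined by an unknown package or by the document itself passes through unchanged. It also ignores optional arguments and nested environments.

```python
import re

# Hypothetical table of known environments and the heading to print for each.
# Environments that are not listed here are left untouched.
HEADINGS = {
    "proof": "Proof.",
    "theorem": "Theorem.",
    "definition": "Definition.",
}

# Match \begin{name} ... \end{name} only for the names in the table.
ENV_RE = re.compile(
    r"\\begin\{(" + "|".join(map(re.escape, HEADINGS)) + r")\}(.*?)\\end\{\1\}",
    re.DOTALL,
)


def add_headings(tex):
    """Replace each known environment with a bold heading followed by its body."""

    def replace(match):
        name, body = match.group(1), match.group(2)
        return "\\textbf{" + HEADINGS[name] + "} " + body.strip()

    return ENV_RE.sub(replace, tex)
```

The rewritten text could then be given to pandoc in place of the original file. The end-of-proof mark is still not reproduced.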
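
The tool chains named for the second approach can be written down as command sequences. The sketch below is in Python and uses assumed invocations, since the exact commands are not stated here; `ROUTES`, `commands_for`, and `run_route` are hypothetical names, and htlatex is assumed as the tex4ht entry point.

```python
import subprocess

# Assumed invocations for each route; "{stem}" stands for the file name
# without its extension.
ROUTES = {
    "pdftotext": [
        ["pdflatex", "{stem}.tex"],
        ["pdftotext", "{stem}.pdf", "{stem}.txt"],
    ],
    "dvi2tty": [
        ["latex", "{stem}.tex"],
        ["dvi2tty", "{stem}.dvi"],  # writes the text rendering to stdout
    ],
    "tex4ht": [
        ["htlatex", "{stem}.tex"],
        ["pandoc", "{stem}.html", "-o", "{stem}.txt"],
    ],
}


def commands_for(route, stem):
    """Return the argument lists for one route with the file stem filled in."""
    return [[arg.format(stem=stem) for arg in command] for command in ROUTES[route]]


def run_route(route, stem):
    """Run each step in order, stopping at the first command that fails."""
    for command in commands_for(route, stem):
        subprocess.run(command, check=True)
```

For example, `run_route("pdftotext", "input")` would compile input.tex and then extract the text of input.pdf, provided both tools are installed.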

Let's see an example. Consider the following document:

```
\documentclass{article}
\usepackage{nopageno} % No page numbers
\usepackage{amsthm}

\begin{document}
\begin{proof}
This is a proof
\[
    \sum_{i=0}^\infty x^2
\]
\end{proof}
\end{document}
```

Running it through standard pdflatex gives a PDF with the heading "Proof.", the text, the typeset sum, and the end-of-proof mark.

If you run it through pandoc, you get the following .txt:
```
 This is a proof $$\sum_{i=0}^\infty x^2$$
```
Here the word "Proof" and the final end-of-proof mark are lost, but the formula markup is kept.
If you run it through pdflatex and then pdftotext you get:
```
Proof. This is a proof

∞

x2
i=0
```
which keeps the word "Proof" but completely mangles the formula.

If you run it through latex and then dvi2tty, you get:

```
Proof. This is a proof
                                1X
                                   x2
                                i=0

                                                                   |___|
```

This is closer to the PDF output, but it still loses the formula markup.

If you run it through tex4ht you get an HTML version of the document, which can be in turn processed by pandoc to get the following .txt:
```
Proof. This is a proof ∞ ∑ i=0 x2 \_\_
```

As you can see, none of the solutions is satisfactory.

Related Solutions

[Tex/LaTex] (error) \tightlist (converting .md file into .pdf using pandoc)

The writer.latex file in Pandoc's source code currently defines \tightlist as:

```
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
```

This is also currently the case in the default LaTeX template from the jgm/pandoc-templates project on GitHub.

For posterity, here is a link to the most up-to-date LaTeX default template:

https://github.com/jgm/pandoc-templates/blob/master/default.latex

[Tex/LaTex] pandoc: converting between .tex and .rst

Pandoc can parse LaTeX macro definitions. This means that you can add dummy \newcommand statements to tweak what ends up in the output:

input file 1: dummy.tex
```
\newcommand{\mycommand}[3]{(#1) and (#2) and (#3)}
```

input file 2: mwe.tex
```
\section{Introduction}
Here is some text.

Here is some more text.

\begin{minipage}{.4\textwidth}
    \mycommand{first argument}{second argument}{3rd argument}
\end{minipage}%
\hfill
\begin{minipage}{.4\textwidth}
    This type of listing is a \texttt{.tex} file.
\end{minipage}%
```

run:
```
pandoc dummy.tex mwe.tex -o mwe.rst
```

output file mwe.rst:
```
Introduction
============

Here is some text.

Here is some more text.

(first argument) and (second argument) and (3rd argument)

.. raw:: latex

   \hfill

This type of listing is a ``.tex`` file.
```

As you can see, more recent versions of pandoc (in this case 2.1.1) handle minipages much better.

You might need to write a filter for more complicated issues.
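
As a sketch of what such a filter might look like, the following Python script works on pandoc's JSON representation of the document and drops raw LaTeX elements, such as the \hfill that ends up in the raw latex block of mwe.rst. It assumes those elements appear in the JSON as objects with "t" set to "RawBlock" or "RawInline" and "c" set to a [format, text] pair; the function names are made up for this example.

```python
import json
import sys


def is_raw_latex(node):
    """True for a RawBlock or RawInline element whose format is "latex"."""
    return (
        isinstance(node, dict)
        and node.get("t") in ("RawBlock", "RawInline")
        and isinstance(node.get("c"), list)
        and len(node["c"]) == 2
        and node["c"][0] == "latex"
    )


def strip_raw_latex(node):
    """Recursively copy the JSON tree, leaving out raw LaTeX elements."""
    if isinstance(node, list):
        return [strip_raw_latex(child) for child in node if not is_raw_latex(child)]
    if isinstance(node, dict):
        return {key: strip_raw_latex(value) for key, value in node.items()}
    return node


# pandoc passes the output format as the first argument when it runs a
# JSON filter, so the script only reads stdin when called that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    json.dump(strip_raw_latex(document), sys.stdout)
```

Saved under a hypothetical name such as strip_raw.py, it could be applied with pandoc dummy.tex mwe.tex --filter strip_raw.py -o mwe.rst.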
