[Tex/LaTex] Converting TEX to TXT using Pandoc

pandoc

I am trying to convert a .tex file into a .txt file so that it can be pasted directly into environments that support only MathJax, for example blogs, Mathematics Stack Exchange, stackedit.io, etc.

But I am having a problem with user-defined environments such as theorem, definition, etc.

\begin{proof}
Example
\end{proof}

In a LaTeX editor, the PDF is rendered as

Proof. Example

But when I convert it to .txt with the command pandoc -o output.txt input.tex, the output is

Example

The heading is missing. Other user-defined environments lose their headings in the same way.

Is there some way to make Pandoc add the word "Proof", or the heading corresponding to an environment, at the beginning?

Best Answer

Short answer: no.

Long answer:

The only ways an automated script can know that the output should contain the word "Proof" are:

1) This knowledge is hardcoded in the script: it knows the meaning of some LaTeX commands and environments (the route taken by pandoc).

2) It can run the TeX code and capture the output (the route taken by t4ht, for example).

The first approach is not flexible enough: you can load packages the script does not know about, and the commands they define will be ignored (in addition, your document can define its own commands too).
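To make the limitation concrete, here is a minimal Python sketch of the "hardcoded knowledge" idea, applied as a preprocessing step on the .tex source. It is not how pandoc works internally; the `KNOWN_HEADINGS` table and the `add_headings` function are hypothetical names made up for this illustration, and whether the inserted `\textbf` heading survives a later conversion is an assumption you would have to check with your own toolchain.

```python
import re

# Hypothetical table: the only environments this script "knows" about.
KNOWN_HEADINGS = {
    "proof": "Proof",
    "theorem": "Theorem",
    "definition": "Definition",
}

# Matches \begin{name} and captures the environment name.
ENV_BEGIN = re.compile(r"\\begin\{(\w+)\}")


def add_headings(tex: str) -> str:
    """Insert a bold heading after each \\begin{env} found in the table."""

    def repl(match: re.Match) -> str:
        heading = KNOWN_HEADINGS.get(match.group(1))
        if heading is None:
            # Unknown environment: nothing is hardcoded, so it is left as-is.
            return match.group(0)
        return f"{match.group(0)}\n\\textbf{{{heading}.}}"

    return ENV_BEGIN.sub(repl, tex)


if __name__ == "__main__":
    print(add_headings("\\begin{proof}\nExample\n\\end{proof}"))
    # An environment from a package the table does not cover is untouched.
    print(add_headings("\\begin{mybox}\nExample\n\\end{mybox}"))
```

The second call shows the problem: an environment missing from the table gets no heading, and the table can never cover every package or user definition.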

The second approach can be done via pdflatex followed by some "PDF to text" converter, or via latex followed by dvi2tty, or via tex4ht. In every case it loses the original TeX markup, so it is not appropriate if you want to keep the "code" of the math formulae.
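For reference, the tool chains named above could be driven from a small Python script along these lines. This is only a sketch: the file names `input.tex` and `output.txt`, the `PIPELINES` table and the `run_pipeline` helper are placeholders invented here, pdftotext is assumed as the "PDF to text" converter, htlatex is assumed as the tex4ht front end, and every program must already be installed.

```python
import shutil
import subprocess

# Each pipeline is a list of commands run in order (placeholder file names).
PIPELINES = {
    "pdflatex + pdftotext": [
        ["pdflatex", "input.tex"],
        ["pdftotext", "input.pdf", "output.txt"],
    ],
    "latex + dvi2tty": [
        ["latex", "input.tex"],
        ["dvi2tty", "input.dvi"],  # prints the text rendering to stdout
    ],
    "tex4ht + pandoc": [
        ["htlatex", "input.tex"],  # produces input.html
        ["pandoc", "-o", "output.txt", "input.html"],
    ],
}


def run_pipeline(name: str) -> None:
    """Run every command of the named pipeline, stopping at the first failure."""
    for cmd in PIPELINES[name]:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for name, commands in PIPELINES.items():
        print(name, "->", " && ".join(" ".join(c) for c in commands))
```

Calling `run_pipeline("pdflatex + pdftotext")` in a directory that contains `input.tex` would run that chain for real; the main block only prints the commands.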

Let's see an example. Consider the following document:

\documentclass{article}
\usepackage{nopageno} % No page numbers
\usepackage{amsthm}

\begin{document}
\begin{proof}
This is a proof
\[
    \sum_{i=0}^\infty x^2
\]
\end{proof}
\end{document}

Running it through standard pdflatex you get:

[image: rendered PDF output]

  1. If you run it through pandoc, you get the following .txt:

     This is a proof $$\sum_{i=0}^\infty x^2$$
    

    which loses the word "Proof" and the final end-of-proof mark, but keeps the formula markup.

  2. If you run it through pdflatex and then pdftotext you get:

    Proof. This is a proof
    
    ∞
    
    x2
    i=0
    

    which keeps the word "Proof", but completely mangles the formula.

  3. If you run it through latex and then dvi2tty, you get:

    Proof. This is a proof
                                    1X
                                       x2
                                    i=0
    
                                                                       |___|
    

    which is closer to the PDF output, but still loses the formula markup.

  4. If you run it through tex4ht you get an HTML version of the document, which can be in turn processed by pandoc to get the following .txt:

    Proof. This is a proof ∞ ∑ i=0 x2 \_\_
    

As you can see, none of the solutions is satisfactory.
