[TeX/LaTeX] Google Docs > Intermediate Format > Pandoc > ConTeXt

context conversion pandoc

I have several dozen longish (20ish pages each) documents in Google Docs that I want to typeset using ConTeXt. I can export from Google Docs in a variety of intermediate formats including .docx, .odt, .pdf, .html, and .rtf. Pandoc can then convert the exported file to a ConTeXt .tex file.

My question: is there any advantage to a particular intermediate format, in the sense that it would produce a "better" .tex file? Or will Pandoc produce the same .tex file regardless of the source format? I am very new to ConTeXt (please be gentle!), so I am not even sure what I mean by "better," but at the very least it would mean less tweaking or other cleanup.

Or maybe I'm doing this wrong. Like I said, I am a rank beginner in ConTeXt (and the TeX/LaTeX world in general).

If it matters, the source documents are all text, with no math or images, and about the only formatting I need to preserve is bold/italic and the very occasional footnote.

Thank you.


(Updated June 29 12:12 pm)

Thanks to DG' for the results of your experiment. I decided to conduct my own, taking it one step further to include PDF creation.

I made up a 450-word sample in Google Docs modeled on the format of my real documents. It includes bold, italic, non-Latin words, accented characters, and a footnote.

I downloaded versions in various formats.

My first surprise was the size range of the downloaded files: from 929,734 (docx) and 106,254 (rtf) to 18,085 (odt), 7,759 (html), and 4,441 (epub).
Then I ran them through pandoc using the -s (standalone) option. The resulting file sizes were all in the 4-6k range, except for rtf at 108k. That is an awful lot of overhead! In any event, the rtf file wouldn't compile.
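The conversion step described above can be scripted so that every export goes through the same pandoc call. This is only a sketch: the function names tex_name and convert_exports are made up for illustration, the output naming copies the Test.docx.tex pattern used in the answer below, and whether pandoc can actually read a given format (rtf in particular) depends on the installed version.

```shell
# Output name follows the Test.docx -> Test.docx.tex pattern from the answer.
tex_name() {
    printf '%s.tex\n' "$1"
}

# Convert every export of one document that exists in the current directory.
# $1 is the base name without extension, e.g. "Sample" for Sample.docx.
convert_exports() {
    base=$1
    for ext in docx odt html epub rtf; do
        src="$base.$ext"
        [ -f "$src" ] || continue          # skip formats that were not exported
        out=$(tex_name "$src")
        # -s asks for a standalone file; the reader is inferred from the extension
        if pandoc -s "$src" -t context -o "$out"; then
            wc -c "$src" "$out"            # byte counts before and after
        else
            echo "pandoc failed on $src" >&2
        fi
    done
}
```

Running convert_exports Sample in a directory that holds Sample.docx, Sample.odt, and so on would write one .tex file per export and print the sizes for comparison.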

Only two of Pandoc's .tex files retained the bold or italic markup: the ones from the odt and docx intermediaries.
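One rough way to check which outputs kept the markup is to count the font switches in each .tex file. The sketch below assumes the {\em ...} and {\bf ...} forms that appear in this thread and a grep that supports -o (GNU and BSD grep do); count_markup is a hypothetical helper name.

```shell
# Count occurrences of "{\em " and "{\bf " in a ConTeXt file.
# -F treats the patterns as fixed strings, so the backslash is literal;
# -o prints each match on its own line, which wc -l then counts.
count_markup() {
    grep -o -F -e '{\em ' -e '{\bf ' "$1" | wc -l | tr -d ' '
}
```

A count of 0 for a file whose source had bold or italic text suggests the formatting was lost on the way through that intermediate format.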

The ConTeXt-generated PDFs all had faults. All of them handled the accented characters and the Greek correctly, but there was just a space where the tex file had Hebrew. Triple hyphens in the tex file were not rendered as em dashes. A word that was split at the end of a line during justification got no hyphen. There appear to be no ligatures in the PDF. I'm not familiar enough with ConTeXt to know whether any of that is default behavior that can be modified.

At this point, the best solution for me seems to be using odt files. LibreOffice is my word processor of choice, and that gives me the option of doing any editing in either the odt or the tex file. (Also, I like open source.) But I still would like to know why the PDF creation process in ConTeXt isn't rendering this code properly: {\em italics}, {\bf bold}, {\em {\bf bold italics}}
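One way to narrow this down is to compile the same markup in a file that contains nothing else, so that the preamble of Pandoc's standalone output is out of the picture. The sketch below only writes such a test file. The helper name write_markup_test and the default file name markup-test.tex are invented for illustration, and it does not diagnose the cause by itself.

```shell
# Write a minimal ConTeXt document containing only the font switches in question.
# $1 is the output path (defaults to markup-test.tex in the current directory).
write_markup_test() {
    out=${1:-markup-test.tex}
    # The quoted 'EOF' keeps backslashes and braces literal.
    cat > "$out" <<'EOF'
\starttext
{\em italics}, {\bf bold}, {\em {\bf bold italics}}
\stoptext
EOF
}
```

After write_markup_test, running context markup-test.tex should produce markup-test.pdf. If the switches render correctly there but not in the converted document, the difference most likely lies in the font setup of the converted file rather than in the markup itself.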

Best Answer

I was curious and tested html, epub, and docx and it turns out that docx to context is the winner.

The sample document

I created the following document in Google Docs and exported it to html, epub, and docx.

[Image: screenshot of the sample document]

html to context

pandoc Test.html -t context -o Test.html.tex

Results in:

This is a sample document

There is italic text and~there is~bold text and maybe~bold italic text.
Also there is the occasional footnote.\high{\goto{{[}1{]}}[ftnt1]}

This will go on for twenty pages or so\ldots{}

\thinrule

\goto{{[}1{]}}[ftnt_ref1]~This is footnote~text.

You can see that the html route loses bold and italic, and that the footnote comes out as a pair of \goto cross-references rather than a real \footnote.

epub to context

The same loss of formatting and the same awkward footnote references apply to epub.

pandoc Test.epub -t context -o Test.epub.tex

Results in:

This is a sample document

There is italic text and ~there is ~bold text and maybe ~ bold italic
text. Also there is the occasional footnote.
\high{\goto{{[}1{]}}[Test.xhtmlux5cux23ftnt1]}

This will go on for twenty pages or so\ldots{}

\thinrule

\goto{{[}1{]}}[Test.xhtmlux5cux23ftnt_ref1] ~This is footnote ~ text.

docx to context

On the other hand, the combination of docx-reader and context-writer produces decent code.

pandoc Test.docx -t context -o Test.docx.tex

Results in:

{\bf This is a sample document}

There is {\em italic text} and there is {\bf bold text} and maybe
{\em {\bf bold italic text.}} Also there is the occasional
footnote.\footnote{This is {\em footnote} {\bf text.}}

This will go on for twenty pages or so...