Elaborating on the answer I gave in a comment to the question, this is what I have so far.
You need to install Python (I installed python2.7), lxml and PIL. The easiest way I've found to install the latter two on Windows is to go to http://www.lfd.uci.edu/~gohlke/pythonlibs/ and download lxml-2.3.4.win32-py2.7.exe and PIL-1.1.7.win32-py2.7.exe (note that you have to choose the files that match your Python version). Running those .exe installers installs the libraries and their bindings.
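Before going further, a quick way to confirm that the libraries can actually be imported is a small check like the one below. It is only a sketch: `check_modules` is a hypothetical helper, and `lxml`, `PIL` and `docx` are the import names I assume these packages use.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == '__main__':
    # Import names assumed for lxml, PIL and python-docx
    for name, ok in sorted(check_modules(['lxml', 'PIL', 'docx']).items()):
        print('%-6s %s' % (name, 'OK' if ok else 'MISSING'))
```

Since python-docx is only uncompressed and not installed in the setup described here, `docx` will probably show as MISSING unless the check script is saved inside that folder.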
Then you can download https://github.com/mikemaccana/python-docx. I didn't try to install this one properly. I only uncompressed it into a folder, opened a cmd shell, navigated to that folder and ran the provided examples (example-extracttext.py and example-makedocument.py), which worked, so my setup was fine.
Then I adapted the code of example-extracttext to our needs and wrote the following script, which I named run.py:
#!/usr/bin/env python2.7
'''
This file opens a docx (Office 2007) file and dumps the text. Then it uses pdflatex to compile it.
'''
from docx import opendocx, getdocumenttext
import os
import sys

if __name__ == '__main__':
    try:
        wordfile = sys.argv[1]
        # Derive the .tex and .log names from the input name (Prueba.docx -> Prueba.tex)
        basename = os.path.splitext(wordfile)[0]
        latexfile = basename + '.tex'
        logfile = basename + '.log'
        document = opendocx(wordfile)
        newfile = open(latexfile, 'w')
    except (IndexError, IOError):
        # No argument given, or the file could not be opened
        print('Please supply an input file. For example:')
        print(''' run.py 'MyDocument.docx' ''')
        sys.exit(1)

    # Fetch all the text out of the document we just opened (one string per paragraph)
    paratextlist = getdocumenttext(document)

    # Encode each paragraph as UTF-8 bytes before writing it to the file
    newparatextlist = []
    for paratext in paratextlist:
        newparatextlist.append(paratext.encode("utf-8"))

    ## Write the document's text with a blank line between paragraphs
    newfile.write('\n\n'.join(newparatextlist))
    newfile.close()

    ## Now use pdflatex to compile the result, rerunning while the log asks for it
    os.system('pdflatex "%s"' % latexfile)
    while "Rerun" in open(logfile).read():
        os.system('pdflatex "%s"' % latexfile)
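The `while "Rerun" in open(logfile).read()` loop in run.py keeps recompiling as long as the log mentions "Rerun". If a log ever keeps that word permanently, the loop never ends, so here is a more defensive variant, sketched under my own assumptions: `needs_rerun`, `compile_latex` and `max_passes` are hypothetical names, and the pattern only covers the usual "Rerun to get ..." and "Rerun LaTeX" warnings.

```python
import os
import re
import subprocess

# Assumption: these two phrasings cover the common LaTeX/package rerun warnings,
# e.g. "Rerun to get cross-references right" and "Rerun LaTeX."
RERUN_PATTERN = re.compile(r'Rerun (to get|LaTeX)')


def needs_rerun(log_text):
    """Return True if the pdflatex log text asks for another pass."""
    return RERUN_PATTERN.search(log_text) is not None


def compile_latex(texfile, max_passes=5):
    """Run pdflatex until the log stops asking for a rerun, at most max_passes times.

    Assumes the script runs in the same folder as texfile, so the .log lands there.
    Returns the number of passes actually run.
    """
    logfile = os.path.splitext(texfile)[0] + '.log'
    passes = 0
    while passes < max_passes:
        # An argument list avoids quoting problems with spaces in the file name;
        # nonstopmode keeps pdflatex from stopping to wait for input on errors.
        subprocess.call(['pdflatex', '-interaction=nonstopmode', texfile])
        passes += 1
        with open(logfile) as f:
            if not needs_rerun(f.read()):
                break
    return passes
```

With this, the last two lines of run.py could become `compile_latex(latexfile)`.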
To test it, I wrote the following Word document (note that I used Word styles to mark the section titles, used a table to insert the code of a tikz picture, and even inserted an image showing the result for that figure, obviously not in the first pass, but later). Note also that I used a Word bulleted list to mark the itemized list. All these Word styles are dropped when converting to plain text, but they make the document easier to read in Word.
I saved this document as Prueba.docx in the same folder as the script run.py, and ran the script on the Word file:
C:\Users\jldiaz\Downloads\mikemaccana-python-docx-647ee97>python run.py Prueba.docx
After two compilations (the script takes care of compiling again if references are not resolved), the resulting PDF is the following:
(at this point I used IrfanView to screen-capture the tikz picture and paste it into the Word document)
Note: If you use SumatraPDF as your PDF reader, you don't need to close the PDF before compiling again; SumatraPDF refreshes the view whenever the file changes.
UPDATE:
I also tested it with math, comments and revision marks. Everything works as expected: comments are ignored, revision marks are ignored, and the latest version of the text is what goes into the final .tex file.
However, be careful with carriage returns in the Word file. The Enter key in Word inserts an end-of-paragraph mark, which Python translates into a blank line (a \par to TeX, so everything is fine). But in some environments we don't want those blank lines, for example inside an equation environment or anywhere else TeX doesn't expect a \par. We can avoid this by using Shift+Enter in Word, which inserts an end-of-line instead of an end-of-paragraph. Those end-of-lines are translated by Python into spaces.
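To see why Enter and Shift+Enter behave differently: in the .docx XML, Enter starts a new `<w:p>` paragraph, while Shift+Enter inserts a `<w:br/>` inside the same paragraph. The sketch below illustrates the mapping described above using only the standard library. It is my own illustration, not python-docx's `getdocumenttext`, and `paragraphs_from_xml`, `read_document_xml` and `to_latex_source` are hypothetical names.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used in word/document.xml
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx (a zip archive)."""
    with zipfile.ZipFile(docx_path) as z:
        return z.read('word/document.xml')


def paragraphs_from_xml(document_xml):
    """Return one string per non-empty <w:p> paragraph.

    Text is taken from <w:t> elements; a manual line break <w:br/> (Shift+Enter)
    becomes a space. Deleted revision text is stored in <w:delText>, not <w:t>,
    so it is skipped.
    """
    root = ET.fromstring(document_xml)
    paragraphs = []
    for para in root.iter(W + 'p'):
        pieces = []
        for el in para.iter():
            if el.tag == W + 't' and el.text:
                pieces.append(el.text)
            elif el.tag == W + 'br':
                pieces.append(' ')
        text = ''.join(pieces)
        if text:
            paragraphs.append(text)
    return paragraphs


def to_latex_source(paragraphs):
    """Join paragraphs with a blank line, which TeX reads as \\par."""
    return '\n\n'.join(paragraphs)
```

So a display equation typed with Shift+Enter between its lines stays in a single paragraph and never receives a blank line.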
My experiments with comments, revisions and math:
and the result after the script:
In short: no.
Word has no built-in TeX distribution or engine to handle LaTeX tables.
If you want a LaTeX table in Word, you can build it in LaTeX, compile it (with document class minimal or standalone) and insert the resulting PDF into Word.
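As an illustration of that workflow (my own sketch, not part of the answer), a small helper can write a table as a standalone document; `standalone_table` is a hypothetical name, and the cells are assumed to be valid LaTeX already.

```python
def standalone_table(rows, colspec=None):
    """Return the source of a standalone LaTeX document containing one tabular.

    rows is a list of rows, each a list of cell strings that are already valid LaTeX.
    colspec defaults to one left-aligned column per cell in the first row.
    """
    colspec = colspec or 'l' * len(rows[0])
    lines = [
        r'\documentclass{standalone}',
        r'\begin{document}',
        r'\begin{tabular}{%s}' % colspec,
    ]
    for row in rows:
        # Cells separated by &, each row ended by \\
        lines.append(' & '.join(row) + r' \\')
    lines += [r'\end{tabular}', r'\end{document}', '']
    return '\n'.join(lines)
```

Writing the returned string to table.tex and running `pdflatex table.tex` should give a PDF cropped to the table (the standalone class crops to its content), which is the file you would then insert into Word.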
As far as I know, there is no other way to do what you want ...
Edit: Perhaps the best way is to show your collaborators the beautiful typography of LaTeX and teach them to use it ...
Edit: Whether you can include a PDF directly in your Word document depends on your version of Word. Since Word 2010 you can do it directly (search for "msword include pdf file"); before version 2010 you cannot, and you need the hints in the comments.
If you are not averse to using LibreOffice/OpenOffice, I would suggest TexMaths. I have used it with Writer and Impress and it works wonderfully.