I'm sorry I wont be able to provide any answer other than: "Don't do it!"
Other than using some of the converters that you and other people have suggested, which may give you a reasonable place to start, there is really no satisfactory solution to convert an existing Word document into a LaTeX one. This is difficult to implement, and for a good reason. Word and LaTeX have two completely different approaches at describing and encoding documents. Word (as most people use it) stores formatting information while LaTeX encodes the "logical structure" of a document (which then gets translated to TeX formatting instructions while compiling). Trying to get a LaTeX document out of a Word document is somewhat similar to trying to obtain a C++ program from it's binary: You can do it, but the results wont be pretty.
You are better off taking a printout of your assignments and typing the whole thing yourself in LaTeX. Of course you can copy/paste large bodies of text, but for equations there is nothing better than typing them yourself. If you do this for a while, and show your beautiful documents to your teachers and colleagues, you might be even able to convince them to switch to LaTeX all together.
LaTeX is not a format to store pretty documents, it is a fully-fledged system to create those documents.
Elaborating my answer in a comment to the question, this is what I got so far.
You need to install Python (I installed python2.7), and lxml and PIL. The easiest way I've found to install the later in Windows is going to http://www.lfd.uci.edu/~gohlke/pythonlibs/, and download lxml-2.3.4.win32-py2.7.exe
and PIL-1.1.7.win32-py2.7.exe
(note that you have to choose the appropiate files for your python version). Running those exe
, the appropiate libraries and bindings are installed.
Then you can download https://github.com/mikemaccana/python-docx. I didn't try to properly install this one. I only uncompressed it in a folder, open a cmd
shell, navigate to that folder and run the provided examples (example-extracttext.py
and example-makedocument.py
) which worked. My setup was fine.
Then I adapted the code of example-extracttext
to our needs, and wrote the following script, which I named run.py
:
#!/usr/bin/env python2.7
'''
This file opens a docx (Office 2007) file and dumps the text. Then it uses pdflatex to compile it.
'''
from docx import *
import os
import sys
if __name__ == '__main__':
try:
wordfile = sys.argv[1]
latexfile = sys.argv[1].replace('docx', 'tex')
logfile = sys.argv[1].replace('docx', 'log')
document = opendocx(wordfile)
newfile = open(latexfile,'w')
except:
print('Please supply an input file. For example:')
print(''' run.py 'MyDocument.docx' ''')
exit()
# Fetch all the text out of the document we just created
paratextlist = getdocumenttext(document)
# Make explicit unicode version
newparatextlist = []
for paratext in paratextlist:
newparatextlist.append(paratext.encode("utf-8"))
## Print our documnts test with two newlines under each paragraph
newfile.write('\n\n'.join(newparatextlist))
newfile.close()
## Now use pdflatex to compile the result
os.system("pdflatex %s" % latexfile)
while "Rerun" in open(logfile).read():
os.system("pdflatex %s" % latexfile)
To test it, I wrote the following Word document (note that I used Word styles to mark the section titles, and used a table to insert the code of a tikz picture, and even inserted an image showing the result for that figure, obviously not in the first pass, but later). Note also which I used a Word bulleted list to help marking the itemized list. All this Word styles will be dropped when converting to plain text, but allows us to make the display more clear.
I saved this document with the name Prueba.docx
in the same folder than the script run.py
, and ran the script on the word file:
C:\Users\jldiaz\Downloads\mikemaccana-python-docx-647ee97>python run.py Prueba.docx
After two compilations (the script takes care of compiling again if references are not solved), the resulting pdf
is the following:
(at this point I used IrfanView to screen-capture the tikz picture and paste it into the word document)
Note: If you use SumatraPDF as pdf reader, you don't need to close the pdf document before compiling again. SumatraPDF updates the view when the pdf changes.
UPDATE:
Tested also with math, comments and revision marks. All works as expected (comments are ignored, revision marks are ignored, latest version of the text is what goes to the final .tex file).
However, caution about carriage returns in the Word file. "Enter" key in Word inserts a end-of-paragraph mark, which is translated by python into a blank line (which is a \par
to tex, so everything is fine). However in some environments, we don't want those blank lines (for example, inside an equation environment, or other places where TeX doesn't expect a \par
). We can avoid this by using Shift+Enter in Word, which inserts an end-of-line instead of an end-of-par. Those end-of-lines are translated by python to spaces.
My experiments with comments, revisions and math:
and the result after the script:
Best Answer
I chose this
\mathscr{P}
and\tilde{N}
by mouse in the Word generated menu (on the top of screen); do you think this is better?By the way, why don't you think the mathjax output to be better? it seems to have inserted
\left
and\right
into parentheses automatcally.You seem to be very concerned with parenthesis height(?) If so, type
()
first, and simply paste (and do not edit inside) whatever inside which you have typed elsewhere. For example, you type\tilde{N}
on another line, and typep()
, and paste\tilde{N}
inside()
. This is because (it seems) MS Word automatically tries insert\left
and\right
; though not every time, the feature seem to be buggy.