[TeX/LaTeX] Cut and Paste Non-Math Text from MS Word to a .tex file

Tags: copy/paste, input-encodings, msword, punctuation, symbols

I am helping a friend use LaTeX to produce a book of short stories. The authors send their stories as MS Word files; unfortunately, Word is the only text editor most of the world knows. There is no math content to worry about, just plain text. However, Word likes to convert some plain text into other characters. The two I have noticed so far are the smart quotes and the ellipsis (…).

I tried the suggested approach of using inputenc, without any success, even with various input encodings. I am using \inputencoding rather than a package option because I might need to switch encodings between stories.

[Image: compiled output in which the non-ASCII characters appear as question marks]

So, what is the suggested approach to handle this? Ideally, I'd like some way of mapping these characters to appropriate LaTeX-friendly ones.

Notes:

  • I personally don't like leaving the smart quotes in, because sometimes an author has missed a closing quote, and then all the subsequent quotes are wrong. If this is caught early, it is easy to correct in the Word document before pasting into the .tex file. But often the editor has already made significant edits to the .tex file before the problem is noticed. Hence I would prefer to have csquotes handle the quotes rather than using explicit opening and closing quote characters.


Code:

\documentclass{article}
\usepackage{inputenc}
\usepackage{csquotes}
\MakeOuterQuote{"}

%\inputencoding{utf8}
%\inputencoding{latin1}
%\inputencoding{ansinew}
\inputencoding{cp1252}

\begin{document}

"It's too late now…" (should have \ldots\ before end quote)

“Please, sir, don’t.”  (should have left and right quotes)
\end{document}

Best Answer

Regarding the inputenc question

Your example works without problems if I copy it into a UTF-8 document and declare the inputenc encoding accordingly as utf8. The same holds for ansinew, provided the file is actually saved in that encoding.
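A minimal sketch of the UTF-8 setup, assuming the .tex file is saved as UTF-8 (what most modern editors produce when pasting from Word) and compiled with pdfLaTeX. The \DeclareUnicodeCharacter line is optional: current LaTeX already maps U+2026 to \textellipsis, so it only makes the mapping to \ldots explicit, and the same command can map any other stray character Word produces.

```latex
% Minimal sketch: assumes this file is saved as UTF-8 and compiled with pdfLaTeX.
\documentclass{article}
\usepackage[T1]{fontenc}     % T1 font encoding for proper glyphs
\usepackage[utf8]{inputenc}  % must match the encoding the file is saved in
\usepackage{csquotes}
\MakeOuterQuote{"}

% Optional: map Word's ellipsis character (U+2026) explicitly to \ldots
\DeclareUnicodeCharacter{2026}{\ldots}

\begin{document}
"It's too late now…"

“Please, sir, don’t.”
\end{document}
```

If the output still shows wrong characters, the file is most likely not saved in the encoding declared to inputenc, so the editor's save encoding is the first thing to check.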

I can't really imagine how you got the output shown in your image. It can be produced, but in my opinion not with the standard encoding files: none of them replaces non-ASCII characters with question marks.

Regarding quotes

Straight quotes (") are active in German TeX documents (with babel's german or ngerman option) and are used for many useful shorthands, such as adding break points and hyphens. So I would never use them for real quotes, and I prefer Word files with smart quotes. When copying German smart quotes from Word into TeX, I use \MakeAutoQuote{„}{“} in the .tex document. Because such active quotes create a group, I get warnings or errors whenever the smart quotes in the Word document are not correctly balanced, which catches most mistakes.

But Word files are never perfect, so a simple copy and paste is never enough: one always has to read and check the result.
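For English text, the same idea can be sketched with Word's English curly double quotes, assuming a UTF-8 file compiled with pdfLaTeX and inputenc's utf8 option (csquotes accepts a single UTF-8 character as an active quote in that setup). Only the double quotes are made active here: Word uses ’ both as the closing single quote and as the apostrophe (as in don’t), so turning ‘ and ’ into an auto-quote pair would break every contraction.

```latex
% Minimal sketch: assumes UTF-8 source compiled with pdfLaTeX.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{csquotes}

% Treat Word's curly double quotes as a csquotes opening/closing pair,
% so unbalanced quotes tend to produce warnings or errors instead of
% silently printing the wrong quote marks.
\MakeAutoQuote{“}{”}

\begin{document}
“Please, sir, don’t.”
\end{document}
```

Since the quotes are now handled by csquotes, the quotation style can still be changed in one place (through babel or the csquotes options) rather than in the pasted text.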