[TeX/LaTeX] How to change the encoding of files

Back in the old days, before I left the hallowed shores of my native country, I didn't much care what encoding my files were in. ASCII was enough for my great-great grandfather, so it was enough for me. Now I live in the land of my great-great-…-great grandfathers, who have a slightly mangled alphabet, and ASCII no longer suffices. That would be fine if it were just me: I've learned to love UTF-8 and have even embraced xelatex. But sometimes I get sent documents in weird encodings, and sometimes I want to resurrect an old document, maybe to include a section of it in a new one, so from time to time I find myself wanting to change the encoding of a document.

So: how do I do that? (Subquestion: how do I determine the encoding of a file in the first place?)

Notes:

  1. I realise that this is only tangentially related to TeX and friends, so I am fully prepared to be told to look elsewhere, but I think the problem is quite common and especially important for TeXers, since TeX seems to require one to be super-aware of encodings.
  2. I don't have a specific example file in mind here; this is a "generic" question hoping to build a useful resource. So please answer in as much generality as you can, and where you need to place restrictions, make them clear. In particular, this will almost certainly have different answers depending on the OS.
  3. On the other hand, if you do know a super-snazzy-wizzy method that works just brilliantly when using Emacs at midnight with a full moon, then please do post it – just be sure to include whether you are assuming the strong lycanthropic principle or only the weak one.
  4. In light of those last two, I'd be happy for this to be CW (community wiki), with one answer gathering together all the techniques in a sensible grouping.
  5. If this question doesn't get closed and does work as I intend, these notes should probably be removed so as not to distract from the usefulness of the answers (and because we probably don't want TeX-SX to be the number one hit for the "strong lycanthropic principle").

Best Answer

Regarding Emacs: sometimes I stumble upon a *.tex file encoded in latin-1 or latin-9, and in such files the second line is usually \usepackage[latin1]{inputenc}. In Emacs, I delete this line, insert a new \usepackage via C-c C-m, wait until Emacs has finished checking which packages are installed, and type inputenc. Emacs then suggests latin-1; I type utf-8, and Emacs asks whether the whole buffer should be encoded in utf-8. YES! And Emacs recodes my file.

EDIT: As pointed out in the comments, this recoding via C-c C-m is an AUCTeX feature. AUCTeX is a powerful Emacs package that makes writing *.tex files much more comfortable.
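Outside Emacs, the same effect (decode the file in the encoding its inputenc line declares, switch the option to utf8, and write the result back as UTF-8) can be sketched in Python. This is a minimal sketch, not AUCTeX's implementation: it assumes the declaration has the single-option form \usepackage[<option>]{inputenc}, and the function name `recode_tex_to_utf8` and the option-to-codec table are hypothetical, covering only a few common inputenc options.

```python
import re

# Hypothetical mapping from inputenc option names to Python codec names.
# Only a few common options are listed; extend as needed.
INPUTENC_TO_CODEC = {
    "latin1": "latin-1",
    "latin9": "iso8859_15",
    "cp1252": "cp1252",
    "ansinew": "cp1252",
    "utf8": "utf-8",
}

# Matches e.g. \usepackage[latin1]{inputenc} and captures the option.
INPUTENC_RE = re.compile(r"\\usepackage\[([A-Za-z0-9]+)\]\{inputenc\}")


def recode_tex_to_utf8(raw: bytes) -> bytes:
    """Decode raw .tex bytes using the encoding declared via inputenc,
    switch the declaration to utf8, and return UTF-8 encoded bytes."""
    # The declaration itself is plain ASCII, and latin-1 accepts every byte,
    # so a latin-1 decode is a safe way to locate it before the real decode.
    probe = raw.decode("latin-1")
    match = INPUTENC_RE.search(probe)
    if match is None:
        raise ValueError("no \\usepackage[...]{inputenc} line found")
    option = match.group(1)
    codec = INPUTENC_TO_CODEC.get(option)
    if codec is None:
        raise ValueError(f"unknown inputenc option: {option}")
    # Decode with the declared encoding, then rewrite the first declaration.
    text = raw.decode(codec)
    text = INPUTENC_RE.sub(r"\\usepackage[utf8]{inputenc}", text, count=1)
    return text.encode("utf-8")
```

To apply it to a file on disk, `p.write_bytes(recode_tex_to_utf8(p.read_bytes()))` with `p = pathlib.Path("old.tex")` would do. Keep a backup: if the declared option does not match the real encoding, a single-byte codec will usually decode without error and silently produce garbled characters.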

Otherwise, have a look at the man page of recode.
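The question also asks how to determine a file's encoding in the first place. A rough guess followed by the conversion can be sketched in Python, assuming the file is either UTF-8 or one of the single-byte Western encodings; the candidate list and the names `guess_encoding` and `convert_file` are hypothetical. This is a heuristic, not reliable detection: strict UTF-8 decoding rarely succeeds by accident on Latin-1 text containing accented letters, but decoding alone cannot tell latin1, latin9 and cp1252 apart, so check the result (or pass `source=` explicitly) when it matters.

```python
from pathlib import Path
from typing import Optional

# Candidate encodings, tried in order. UTF-8 goes first because it is strict:
# Latin-1/CP1252 text with accented letters rarely forms valid UTF-8.
# latin-1 goes last because every byte is valid in it, so it is a catch-all.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(raw: bytes) -> str:
    """Return the first candidate codec that decodes raw without errors.
    This is a heuristic guess, not a reliable detection."""
    for codec in CANDIDATES:
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    # Not reached while latin-1 is in CANDIDATES, since it accepts all bytes.
    raise ValueError("no candidate encoding matched")


def convert_file(path: Path, target: str = "utf-8",
                 source: Optional[str] = None) -> str:
    """Re-encode the file at path in place and return the source encoding used.
    If source is None, it is guessed with guess_encoding."""
    raw = path.read_bytes()
    src = source or guess_encoding(raw)
    path.write_bytes(raw.decode(src).encode(target))
    return src
```

Converting to a narrower target such as latin-1 raises `UnicodeEncodeError` for characters the target cannot represent, which is usually preferable to silently losing them.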

Alexander