[TeX/LaTeX] Special characters in input file

characters input input-encodings

I want to keep my sections in independent .txt files in the folder of my assignment, so I use \input{text.txt}. The problem is that I write in Danish, and we have the characters 'æ', 'ø' and 'å', which LaTeX doesn't recognize by default. I fixed that within the master .tex file, so when I type the letters there, they show up nicely in my document. They do not show up nicely, however, if I put one of the characters in a file name or inside the .txt files. I don't mind having to avoid them in file names, but I really need them to be recognized inside the files.

\documentclass[10pt,a4paper]{article}
\usepackage[danish]{babel}
\renewcommand{\danishhyphenmins}{22}
\usepackage{lmodern}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
\usepackage[danish=quotes]{csquotes}

\begin{document}
æ ø å
\input{text.txt}
\end{document}

I can't upload the .txt file, but if you copy the three letters into a file called text.txt and put it in the same folder, you'll see what I mean.

If I delete \input{text.txt} from the document, or avoid using those three letters in the input file, everything works perfectly and it writes 'æ ø å' in the output file.

I asked this question on another forum, but they weren't much help. They said the file was in the wrong encoding and told me that if I saved it as a .tex file, there would be a dialogue that would let me change the encoding of the input document so it would be compatible. However, there is no such option in the 'Save as' dialogues I get from TeXworks. I also have the encoding set to utf8 in my TeXworks preferences, but that doesn't seem to make any difference.

Best Answer

I agree with the guys in the other forum - the issue is likely that the text file is in the wrong encoding - but I disagree with their solution. Depending on your operating system, I'll suggest two different solutions:

Under Linux

First a disclaimer: I use Ubuntu, and the exact commands might be slightly different under other distributions. The general idea is the same, however, so you should be able to iron out any kinks with the help of Google...

Confirming the diagnosis

To confirm that encoding is in fact the issue, cd to the folder where your files reside, and do

$ file *

That should give you output like the following (with one line for each file in the folder):

example.tex:             LaTeX 2e document, UTF-8 text
input.txt:               ISO-8859 text

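If the text file is listed as ISO-8859 (or something similar), or in fact as any non-ASCII encoding other than "UTF-8 text", then encoding is your problem. (A file listed as plain "ASCII text" contains no special characters at all, so it is fine as it is.)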
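If the file command isn't available, the same check can be approximated by simply trying to decode the file as UTF-8. The following is a minimal sketch, not part of the original answer; the function name is_valid_utf8 is made up for illustration. Note that it only tells you whether the file is valid UTF-8, not which encoding it is actually in.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8.

    is_valid_utf8 is a hypothetical helper name, not a standard function.
    A pure ASCII file also passes, since ASCII is a subset of UTF-8.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Calling is_valid_utf8("input.txt") on a Latin 1 file that contains 'æ', 'ø' or 'å' should return False, which points to the same diagnosis as an "ISO-8859 text" line from file.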

Fixing the problem

To convert ISO-8859-1 (a.k.a. "Latin 1") to UTF-8, you can use the following command:

$ iconv -f latin1 -t utf8 input.txt > input.utf8.txt

iconv is an encoding conversion utility. The arguments -f latin1 and -t utf8 tell it which encoding the file is currently in and which encoding you want it in. For a complete list of possible encoding names, do iconv --list. The last argument is the name of the input file (i.e. the one in the "wrong" encoding). iconv writes the converted text to stdout, so we redirect the output into a new file. Don't redirect into the same file name: the shell empties that file before iconv gets to read it, and you'll end up with an empty file.
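For systems without iconv, the same conversion can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the original answer: the function name convert_to_utf8 and its parameters are made up, and the source encoding defaults to Latin 1 to mirror the iconv call above.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode the file src as UTF-8 and write the result to dst.

    convert_to_utf8 is a hypothetical helper name. source_encoding must be
    the encoding the file is really in; the default is an assumption.
    """
    # Work on raw bytes so that line endings are left untouched.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

A call such as convert_to_utf8("input.txt", "input.utf8.txt") corresponds to the iconv command. One caveat: decoding as Latin 1 never fails, because every byte value is a valid Latin 1 character, so a wrong guess about the source encoding produces garbled text rather than an error. Keep the original file until you have checked the result.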

Under Windows

Confirming the diagnosis

My standard way of confirming encoding problems under Windows is to open the file in Notepad and select Save as... - then there's a little drop-down list that lets you choose the encoding of the file - if you don't change it, it states the current encoding of the file. Usually, files that I find problematic when using UTF-8 turn out to be saved in ANSI, which is Notepad's name for the legacy Windows code page of your system (Windows-1252 on Western European installations). It agrees with ASCII for unaccented letters, but it stores æ, ø and å as single bytes that are not valid UTF-8.

If encoding is your problem, this drop-down list shows something other than "UTF-8".
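To see why an ANSI file trips up a UTF-8 document, it can help to look at the actual bytes. The sketch below is an illustration only, and it assumes that "ANSI" means Windows-1252 (Python's cp1252 codec), which is the case on Western European Windows installations but not everywhere.

```python
text = "æ ø å"

# Assumption: "ANSI" is Windows-1252 here; other locales use other code pages.
ansi_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

print("ANSI :", ansi_bytes)   # one byte per letter
print("UTF-8:", utf8_bytes)   # two bytes per letter

# Reading the ANSI bytes as if they were UTF-8 fails, which is essentially
# what goes wrong when LaTeX expects UTF-8 but the input file is ANSI.
try:
    ansi_bytes.decode("utf-8")
    ansi_is_valid_utf8 = True
except UnicodeDecodeError:
    ansi_is_valid_utf8 = False

print("ANSI bytes are valid UTF-8:", ansi_is_valid_utf8)
```

The letters look identical in an editor either way; only the bytes on disk differ, which is why the problem is invisible until the file is read with the wrong encoding.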

Fixing the problem

To fix it, simply select UTF-8 in the drop-down list, (optionally) select a new file name for your input file, and hit Save.

Notepad converts the file intelligently, but if you experience problems you can (usually) simply reverse the process to get back the file you started with, and try something else.
