I'm using `seqsplit` to split long words within cells of a `longtable`. I'm using the `utf8x` and `ucs` packages too. I'm generating PDFs out of those `.tex` files.
When a word contains non-ASCII characters (which take several bytes in UTF-8), the first such character in the sequence raises an error.
For example, `\seqsplit{Música}` with utf8x appears as `M[U+FFFD]sica`.
This is the error it raises:
! Package utf8x Error: MalformedUTF-8sequence.
See the utf8x package documentation for explanation.
Type H <return> for immediate help.
...
l.398 .. com} & \seqsplit{Música}
Ifthecharacterisanargument,putitin{}
Package ucs Warning: Unknown character 65533 = 0xFFFD appeared again. on input
line 398.
If I remove `seqsplit`, the word appears correctly, but I need to use this package. Maybe someone knows an alternative or a macro I can use.
The funniest part is that if the word contains two or more non-ASCII characters, what I get is:
`\seqsplit{Múúsica}` gives `M[U+FFFD]úsica`
It only fails on the first such character, so I'm sure the UTF-8 encoding is correct.
Best Answer
Unicode code points beyond 7 bits are encoded as several bytes in UTF-8. The `seqsplit` package does not know this, as it is written for long DNA/RNA/protein/… sequences. It is the wrong package for natural text. Languages have rules about where words may be broken (usually not after every letter), and they require a hyphenation character to be inserted at the break.

Thus for narrow columns I recommend the `ragged2e` package with its command `\RaggedRight`, which is similar to `\raggedright` but allows hyphenation. It therefore fills the available space better.

Nevertheless, if a sequence for `\seqsplit` contains UTF-8 characters and a Unicode TeX engine (XeTeX, LuaTeX) is not used, then the UTF-8 sequences can be grouped and protected for `\seqsplit`:
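As a minimal sketch of the grouping idea, each non-ASCII character can be wrapped in braces by hand so that `\seqsplit` receives all of its bytes as one unit. This is the same advice the utf8x error message gives ("put it in {}"). The example assumes pdfLaTeX, a source file saved as UTF-8, and the standard `utf8` option of `inputenc` in place of `utf8x`/`ucs`; the column widths are arbitrary, and behaviour may differ with other setups.

```latex
% Sketch only: compile with pdfLaTeX; the file must be saved as UTF-8.
\documentclass{article}
\usepackage[utf8]{inputenc} % assumption: standard utf8 instead of utf8x/ucs
\usepackage[T1]{fontenc}
\usepackage{seqsplit}
\usepackage{longtable}

\begin{document}
\begin{longtable}{|p{1cm}|p{1cm}|}
  % Every non-ASCII character sits in its own brace group, so \seqsplit
  % takes the whole multi-byte sequence as a single item and never
  % inserts a break point between its bytes.
  \seqsplit{M{ú}sica} & \seqsplit{M{ú}{ú}sica} \\
\end{longtable}
\end{document}
```

Manual bracing only scales to a handful of words; for ordinary text the `ragged2e` route recommended above remains the better fit.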