[TeX/LaTeX] seqsplit problems with UTF-8

unicode

I'm using seqsplit to split long words within cells of a longtable. I'm also using the utf8x and ucs packages, and I generate PDFs from those .tex files.

When a word contains non-ASCII (multi-byte UTF-8) characters, the first such character in the sequence raises an error.

\seqsplit{Música} using utf8x appears as `M[U+FFFD]sica`

This is the error it raises:

! Package utf8x Error: MalformedUTF-8sequence.

See the utf8x package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.398 .. com} & \seqsplit{Música}

Ifthecharacterisanargument,putitin{}


Package ucs Warning: Unknown character 65533 = 0xFFFD appeared again. on input 
line 398.

If I remove seqsplit, the word appears correctly, but I need the splitting that this package provides. Does anyone know an alternative package or macro I can use?

The funniest part is that if the word contains two or more UTF-8 characters, what I get is:

\seqsplit{Múúsica} `M[U+FFFD]úsica`

It only fails on the first non-ASCII character, so I'm sure the file is correctly encoded as UTF-8.

Best Answer

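Unicode code points outside the 7-bit ASCII range are encoded as several bytes in UTF-8; for example, ú (U+00FA) is the two bytes 0xC3 0xBA. Package seqsplit does not know this, as it was written for long DNA/RNA/protein/… sequences, so it can separate the bytes that belong to one character, and utf8x then reports a malformed sequence. It is the wrong package for natural text. Languages have rules about where a word may be broken (usually not after each letter), and they require a hyphenation character at the break.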

Thus for narrow columns I recommend package ragged2e with the command \RaggedRight, which is similar to \raggedright but allows hyphenation. It therefore fills the available space better.
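A minimal sketch of the ragged2e approach in a longtable, under some assumptions: the column type name `L`, the column widths, and the cell text are made up for illustration, and T1 font encoding is added because TeX's hyphenation of accented words generally depends on it. Note that the ragged2e command is spelled \RaggedRight. For correct break points you would probably also load babel with the document's language, which is left out here because the language is not known.

```latex
\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}   % T1 fonts let TeX hyphenate words with accented letters
\usepackage{array}         % provides \newcolumntype and \arraybackslash
\usepackage{longtable}
\usepackage{ragged2e}      % provides \RaggedRight

% Hypothetical column type "L": ragged-right paragraph cell with hyphenation.
% \arraybackslash restores \\ as the row terminator after \RaggedRight.
% \hspace{0pt} lets TeX hyphenate even the first word of a cell.
\newcolumntype{L}[1]{>{\RaggedRight\arraybackslash\hspace{0pt}}p{#1}}

\begin{document}
\begin{longtable}{L{1.5cm} L{1.5cm}}
Instrumentos & Música tradicional \\
\end{longtable}
\end{document}
```

Defining the column type in the preamble keeps the table specification short. Nothing in the cell needs extra braces, because no macro is processing the text byte by byte.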

Nevertheless, if the argument of \seqsplit contains UTF-8 characters and a Unicode TeX engine (XeTeX, LuaTeX) is not used, then each multi-byte character can be wrapped in braces to group and protect it for \seqsplit:

\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage{seqsplit}
\begin{document}
\seqsplit{M{ú}sica}
\end{document}

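For comparison, here is a sketch of the Unicode-engine route mentioned above. It assumes the document can be compiled with xelatex or lualatex instead of pdflatex, and that fontspec's default fonts are acceptable. Since these engines read ú as a single character rather than as two bytes, I would expect \seqsplit to need no extra braces, but treat this as an untested assumption rather than documented behaviour of seqsplit.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec}   % Unicode font support; replaces inputenc/utf8x here
\usepackage{seqsplit}
\begin{document}
\seqsplit{Música}       % ú is one character for the engine, so no grouping
\end{document}
```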