[TeX/LaTeX] How to break long words after n characters (long genomic sequences)

discretionary, line-breaking

I need to include relatively short genomic sequences in a LaTeX document (they are still single "words" of roughly 600 to 900 As, Ts, Gs and Cs), but I cannot find a way to force a line break after a fixed number of characters.

I have seen a large number of similar questions, most of them pointing to listings or discretionaries. However, all of the suggested solutions rely either on the long words being short enough to add "invisible breakpoints" by hand, or on the words containing special characters (dots, underscores) or patterns that can be declared as break points.

Unfortunately, my sequences are too long to edit manually, they contain no special pattern that could be used in a discretionary, and none of these solutions would let me stick to the standard 50 bases per line.

I'm really at a loss for ideas, so if you have any leads, please share!

Here is an example of a sequence that I need to include:

CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACCATAGGTTGCCTCAAATTGCTCTTTTGTTGCTTGTCCAGCTGTTAAGACTAAATGTTTTGACCCCTCATTTATAAGACCGATTGCGTTGAATGGTAAGACATTCTGTTGTGCTGATTGTAATTCTGAATAGCTACGGATTTTTATGAAGATATAGTTTTTTAATATTGGTATTTCATTCCAGACATACTTCTGTATAAAGGATTTATTAAACGGTGTTGTTTTGATTGCTCTATAATACTTATCTTGTTGTCCTCTTAATTTTACCCAAGGTCTTTCAAACTCTTGGGAGTTAATGATTATAAGCATATTGTAAAGCTGTCCAGCTAATCCGAAGAATACTGGAAGCCAGTGGGTAAAGCTTGTCTGTTTTGGTAAAGCTGTTTGAACGTCTGACAAGAACAAGTCCAGACCTTCATATTTGTGGATTTTTTGAAACTTCATATTTTGATATGAACCGTCTACAATATCACTATATTTTACTGGTTGCCCAGTTTTTTGATTAATGTATCCAGGTCTTTAATATCTACTACTAAAACCACCGTAACCATAGTCCACGTTAGAGATATAGAGAGGTTTCGCATAAATGTGAACCCAGATTGCTTGTTGTTGTCTTTCATAACTCATTTGAAGACCAGTTTTAATGCGTTCTTTAATTGCTTGATACGTT

Best Answer

The seqsplit package will break up such strings by inserting suitable break points. It was designed for exactly this kind of DNA sequence and handles various forms of input. However, it lets TeX decide where each line ends, whereas you want to break your material after a specific number of characters. That can be achieved with the xstring package and its splitting command \StrSplit, applied recursively:


\documentclass{article}

\usepackage{xstring,etoolbox}

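% \fixsplit{<n>}{<string>}: typeset <string>, forcing a line break after
% every <n> characters.
\newcommand{\fixsplit}[2]{%
  \StrLen{#2}[\mynum]% store the length of the remaining string in \mynum
  \ifnumcomp{\mynum}{<}{\numexpr(#1)+1\relax}%
    {#2}% at most #1 characters left: print them and stop
    {\StrSplit{#2}{#1}{\myfirststr}{\mysecondstr}% first #1 characters, then the rest
     \myfirststr\linebreak
     \fixsplit{#1}{\mysecondstr}}}% recurse on the remaining characters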

\begin{document}


\begin{quote}
  \ttfamily
  \fixsplit{30}{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACCATAGGTTGCCTCAAATTGCTCTTTTGTTGCTTGTCCAGCTGTTAAGACTAAATGTTTTGACCCCTCATTTATAAGACCGATTGCGTTGAATGGTAAGACATTCTGTTGTGCTGATTGTAATTCTGAATAGCTACGGATTTTTATGAAGATATAGTTTTTTAATATTGGTATTTCATTCCAGACATACTTCTGTATAAAGGATTTATTAAACGGTGTTGTTTTGATTGCTCTATAATACTTATCTTGTTGTCCTCTTAATTTTACCCAAGGTCTTTCAAACTCTTGGGAGTTAATGATTATAAGCATATTGTAAAGCTGTCCAGCTAATCCGAAGAATACTGGAAGCCAGTGGGTAAAGCTTGTCTGTTTTGGTAAAGCTGTTTGAACGTCTGACAAGAACAAGTCCAGACCTTCATATTTGTGGATTTTTTGAAACTTCATATTTTGATATGAACCGTCTACAATATCACTATATTTTACTGGTTGCCCAGTTTTTTGATTAATGTATCCAGGTCTTTAATATCTACTACTAAAACCACCGTAACCATAGTCCACGTTAGAGATATAGAGAGGTTTCGCATAAATGTGAACCCAGATTGCTTGTTGTTGTCTTTCATAACTCATTTGAAGACCAGTTTTAATGCGTTCTTTAATTGCTTGATACGTT}
\end{quote}

\end{document}
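A possible variant, sketched here under the hypothetical name \fixsplitnl (not part of any package), keeps the same xstring recursion but ends each chunk with \newline instead of \linebreak. \newline fills the rest of the line with \hfil before breaking, whereas \linebreak leaves each broken line to be stretched to the full width, which in a line containing no spaces can trigger "Underfull \hbox" warnings. The call below uses 50 characters per line, as asked in the question, on an excerpt of the question's sequence; treat it as a sketch rather than a drop-in replacement.

```latex
\documentclass{article}
\usepackage{xstring,etoolbox}

% Hypothetical variant of \fixsplit: same recursion, but each chunk is
% ended with \newline (which adds \hfil before the break).
\newcommand{\fixsplitnl}[2]{%
  \StrLen{#2}[\mynum]% length of the remaining string
  \ifnumcomp{\mynum}{>}{#1}% more than #1 characters left?
    {\StrSplit{#2}{#1}{\myfirststr}{\mysecondstr}% first #1 characters, then the rest
     \myfirststr\newline
     \fixsplitnl{#1}{\mysecondstr}}% recurse on the remaining characters
    {#2}}% last chunk: print it without a trailing break

\begin{document}
\begin{quote}
  \ttfamily
  % 50 bases per line; the argument is an excerpt of the question's sequence
  \fixsplitnl{50}{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACC}
\end{quote}
\end{document}
```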

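Note that I have chosen to print the result in a fixed-width font; with a proportional font, chunks with the same number of characters have different widths, so the lines end at visibly different positions. The first argument of \fixsplit sets the number of characters per line, so \fixsplit{50}{<sequence>} gives the standard 50 bases per line. Also note that, because of the way xstring works, the results of its operations usually have to be stored in a macro (here \mynum, \myfirststr and \mysecondstr) rather than being used directly.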
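To illustrate that last point, here is a small standalone sketch; the macro names \seqlen, \seqhead and \seqtail are arbitrary choices for this example. \StrLen and \StrSplit do not print anything themselves: they store their results in the macros you name, which you then use like any other macro.

```latex
\documentclass{article}
\usepackage{xstring}

\begin{document}
% \StrLen stores the length in the macro given as its optional argument.
\StrLen{ACGTACGTAC}[\seqlen]% \seqlen now expands to 10
% \StrSplit stores the first 4 characters and the remainder in two macros.
\StrSplit{ACGTACGTAC}{4}{\seqhead}{\seqtail}% \seqhead = ACGT, \seqtail = ACGTAC
Length: \seqlen; first part: \texttt{\seqhead}; rest: \texttt{\seqtail}.
\end{document}
```

This typesets "Length: 10; first part: ACGT; rest: ACGTAC.", which is exactly the mechanism \fixsplit relies on when it prints \myfirststr and recurses on \mysecondstr.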

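For comparison, a minimal sketch of the seqsplit approach mentioned at the start of the answer, again on an excerpt of the question's sequence. \seqsplit allows a line break between any two characters of its argument, so TeX's ordinary line breaking fills each line to the available width; the number of bases per line therefore depends on the line width and font size rather than being a fixed count, which is why the answer uses xstring instead when an exact count is required.

```latex
\documentclass{article}
\usepackage{seqsplit}

\begin{document}
% \seqsplit permits a break between any two characters, so full lines are
% filled to the width of the quote environment.
\begin{quote}
  \ttfamily
  \seqsplit{CTCCTTGGGCTGTTATTCCGTAAAAGTATTTGTGGAAGATACGGCTGTCATACATGATATGTTTTTTGTTTATAACAATAGTTCTTTCTTTGATTTCACC}
\end{quote}
\end{document}
```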