[TeX/LaTeX] Splitting Strings by Comma

macros, strings, xparse, xstring

I need to split a string by commas, and I haven't found an easy way to do it with xstring or xparse (although I may be missing something obvious). For example, if I have:

\def\pleaseHelp{"I, am, confused"}

What is the simplest way to extract:

{"I", "am", "confused"}

from \pleaseHelp and store it as a variable? I feel like this should be simple.

Best Answer

listofitems is a powerful list-parsing package. Your example uses only the simplest of its capabilities:

\documentclass{article}
\usepackage{listofitems}
\begin{document}
\def\pleaseHelp{I, am, confused}
\readlist*\mylist{\pleaseHelp}% star option removes surrounding whitespace
Individual items:\\
``\mylist[1]'' and\\
``\mylist[2]'' and\\
``\mylist[3]''

Loop over list:
\foreachitem\x\in\mylist[]{\ifnum\xcnt=1\else\ and \fi``\x''}
\end{document}
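The question also asks how to store the pieces. A minimal sketch, assuming a reasonably recent listofitems: it uses \mylistlen (the item count that \readlist defines alongside \mylist) and \itemtomacro, which copies one item into an ordinary macro. The names \first and \third are illustrative only; check the package manual for the exact \itemtomacro syntax in your version.

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
\def\pleaseHelp{I, am, confused}
\readlist*\mylist{\pleaseHelp}% parse once, trimming spaces around each item
% \mylistlen holds the number of top-level items (3 here)
Number of items: \mylistlen

% Copy single items into ordinary macros (\first and \third are illustrative names)
\itemtomacro\mylist[1]\first
\itemtomacro\mylist[3]\third
First: \first, third: \third
\end{document}
```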

You can also change the parsing separator by issuing the following before \readlist:

\setsepchar{;}

You can split on several separators at once by joining them with || (read as "or"):

\setsepchar{,||.||;||:}
\readlist*\Z{To be, or not to be; that is the question.}

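yields a list in which \Z[1] is "To be", \Z[2] is "or not to be", and \Z[3] is "that is the question". (The empty text after the final period is read as a fourth, empty item unless \ignoreemptyitems is in effect.)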
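A complete document for the multi-separator case, as a sketch: it assumes listofitems' \ignoreemptyitems switch to drop the empty piece that follows the final period, and it prints each item in brackets so the item boundaries are visible.

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
\setsepchar{,||.||;||:}% split at any of these four characters
\ignoreemptyitems% skip the empty piece after the final period
\readlist*\Z{To be, or not to be; that is the question.}
\foreachitem\x\in\Z[]{[\x]}% one bracketed item per piece

Items read: \Zlen
\end{document}
```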

For nested parsing, give a slash-separated (/) list of separators, one per level, for example:

\setsepchar{*/,}
\readlist*\Z{this is, a test* of multi-level, parsing}

Then \Z[1,1] contains "this is", \Z[1,2] contains "a test", \Z[2,1] contains "of multi-level", and \Z[2,2] contains "parsing".

By "contains" I mean "expands to the actual tokens after two expansions". For instance, in the example above, \detokenize\expandafter\expandafter\expandafter{\Z[2,2]} yields parsing.
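The nested example and the two-expansion remark can be checked together in one document. This sketch prints each addressed item in parentheses and then the detokenized \Z[2,2].

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
\setsepchar{*/,}% level 1 splits at *, level 2 splits each piece at ,
\readlist*\Z{this is, a test* of multi-level, parsing}
Top-level items: \Zlen% 2 pieces separated by *

% Address nested items as \Z[<level-1 index>,<level-2 index>]
(\Z[1,1]) (\Z[1,2]) (\Z[2,1]) (\Z[2,2])

% Two expansions of \Z[2,2] reach the stored tokens
\texttt{\detokenize\expandafter\expandafter\expandafter{\Z[2,2]}}
\end{document}
```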

Related Solutions

[TeX/LaTeX] How to split a string

You need to define a macro which has the separation character in the parameter text:

\def\testthreewords#1{\threewords#1\relax}
\def\threewords#1 #2 #3\relax{ First: (#1), Second: (#2), Third: (#3) }
\testthreewords{Now good enough}
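The same idea applies to the comma-separated string from the question: put the comma in the parameter text instead of the space. A sketch, assuming exactly three items; \testthreeitems and \threeitems are hypothetical names modelled on the macros above, and the single \expandafter expands \pleaseHelp once so that its commas become visible. Delimited arguments keep the blank that follows each comma, so \ignorespaces drops it when typesetting.

```latex
\documentclass{article}
\def\testthreeitems#1{\expandafter\threeitems#1\relax}% expand a macro argument once
% Commas delimit #1 and #2; \relax ends #3
\def\threeitems#1,#2,#3\relax{%
  First: (#1), Second: (\ignorespaces#2), Third: (\ignorespaces#3)}
\begin{document}
\def\pleaseHelp{I, am, confused}
\testthreeitems{\pleaseHelp}% First: (I), Second: (am), Third: (confused)
\end{document}
```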

If you want to pass a macro as the argument, you need to expand it first. This can be done either once (only the first token of the argument is expanded, by one level):

\def\testthreewords#1{\expandafter\threewords#1\relax}

or completely:

\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}

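The \relax serves as an end marker and must not occur in the argument; if it might, use a different marker such as \@nnil. The group keeps the temporary definition of \@tempa local. Because \@tempa and \@nnil contain @, such code must go in a package file or between \makeatletter and \makeatother.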
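A small test of the fully expanding variant, with a hypothetical argument built from two helper macros (\firsttwo and \wordthree) and \space. A single \expandafter would expand only \firsttwo, leaving \space as a macro rather than a space token, so \threewords would not find its second delimiter and the call would fail.

```latex
\documentclass{article}
\makeatletter% \@tempa contains @, so allow @ in macro names
\def\testthreewords#1{%
  \begingroup
  \edef\@tempa{#1}% fully expand the argument
  \expandafter\endgroup
  \expandafter\threewords\@tempa\relax
}
\makeatother
\def\threewords#1 #2 #3\relax{First: (#1), Second: (#2), Third: (#3)}
\begin{document}
\def\firsttwo{Now good}% hypothetical helper macros
\def\wordthree{enough}
\testthreewords{\firsttwo\space\wordthree}% First: (Now), Second: (good), Third: (enough)
\end{document}
```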

However, this setup fails with an error if the argument contains fewer than two spaces. To be on the safe side, read one substring at a time, append the separator to the end of the input as a fail-safe, and test after each step whether the end has been reached:

\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\def\readwords#1 #2\relax{%
      \doword{#1}%  #1 = substr, #2 = rest of string
      \begingroup
      \ifx\relax#2\relax  % is #2 empty?
         \def\next{\endgroup\endtestwords}% your own end-macro if required
      \else
         \def\next{\endgroup\readwords#2\relax}%
      \fi
      \next
}
\def\doword#1{(#1)}
\def\endtestwords{}


\testwords{Now good enough}% Gives `(Now)(good)(enough)`
\testwords{Now good}% Gives `(Now)(good)`
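The same loop adapted to the commas from the question, as a sketch: \testitems, \readitems, \doitem and \enditems are hypothetical names mirroring \testwords, \readwords, \doword and \endtestwords, and a trailing comma replaces the trailing space as the fail-safe. Delimited arguments keep the blank after each comma, so \doitem uses \ignorespaces to drop it when typesetting; an item stored for later use would still carry that leading space.

```latex
\documentclass{article}
\makeatletter
% Same recursion as \testwords/\readwords, with "," as the separator
\def\testitems#1{%
  \begingroup
  \edef\@tempa{#1,}% append a trailing comma as a fail-safe end marker
  \expandafter\endgroup
  \expandafter\readitems\@tempa\relax
}
\def\readitems#1,#2\relax{%
  \doitem{#1}% #1 = current item, #2 = rest of the list
  \begingroup
  \ifx\relax#2\relax % is the rest empty?
    \def\next{\endgroup\enditems}%
  \else
    \def\next{\endgroup\readitems#2\relax}%
  \fi
  \next
}
\makeatother
\def\doitem#1{(\ignorespaces#1)}% \ignorespaces drops the blank after each comma
\def\enditems{}
\begin{document}
\def\pleaseHelp{I, am, confused}
\testitems{\pleaseHelp}% (I)(am)(confused)
\end{document}
```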

Use a command with variables that uses \IfSubStr on the lowercased value of the input variables

You need the macros to be expanded before \lowercase processes the token list.

Say you have \def\foo{BAZ}. If you do

\lowercase{\edef\tmpa{\foo}}

you get exactly the same result as \edef\tmpa{\foo}, because \lowercase changes only character tokens: \foo is still an unexpanded control sequence when \lowercase sees it. You might instead do

\lowercase\expandafter{\expandafter\def\expandafter\tmpa\expandafter{\foo}}

so that \foo is expanded before \lowercase starts its job. But this does not work if, instead of just \foo, you have something more complex that expands to character tokens, for instance two macros in a row: the \expandafter chain expands only the first one.
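To see the limitation, here is a sketch with two hypothetical helper macros, \partA and \partB: the \expandafter chain expands only \partA, so \partB reaches \lowercase as a control sequence and its uppercase text survives.

```latex
\documentclass{article}
\begin{document}
\def\partA{BA}% hypothetical helper macros
\def\partB{Z}
% Only \partA is expanded before \lowercase runs;
% \partB is still a control sequence, so \lowercase leaves it alone
\lowercase\expandafter{\expandafter\def\expandafter\tmpa\expandafter{\partA\partB}}
\tmpa % typesets "baZ", not "baz"
\end{document}
```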

To overcome the issue you can use \expanded, provided you are sure that a recent engine is used (the primitive is available in pdfTeX 1.40.20, i.e. TeX Live 2019, and later, as well as in the other major engines).

\newcommand{\setCmd}[3]{%
  \expanded{\lowercase{\def\noexpand\tmpa{#1}}}%
  \expanded{\lowercase{\def\noexpand\tmpb{#2}}}%
  \IfSubStr{\tmpa}{\tmpb}
    {\renewcommand{#3}{#2}}% true
    {\edef\dbgstr{inputs: #1, #2; lowercased: \tmpa, \tmpb}}% false
}
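With \expanded, input built from two macros is lowercased completely. A sketch, assuming an engine that provides the \expanded primitive; \partA and \partB are hypothetical helper macros.

```latex
\documentclass{article}
\begin{document}
\def\partA{BA}% hypothetical helper macros
\def\partB{Z}
% \expanded expands everything expandable: \partA\partB becomes "BAZ",
% \noexpand keeps \tmpa as a name, and the unexpandable \lowercase and \def stay put
\expanded{\lowercase{\def\noexpand\tmpa{\partA\partB}}}
\tmpa % typesets "baz"
\end{document}
```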

Otherwise, use the usual \edef trick, which works with any engine:

\newcommand{\setCmd}[3]{%
  \begingroup\edef\x{\endgroup
    \lowercase{\def\noexpand\tmpa{#1}}%
  }\x
  \begingroup\edef\x{\endgroup
    \lowercase{\def\noexpand\tmpb{#2}}%
  }\x
  \IfSubStr{\tmpa}{\tmpb}
    {\renewcommand{#3}{#2}}% true
    {\edef\dbgstr{inputs: #1, #2; lowercased: \tmpa, \tmpb}}% false
}
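A usage sketch of the \edef-trick version of \setCmd. The target command \matched and the macro \needle are hypothetical; xstring supplies \IfSubStr and, by default, fully expands its arguments, so it compares the lowercased strings stored in \tmpa and \tmpb.

```latex
\documentclass{article}
\usepackage{xstring}% provides \IfSubStr
% \setCmd{<haystack>}{<needle>}{<command>}: case-insensitive substring test
\newcommand{\setCmd}[3]{%
  \begingroup\edef\x{\endgroup
    \lowercase{\def\noexpand\tmpa{#1}}%
  }\x
  \begingroup\edef\x{\endgroup
    \lowercase{\def\noexpand\tmpb{#2}}%
  }\x
  \IfSubStr{\tmpa}{\tmpb}
    {\renewcommand{#3}{#2}}% true
    {\edef\dbgstr{inputs: #1, #2; lowercased: \tmpa, \tmpb}}% false
}
\newcommand{\matched}{(no match)}% hypothetical target command
\begin{document}
\def\needle{BAZ}
\setCmd{FooBarBaz}{\needle}{\matched}
\matched % "baz" occurs in "foobarbaz", so this typesets BAZ
\end{document}
```

In the true branch \matched receives the original #2, so it prints the needle in its original case.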
