[TeX/LaTeX] Extract first word in a string

macros strings

The title pretty much says it all: I need a command to get the first word in a string.

Based on this answer to another question of mine, I tried this:

\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\newcommand\@firstword{}%
\def\@firstword#1 #2\@nil{#1\unskip}%
\makeatother

\begin{document}
    \FirstWord{John, Paul, George and Ringo}
\end{document}

It almost works, except that it includes the comma. I get:

John,

While I want just:

John

So how can I do that?

PS: Ideally, if more than one word is inside braces, they should count as one. So \FirstWord{{John, Paul}, George and Ringo} should print "John, Paul".

Best Answer

You're almost there; just remove the trailing comma:

\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\newcommand\@firstword{}%
\newcommand\@removecomma{}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{#1}
\makeatother

\begin{document}

X\FirstWord{John, Paul, George and Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John, Paul}, George and Ringo}X

\end{document}
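
To make the mechanism explicit, here is the same set of definitions annotated step by step, followed by an expansion trace written as comments. The trace is a reading of how TeX's delimited arguments behave for these inputs, not additional code.

```latex
\makeatletter
% Step 1: append " \@nil" so the argument always contains a space
% and has a well-defined end marker.
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
% Step 2: #1 is everything up to the first space outside braces,
% #2 is the rest (discarded). A comma is appended so that step 3
% always finds one, even if the word had none.
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
% Step 3: #1 is everything up to the first comma outside braces.
% If #1 is exactly one brace group, TeX strips the braces.
\def\@removecomma#1,#2\@nil{#1}%
\makeatother

% \FirstWord{John, Paul, George and Ringo}
%   -> \@firstword John, Paul, George and Ringo \@nil   (#1 = John,)
%   -> \@removecomma John,,\@nil                         (#1 = John)
%   -> John
%
% \FirstWord{John}
%   -> \@firstword John \@nil                            (#1 = John)
%   -> \@removecomma John,\@nil                          (#1 = John)
%   -> John
%
% \FirstWord{{John, Paul}, George and Ringo}
%   -> \@removecomma {John, Paul},,\@nil                 (#1 = {John, Paul})
%   -> John, Paul                                        (braces stripped)
```

One limitation follows from step 3: the cut happens at the first comma outside braces, not only at a trailing one, so an input such as `\FirstWord{1,000 people}` would yield `1`.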

You can add further steps to remove other delimiters:

\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{\@removeperiod#1.\@nil}
\def\@removeperiod#1.#2\@nil{\@removesemicolon#1;\@nil}
\def\@removesemicolon#1;#2\@nil{#1}
\makeatother

\begin{document}

X\FirstWord{John; Paul; George; Ringo}X

X\FirstWord{John. Paul. George. Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John. Paul}. George. Ringo}X

\end{document}
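
The chain follows a fixed pattern: each step appends its own delimiter, cuts at the first occurrence, and hands the result to the next step. As a sketch of how one more delimiter could be added, the version below extends the chain with a colon; the macro name `\@removecolon` is made up for this illustration, and it assumes the colon has its normal category code (some babel languages make it active).

```latex
\documentclass{article}

\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{\@removeperiod#1.\@nil}%
\def\@removeperiod#1.#2\@nil{\@removesemicolon#1;\@nil}%
% The semicolon step now passes its result on instead of ending the chain
\def\@removesemicolon#1;#2\@nil{\@removecolon#1:\@nil}%
% Hypothetical extra step: cut at the first colon
\def\@removecolon#1:#2\@nil{#1}%
\makeatother

\begin{document}

X\FirstWord{John: Paul: George: Ringo}X

X\FirstWord{John, Paul, George and Ringo}X

\end{document}
```

The order of the steps matters for braced input. Once a step strips the braces, later steps see the unprotected contents, so a group such as `{John. Paul},` followed by a comma would be cut at the period by the subsequent period step.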

If you don't need expandability, you can use l3regex:

\documentclass{article}
\usepackage{xparse,l3regex}

\ExplSyntaxOn
\NewDocumentCommand{\FirstWord}{m}
 {
  % split the argument at spaces
  \seq_set_split:Nnn \l_tmpa_seq { ~ } { #1 }
  % get the first item
  \tl_set:Nx \l_tmpa_tl { \seq_item:Nn \l_tmpa_seq { 1 } }
  % remove a trailing period, semicolon or comma (\Z matches the end)
  \regex_replace_once:nnN { [.;,]\Z } { } \l_tmpa_tl
  % output the result
  \tl_use:N \l_tmpa_tl
 }
\ExplSyntaxOff

\begin{document}

X\FirstWord{John, Paul, George and Ringo}X

X\FirstWord{John; Paul; George; Ringo}X

X\FirstWord{John. Paul. George. Ringo}X

X\FirstWord{John}X

X\FirstWord{John and Paul}X

X\FirstWord{{John, Paul}, George and Ringo}X

X\FirstWord{{John. Paul}. George. Ringo}X

\end{document}
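
On a recent LaTeX installation the regex functions are part of expl3 itself, and both expl3 and `\NewDocumentCommand` are available in the kernel. Assuming a kernel from 2020-10-01 or later, the same approach should work without loading any package; this is a sketch of that variant, not a change to the method above.

```latex
\documentclass{article}
% Assumption: LaTeX kernel 2020-10-01 or later, so neither xparse
% nor l3regex needs to be loaded.

\ExplSyntaxOn
\NewDocumentCommand{\FirstWord}{m}
 {
  % split the argument at spaces
  \seq_set_split:Nnn \l_tmpa_seq { ~ } { #1 }
  % get the first item
  \tl_set:Nx \l_tmpa_tl { \seq_item:Nn \l_tmpa_seq { 1 } }
  % remove a trailing period, semicolon or comma (\Z matches the end)
  \regex_replace_once:nnN { [.;,]\Z } { } \l_tmpa_tl
  % output the result
  \tl_use:N \l_tmpa_tl
 }
\ExplSyntaxOff

\begin{document}

X\FirstWord{John, Paul, George and Ringo}X

X\FirstWord{{John, Paul}, George and Ringo}X

\end{document}
```

Because the regex is anchored with `\Z`, only a trailing delimiter is removed, so a comma inside the first word is left alone.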