\documentclass[a4paper]{article}
\usepackage[latin1]{inputenc}
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand{\capitalize}{>{\SplitList{~}}m}{
\CapitalizeFirst#1\Capitalize\unskip
}
\ExplSyntaxOff
\def\Sentinel{\Capitalize}
\def\CapitalizeFirst#1{\MakeUppercase#1 \Capitalize}
\def\Capitalize#1{%
\def\next{#1}%
\ifx\next\Sentinel
\expandafter\unskip
\else
\CheckInList{#1}\space\expandafter\Capitalize
\fi}
\def\CheckInList#1{%
\ifcsname List@\detokenize{#1}\endcsname
#1%
\else
\MakeUppercase#1%
\fi}
\makeatletter
\def\AppendToList#1{%
\@for\next:=#1\do
{\expandafter\let\csname List@\detokenize\expandafter{\next}\endcsname\empty}}
\makeatother
\AppendToList{a,is,of}
\begin{document}
\capitalize{here is a list of words école}
\end{document}
This won't work with UTF-8 input in pdflatex (it does work with XeLaTeX or LuaLaTeX), because \MakeUppercase will apply only to the first byte of a possible two-, three- or four-byte combination (for Western languages probably only two). For that to work, one has to feed the whole block of bytes to \MakeUppercase.
To be clearer: when we say \MakeUppercase, LaTeX will uppercase the argument; in general the call is \MakeUppercase{word}. Here we're saying instead \MakeUppercase#1 (without braces), so only the first token (usually a character) will be uppercased. This is where it fails with input such as \'ecole: the token passed to \MakeUppercase would be \', which it doesn't know what to do with. Using école (and a one-byte encoding such as latin1), \MakeUppercase will process é and give the correct result.
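A small sketch of the difference between the two calls (plain ASCII, so it should behave the same with any engine and input encoding):

```latex
\documentclass{article}
\begin{document}
% no braces: only the next token, the letter h, is the argument
\MakeUppercase here   % typesets "Here"

% braces: the whole group is the argument
\MakeUppercase{here}  % typesets "HERE"
\end{document}
```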
With UTF-8 this would fail: what we see as é on screen when writing a LaTeX document is actually two bytes (C3 and A9), and again \MakeUppercase would be passed only the first one. So a more complex routine is necessary.
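As a worked check of those two bytes: é is the code point U+00E9, and UTF-8 stores code points from U+0080 to U+07FF as two bytes of the form 110xxxxx 10xxxxxx, filling the eleven x positions with the bits of the code point:

```latex
\[
  \mathrm{E9}_{16} = 000\,1110\,1001_2
  \quad\longrightarrow\quad
  \underbrace{110\,00011_2}_{\mathrm{C3}_{16}}\;
  \underbrace{10\,101001_2}_{\mathrm{A9}_{16}}
\]
```

A lead byte starting with 110 announces one continuation byte, while lead bytes starting with 1110 or 11110 announce three- and four-byte sequences. So the C3 that \MakeUppercase would receive on its own is only the announcement of a character, not the character itself.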
To make this work with pdflatex and UTF-8 (that is, with \usepackage[utf8]{inputenc} instead of latin1), the definitions of \CheckInList and \CapitalizeFirst above can be replaced by the following:
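\makeatletter % the names below contain @
\def\CapitalizeFirst#1{\expandafter\UC@next#1 \Capitalize}
\def\CheckInList#1{%
\ifcsname List@\detokenize{#1}\endcsname
#1%
\else
\expandafter\UC@next#1%
\fi}
% \UC@next inspects the first token of a word after one expansion; it relies
% on inputenc's utf8 making each UTF-8 lead byte an active character whose
% expansion starts with \UTFviii@two@octets, \UTFviii@three@octets or
% \UTFviii@four@octets. The \expandafter chains (1, 3 and 7 tokens) remove
% the pending \else/\fi before the chosen \@...offour reads its arguments.
\def\UC@next#1{%
\ifx#1\UTFviii@two@octets
\expandafter\@firstoffour
\else
\ifx#1\UTFviii@three@octets
\expandafter\expandafter\expandafter\@secondoffour
\else
\ifx#1\UTFviii@four@octets
\expandafter\expandafter\expandafter\expandafter
\expandafter\expandafter\expandafter\@thirdoffour
\else
\expandafter\expandafter\expandafter\expandafter\expandafter
\expandafter\expandafter\@fourthoffour
\fi
\fi
\fi
{\UC@two}{\UC@three}{\UC@four}{\MakeUppercase}#1}
% #1 is the \UTFviii@... marker, the rest are the bytes of the character:
% they are uppercased together as one group
\def\UC@two#1#2#3{\MakeUppercase{#1#2#3}}
\def\UC@three#1#2#3#4{\MakeUppercase{#1#2#3#4}}
\def\UC@four#1#2#3#4#5{\MakeUppercase{#1#2#3#4#5}}
\providecommand\@firstoffour[4]{#1}
\providecommand\@secondoffour[4]{#2}
\providecommand\@thirdoffour[4]{#3}
\providecommand\@fourthoffour[4]{#4}
\makeatother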
However, accent commands such as \'e are still not supported (they aren't in the other version either).
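The least obvious part of \UC@next is the run of \expandafter tokens: at nesting depth n it takes 2^n - 1 of them (1, 3, 7) so that every pending \else/\fi is expanded away before the selected \@...offour macro grabs its four braced arguments. The same skeleton can be tried in isolation with the sketch below, where \Classify is a hypothetical demo macro that tests plain letters instead of UTF-8 markers:

```latex
\documentclass{article}
\makeatletter
\providecommand\@firstoffour[4]{#1}
\providecommand\@secondoffour[4]{#2}
\providecommand\@thirdoffour[4]{#3}
\providecommand\@fourthoffour[4]{#4}
% hypothetical demo: same four-way dispatch as \UC@next
\def\Classify#1{%
  \ifx#1a%
    \expandafter\@firstoffour
  \else
    \ifx#1b%
      \expandafter\expandafter\expandafter\@secondoffour
    \else
      \ifx#1c%
        \expandafter\expandafter\expandafter\expandafter
        \expandafter\expandafter\expandafter\@thirdoffour
      \else
        \expandafter\expandafter\expandafter\expandafter
        \expandafter\expandafter\expandafter\@fourthoffour
      \fi
    \fi
  \fi
  {one}{two}{three}{other}}
\makeatother
\begin{document}
\Classify{a} \Classify{b} \Classify{c} \Classify{z} % one two three other
\end{document}
```

With fewer \expandafter tokens at depth three, the selector would be expanded while \fi tokens still sit in front of the braced groups, and it would pick up the wrong arguments.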
UPDATE
After a few years, here's a better implementation, thanks to new expl3
features; it works for all engines.
\documentclass[a4paper]{article}
\usepackage{ifxetex}
\ifxetex
\usepackage{fontspec}
\else
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\fi
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand{\capitalize}{>{\SplitList{~}}m}
{
\seq_clear:N \l_capitalize_words_seq
\ProcessList{#1}{\CapitalizeFirst}
\seq_use:Nn \l_capitalize_words_seq { ~ }
}
\NewDocumentCommand{\CapitalizeFirst}{m}
{
\capitalize_word:n { #1 }
}
\sys_if_engine_pdftex:TF
{
\cs_set_eq:Nc \capitalize_tl_set:Nn { protected@edef }
}
{
\cs_set_eq:NN \capitalize_tl_set:Nn \tl_set:Nn
}
\cs_new_protected:Nn \capitalize_word:n
{
\capitalize_tl_set:Nn \l_capitalize_word_tl { #1 }
\seq_if_in:NfTF \g_capitalize_exceptions_seq { \tl_to_str:n { #1 } }
% exception word
{ \seq_put_right:Nn \l_capitalize_words_seq { #1 } }
% to be uppercased
{ \seq_put_right:Nx \l_capitalize_words_seq { \tl_mixed_case:V \l_capitalize_word_tl } }
}
\cs_generate_variant:Nn \tl_mixed_case:n { V }
\NewDocumentCommand{\AppendToList}{m}
{
\clist_map_inline:nn { #1 }
{
\seq_gput_right:Nx \g_capitalize_exceptions_seq { \tl_to_str:n { ##1 } }
}
}
\cs_generate_variant:Nn \seq_if_in:NnTF { Nf }
\seq_new:N \l_capitalize_words_seq
\seq_new:N \g_capitalize_exceptions_seq
\ExplSyntaxOff
\AppendToList{a,is,of,óf}
\begin{document}
X\capitalize{here is a list of words óf école}X
\end{document}
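Both versions start from the same argument processor, >{\SplitList{~}}, which splits the argument at spaces (~ is a space inside \ExplSyntaxOn) and passes the words to the command body as a sequence of brace groups; \ProcessList then applies a one-argument command to each group. A quick way to see the split, using a hypothetical \ShowWords command that just boxes each word:

```latex
\documentclass{article}
\usepackage{xparse} % \SplitList, \ProcessList
\ExplSyntaxOn
% hypothetical demo command: frame every space-separated word
\NewDocumentCommand{\ShowWords}{>{\SplitList{~}}m}
  { \ProcessList{#1}{\fbox} }
\ExplSyntaxOff
\begin{document}
\ShowWords{here is a list} % four framed words, no spaces between them
\end{document}
```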
How's this:
This is something that I wrote over the past two days. It seems to do the job. I'm sure that it will fail if you start hyphenating words, using quotation marks, etc. I'm sure it can be cleaned up further, but I leave that for someone else.