Can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of category 11 in a token list with its category 12 counterpart?

expl3 l3regex

Can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of category 11 (letter) in a token list with its category 12 (other) counterpart? Or with its category 6 (parameter) counterpart?

More generally: Can you use l3regex / \regex_replace_all:nnN to replace each explicit character token of a certain category in a token list with its counterpart of another category?
(Leave aside scenarios where the token list would contain unbalanced braces at some point during processing.)

If so: How? What must the replacement text for \regex_replace_all:nnN look like?

If not: Never mind. I will write my own replacement-routine. 😉

As I expected, the example below does not work: \scratchy_showloop:n reveals that after the regex replacement the token list still contains letters (character tokens of category 11) rather than characters of category 12.

(I am providing this minimal example to satisfy those who insist on seeing one.)

\ExplSyntaxOn

\tl_new:N \l__scratchy_tl

\cs_new:Nn \scratchy_showloop:n {
  \quark_if_recursion_tail_stop:n {#1}
  \cs_show:N #1
  \scratchy_showloop:n
}

\tl_set:Nn  \l__scratchy_tl { abcdefg }

\regex_replace_all:nnN { \cL. } { \cO(\0) }  \l__scratchy_tl

\exp_args:NnV \use:nn \scratchy_showloop:n  \l__scratchy_tl  \q_recursion_tail \q_recursion_stop

\stop

Best Answer

You can use \tl_set_rescan:Nno instead, together with category code tables:

\ExplSyntaxOn

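% A new category code table: the document catcodes, with letters turned into "other"
\cctab_new:N \g_xitoxii_cctab

\cctab_gset:Nn \g_xitoxii_cctab
 {
  \cctab_select:N \c_document_cctab
  % Step through character codes 1 to 255
  \int_step_inline:nn { 255 }
   {
    % Any character with category code 11 (letter) gets category code 12 (other)
    \int_compare:nT { \char_value_catcode:n { #1 } = 11 } { \char_set_catcode_other:n { #1 } }
   }
 }

\tl_set:Nn \l_tmpa_tl {abcde@f&^}
% Rescan the contents of \l_tmpa_tl under the new table
\tl_set_rescan:Nno \l_tmpa_tl { \cctab_select:N \g_xitoxii_cctab } { \l_tmpa_tl }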

\tl_analysis_show:N \l_tmpa_tl

\stop

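The console then shows every former letter reported as "the character" (category 12) rather than "the letter" (category 11), while & and ^ keep their categories: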

The token list \l_tmpa_tl contains the tokens:
>  a (the character a)
>  b (the character b)
>  c (the character c)
>  d (the character d)
>  e (the character e)
>  @ (the character @)
>  f (the character f)
>  & (alignment tab character &)
>  ^ (superscript character ^).
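
If it is acceptable for every token to end up with category 12, not just the letters, a shorter route may be \tl_to_str:N. The following is only a minimal sketch under that assumption and is not part of the approach above: unlike the category code table, it would also turn & and ^ into category 12, convert any control sequences into their string representation, and double any # characters.

```latex
\ExplSyntaxOn

\tl_set:Nn \l_tmpa_tl { abcde@f }

% \tl_to_str:N yields characters of category 12 (spaces keep category 10);
% the x-type expansion stores that string back into the same variable
\tl_set:Nx \l_tmpa_tl { \tl_to_str:N \l_tmpa_tl }

\tl_analysis_show:N \l_tmpa_tl

\ExplSyntaxOff
```

The rescanning approach remains the one to prefer when only tokens of one particular category should change.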