[TeX/LaTeX] How to get the number of occurrences of characters in a string


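How can I count how many times each character occurs in a string? My attempt below uses xstring to print the characters one per line, but it does not count them yet: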

\documentclass[14pt]{extarticle}
\usepackage{xstring}
\usepackage{forloop}
\usepackage{ifthen}
\usepackage{pgfkeys}
\newcommand{\fios}{hellow world}
\begin{document}
%\StrCount{\fios}{e}
%\StrChar{\fios}{1}
\StrLen{\fios}[\varL]
\varL

\vspace{5ex}
\newcounter{loop}
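% \value{loop} < \varL would skip the last character, so compare with \varL+1
\edef\varLplusone{\number\numexpr\varL+1\relax}
\forloop{loop}{1}{\value{loop} < \varLplusone}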
{\StrChar{\fios}{\arabic{loop}}~--
\par
}
\end{document}

I need to get the following output (each distinct character with its number of occurrences):

h - 1
e - 1
l - 3
o - 2
w - 2
r - 1
d - 1
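Without Lua, the same result can be sketched with expl3, the programming layer built into current LaTeX kernels. The command \CountChars and the variables \l__charcount_chars_seq and \l__charcount_n_int are hypothetical names made up for this example, and \NewDocumentCommand assumes a LaTeX release from 2020-10 or later. The first pass collects each distinct character once, in the order it first appears; the second pass counts every collected character by scanning the string again.

```latex
% pdflatex, xelatex or lualatex: a sketch with expl3 instead of xstring
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l__charcount_chars_seq % distinct characters, first-seen order
\int_new:N \l__charcount_n_int     % occurrences of the current character
\NewDocumentCommand \CountChars { m }
  {
    \seq_clear:N \l__charcount_chars_seq
    % pass 1: remember every character the first time it shows up
    \str_map_inline:nn {#1}
      {
        \seq_if_in:NnF \l__charcount_chars_seq {##1}
          { \seq_put_right:Nn \l__charcount_chars_seq {##1} }
      }
    % pass 2: count each distinct character and typeset "char -- count"
    \seq_map_inline:Nn \l__charcount_chars_seq
      {
        \int_zero:N \l__charcount_n_int
        \str_map_inline:nn {#1}
          { \str_if_eq:nnT {####1} {##1} { \int_incr:N \l__charcount_n_int } }
        % a bare space would vanish at the start of a line, so name it
        \str_if_eq:nnTF {##1} { ~ } { (space) } { ##1 }
        ~--~ \int_use:N \l__charcount_n_int \par
      }
  }
\ExplSyntaxOff
\newcommand{\fios}{hellow world}
\begin{document}
% expand \fios once so that its text, not the macro name, is scanned
\expandafter\CountChars\expandafter{\fios}
\end{document}
```

For hellow world this should list h, e, l, o, w, (space), r, d with the counts 1, 1, 3, 2, 2, 1, 1, 1. Under pdfLaTeX the string is scanned byte by byte, so accented or CJKV characters are split into several bytes; for such input, compile with LuaLaTeX or XeLaTeX, where each Unicode character is a single token. The second pass rescans the whole string for every distinct character, which is harmless for short strings like this one.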

Best Answer

Here is a Lua-based solution; the test string also contains CJKV characters. I've selected an OpenType font from TeX Live (FandolSong). It lacks some accented letters (e.g. č, ř and š), but I hope it is sufficient to demonstrate how UTF-8 strings are handled. Compile with lualatex mal-letters.tex.

% lualatex mal-letters.tex
\documentclass[a4paper]{article}
\pagestyle{empty}
\parindent=0pt
\usepackage{luacode} % to be able to write Lua code
\usepackage{fontspec} % to be able to load fonts (CJKV)

\begin{document}
\begin{luacode*}
function countme() -- the core function
   local chars = {} -- Lua table to store occurrences (fresh for every call)
   local text = tex.toks[0] -- the argument passed from TeX in \toks0
   unicode.utf8.gsub(text, ".", function(s) -- visit every UTF-8 character
      if not chars[s] then chars[s] = 0 end -- first occurrence: start at zero
      chars[s] = chars[s] + 1 -- one more occurrence found
      return s -- leave the original string unchanged
      end) -- end of the callback and of gsub
   -- print the results to the terminal and typeset them (pairs() has no fixed order)
   for letter, count in pairs(chars) do
      print(letter, count)
      -- if you don't have the font texmf-dist/fonts/opentype/public/fandol/fandolsong-regular.otf,
      -- comment out the following line and check the terminal output instead
      tex.print("{\\setmainfont{FandolSong}"..letter.."} ("..count..");")
      end
end -- function countme
\end{luacode*}
\def\occurs#1{\toks0{#1}\directlua{countme()}}
% This particular font doesn't contain č, ř, š etc., but it does contain CJKV.
\occurs{你怎么样? Pavel Stříž, číšník. さよなら。}
\end{document}

[Image: typeset output of the example, including the CJKV characters]

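In the terminal, we get output similar to the following. The row showing only a 4 is the space character, and the order of the rows varies, because Lua's pairs() visits table entries in no particular order: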

ž   1
?   1
你   1
么   1
。   1
な   1
,   1
怎   1
a   1
さ   1
.   1
k   1
样   1
ら   1
P   1
š   1
n   1
č   1
l   1
ř   1
よ   1
    4
í   3
e   1
v   1
S   1
t   1
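Since pairs() does not follow the order of the string, the rows above come out shuffled. To get the layout asked for in the question (one line per character, in the order of first appearance), one possible variant remembers the first-seen order in a second table and walks it with ipairs(). This is a sketch: the function countme_ordered and the macro \occursordered are hypothetical names, and it reuses the answer's unicode.utf8.gsub call, so it assumes the same LuaTeX setup.

```latex
% lualatex sketch: same counting, reported in first-occurrence order
\documentclass{article}
\usepackage{luacode}
\begin{document}
\begin{luacode*}
function countme_ordered()
  local text = tex.toks[0]      -- the argument passed from TeX in \toks0
  local counts, order = {}, {}  -- char -> count; distinct chars as first seen
  unicode.utf8.gsub(text, ".", function(s)
    if not counts[s] then
      counts[s] = 0
      order[#order + 1] = s     -- remember where this char first appeared
    end
    counts[s] = counts[s] + 1
    return s                    -- leave the string unchanged
  end)
  for _, letter in ipairs(order) do
    tex.sprint(-2, letter)      -- -2: print the char with catcode 12 (space stays a space)
    tex.sprint(" -- " .. counts[letter] .. "\\par ")
  end
end
\end{luacode*}
\def\occursordered#1{\toks0{#1}\directlua{countme_ordered()}}
\occursordered{hellow world}
\end{document}
```

Printing each character through tex.sprint(-2, ...) means characters such as # or % in the input are typeset instead of being interpreted. For hellow world the lines should follow the order h, e, l, o, w, space, r, d; for CJKV input, load fontspec and a suitable font as in the answer above.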