[Tex/LaTex] How to improve machine-readability of a CV created in LaTeX with moderncv

moderncv text-decorations xpatch

Some background: I recently sent my CV to a free online evaluation service. The reply said that although the document looked visually appealing, it was a poor fit for an ATS (Applicant Tracking System). These software packages match the content of the CV against the job offer and filter out most applicants on that basis. It turns out that the PDF output of LaTeX is quite terrible for this. The evaluation recommended submitting resumes as .doc Word files.

After some research, I managed to fix most problems (ligatures, encoding, etc.). I'm basically using this format for my CV:

EDITED: Added some extra commands and partial solutions

\documentclass[10pt,letterpaper,sans]{moderncv}

%% ModernCV themes
\moderncvstyle{classic}
\moderncvcolor{black}
\moderncvicons{awesome}

%% Character encoding
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

%% Improve text-only output
\usepackage{xpatch}
\usepackage{transparent} % provides \transparent and \texttransparent used below
\input{glyphtounicode}
\pdfgentounicode=1
\def\labelitemi{--} % Bullet list with a dash
% EDIT: modify cventry to add invisible colons (:) between year and content
\xpatchcmd{\cventry}{#2}{#2{\makebox[0pt]{\transparent{0}:}}}
{}{\typeout{===>Failure in patching \\cventry}}
% EDIT: redefined \social to add some description with transparency
\RenewDocumentCommand{\social}{O{}O{}m}{%
  \ifthenelse{\equal{#2}{}}%
    {%
      \ifthenelse{\equal{#1}{linkedin}}{\collectionadd[linkedin]{socials}{%
\protect\makebox[0pt]{\protect\texttransparent{0}{www.linkedin.com/in/}}
\protect\httplink[#3]{www.linkedin.com/in/#3}}}{}%
      \ifthenelse{\equal{#1}{twitter}} {\collectionadd[twitter]{socials} {%
\protect\makebox[0pt]{\protect\texttransparent{0}{www.twitter.com/}}
\protect\httplink[#3]{www.twitter.com/#3}}}    {}%
      \ifthenelse{\equal{#1}{github}}  {\collectionadd[github]{socials}  {%
\protect\makebox[0pt]{\protect\texttransparent{0}{www.github.com/}}
\protect\httplink[#3]{www.github.com/#3}}}     {}%
    }
    {\collectionadd[#1]{socials}{\protect\httplink[#3]{#2}}}}
%EDIT: Change the Linkedin symbol
\renewcommand*{\linkedinsocialsymbol}{{\small\faLinkedinSquare}~}

%% Adjust the page margins
\usepackage[margin=1.75cm]{geometry}
\usepackage{setspace} % provides \onehalfspacing used below

%% Personal data
\firstname{Mickey}
\familyname{Mouse}
\phone{+1~(555)~123~4567}
\email{mickey@disney.com}
\social[twitter]{mickeymouse}

\begin{document}

% CURRICULUM VITAE
\newpage
\makecvtitle
\onehalfspacing

\section{\texorpdfstring{\faStar~Professional Summary}{Professional Summary}}
\cvlistitem{I'm a talking mouse, please hire me.}

\section{\texorpdfstring{\faIndustry~Experience}{Experience}}
\cventry{1940-present}{Cartoon character}{Walt Disney Company}{Animation Division}{}{}
\cvlistitem{Many films, please hire me.}
\cventry{1950-present}{Company Mascot}{Walt Disney}{Worldwide}{}{I Hate the Pixar Lamp}

\section{\texorpdfstring{\faGraduationCap~Education}{Education}}
\cvlistitem{I can talk, please hire me.}

\section{\texorpdfstring{\faWrench~Skills}{Skills}}
\cvlistitem{I can talk, please hire me.}

\section{\texorpdfstring{\faTrophy~Awards}{Awards}}
\cvlistitem{Many Oscars, please hire me.}

\end{document}

Sorry for the long and cheesy example. I use the \texorpdfstring bit to keep the symbol before the section name out of the PDF bookmarks. However, these symbols are text characters, so they end up in the extracted text when I run pdftotext, and presumably in the ATS software as well (which I assume uses something similar to pdftotext). So my goal for now is to improve the output of pdftotext so that it is fully readable and keeps as much "format" as possible (basically paragraph spacing between sections).

The lines \input{glyphtounicode} and \pdfgentounicode=1 removed some of the bad glyphs from the output, but not all of them, and I'm still getting incorrect symbols in front of the phone number, the email and the LinkedIn URL. \def\labelitemi{--} fixed the problem with the standard bullet item from moderncv (comment out this line, compile the document and pass the PDF through pdftotext to see what I'm talking about). As you can see, ditching the extra symbols I put into the section titles won't completely solve my issue.

What I would like to do is typeset all those moderncvicons as images, if possible, or in some other way so they are visible in the pdf, but not readable as text from pdftotext or manual copy-paste. Also, I would like to add some hidden text, not visible in the pdf but accessible to pdftotext, to label accordingly the phone number and email address, and ideally show the full URL address to the linkedin/twitter profile. I don't mind tweaking the commands of moderncv in order to achieve this, or creating new commands from scratch.

The third level of complexity (I think) would be adding extra vertical space (one extra line) between \cventry entries and some separator between the year and the rest of the content, but only in the text-only output. I would like the PDF to stay more or less the same.

Is this doable? Or should I switch to .doc Word files?

Thanks in advance for any help, advice, comment, criticism, joke, etc.

BOUNTY:

I still need several things before I can consider this problem solved:

  1. I need a way to hide characters from the text-only version while keeping them visible in the PDF. Basically, the opposite of transparent. I need to apply this still unknown method to hide the twitter symbol, the email symbol and the phone symbol, and also the symbols at the beginning of each section.

  2. I need to properly format the output of the transparent characters, because they seem to end up on a separate line (check the output of pdftotext).

Best Answer

I finally have something that more or less satisfies my requirements. I ended up using the accsupp package, as recommended.
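As a minimal sketch of the accsupp idea, separate from the full CV further down: the printed material goes between \BeginAccSupp and \EndAccSupp, and the ActualText key supplies what text extraction should see instead. The helper name \texticon and the sample symbols are assumptions made for illustration only, and whether the replacement is honored depends on the extraction tool.

```latex
% Minimal sketch, not the full CV: replace or hide a symbol in the text layer.
\documentclass{article}
\usepackage{accsupp}

% \texticon is a hypothetical helper:
%   #1 = text that extraction tools should see (may be empty),
%   #2 = what is actually printed on the page.
\newcommand*{\texticon}[2]{%
  \BeginAccSupp{ActualText={#1}}#2\EndAccSupp{}%
}

\begin{document}
% The star is printed; extraction should see "Telephone:" in its place.
\texticon{Telephone:}{$\star$} +1 (555) 123 4567

% Empty ActualText: the bullet is printed but should not be extracted.
\texticon{}{$\bullet$} An item
\end{document}
```

Compiling this and running pdftotext on the result is a quick way to check the behavior before touching the moderncv commands.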

I also had to switch from pdflatex to lualatex for compiling the .tex file. lualatex seems to handle the encoding and substitution of the fontawesome glyphs better, and there are no errors in the output of pdftotext. You will need to make sure the fontawesome fonts are readable by lualatex. On Linux, I created symbolic links in one of my system font directories to the .tfm and .otf files related to fontawesome within the texlive installation and updated the system font cache.

Using lualatex also meant ditching the transparent package, and instead, using accsupp everywhere.
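The line-break trick that replaces the transparent separators can be isolated as a small sketch: with method=pdfstringdef, the ActualText string is processed by hyperref, so \unichar{"000A} can request a line feed in the extracted text. The helper name \textlayerbreak is hypothetical, and the unicode key is an addition in this sketch (the code in this answer omits it), so treat it as an assumption to test rather than a requirement.

```latex
% Minimal sketch: ask for a line feed before, and a colon after, a piece
% of text in the extracted text only. The printed page is unchanged.
\documentclass{article}
\usepackage{hyperref} % method=pdfstringdef relies on hyperref
\usepackage{accsupp}

% \textlayerbreak is a hypothetical helper: prints #1 as usual, while the
% text layer should read "<line feed>#1:".
\newcommand*{\textlayerbreak}[1]{%
  \BeginAccSupp{method=pdfstringdef,unicode,%
    ActualText={\unichar{"000A}#1:}}%
  #1%
  \EndAccSupp{}%
}

\begin{document}
\textlayerbreak{1940-present} Cartoon character, Walt Disney Company
\end{document}
```

Note the double quote in \unichar{"000A}: it marks a hexadecimal number, whereas a single quote would mark an octal one.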

Here's the latest version:

\documentclass[10pt,letterpaper,sans]{moderncv}

%% Adjust the page margins
\usepackage[margin=1.75cm]{geometry}
\usepackage{setspace}

%% ModernCV themes
\moderncvstyle[right]{classic}
\moderncvcolor{black}
\moderncvicons{awesome}

%% Character encoding
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}

%% Improve text-only output
\def\labelitemi{--} % Bullet list with a dash
\usepackage{xpatch}
\usepackage{accsupp}

% Patch cventry - Add linebreak before and separator: after
\xpatchcmd{\cventry}{#2}{%
  \protect\BeginAccSupp{%
    method=pdfstringdef,ActualText={\unichar{"000A}#2:}}%
    #2\protect\EndAccSupp{}}%
{\typeout{===>Success in patching \\cventry}}
{\typeout{===>Failure in patching \\cventry}}

% Remove glyphs from text version and add description
\renewcommand*{\linkedinsocialsymbol}{%
\protect\BeginAccSupp{ActualText=}%
{{\small\faLinkedinSquare}~}%
\protect\EndAccSupp{}}%

\renewcommand*{\fixedphonesymbol}{%
\protect\BeginAccSupp{ActualText=Telephone:}%
{{\faPhone}~}%
\protect\EndAccSupp{}}%

\renewcommand*{\emailsymbol}{%
\protect\BeginAccSupp{ActualText=Email:}%
{{\small\faEnvelopeO}~}%
\protect\EndAccSupp{}}%

% Define mySection, which removes symbols from text version
\newcommand{\mySection}[2]{%
\BeginAccSupp{method=pdfstringdef,ActualText={\unichar{"000A}#2:}}%
    \section{\texorpdfstring{#1~#2}{#2}}%
\EndAccSupp{}%
}

\newcommand{\mycvitem}[1]{%
  \BeginAccSupp{method=pdfstringdef,ActualText={\unichar{"000A}#1:}}%
    \cvitem{}{\textbf{#1}}%
  \EndAccSupp{}}

%Redefine socials to add full link into text-version
\RenewDocumentCommand{\social}{O{}O{}m}{%
\ifthenelse{\equal{#2}{}}{%
  \ifthenelse{\equal{#1}{linkedin}}{\collectionadd[linkedin]{socials}%
    {\protect\BeginAccSupp{method=pdfstringdef,
    ActualText={\protect\unichar{"000A}http://www.linkedin.com/in/#3%
    \protect\unichar{"000A}}}%
    \protect\httplink[#3]{www.linkedin.com/in/#3}\protect\EndAccSupp{}}}{}%
  \ifthenelse{\equal{#1}{twitter}} {\collectionadd[twitter]{socials}%
    {\protect\BeginAccSupp{method=pdfstringdef,
    ActualText={\protect\unichar{"000A}www.twitter.com/#3%
    \protect\unichar{"000A}}}%
    \protect\httplink[#3]{www.twitter.com/#3}\protect\EndAccSupp{}}}{}%
  \ifthenelse{\equal{#1}{github}}  {\collectionadd[github]{socials}%
    {\protect\BeginAccSupp{method=pdfstringdef,
    ActualText={\protect\unichar{"000A}www.github.com/#3%
    \protect\unichar{"000A}}}%
    \protect\httplink[#3]{www.github.com/#3}\protect\EndAccSupp{}}}{}%
}
{\collectionadd[#1]{socials}{\protect\httplink[#3]{#2}}}}

%% Personal data
\firstname{John}
\familyname{Doe}
\phone{+1~(555)~123~4567}
\email{mickey@disney.com}
\social[linkedin]{mickeymouse}

\begin{document}

% CURRICULUM VITAE
\newpage
\makecvtitle
\onehalfspacing

\mySection{\faStar}{Professional Summary}
\cvlistitem{I'm a talking mouse, please hire me.}

\mySection{\faIndustry}{Experience}
\cventry{1940-present}{Cartoon character}{Walt Disney Company}{Animation Division}{}{}
\cvlistitem{Many films, please hire me.}
\cventry{1950-present}{Company Mascot}{Walt Disney}{Worldwide}{}{I Hate the Pixar Lamp}

\mySection{\faGraduationCap}{Education}
\cvlistitem{I can talk, please hire me.}

\mySection{\faWrench}{Skills}
\mycvitem{Languages}
\cvlistitem{I can talk, please hire me.}

\mySection{\faTrophy}{Awards}
\cvlistitem{Many Oscars, please hire me.}

\end{document}

It's messy, I know, but it works (at least on my side). Patching \cvitem itself might produce undesired results because that command is used within the definition of \cvlistitem, which is why I defined \mycvitem instead. I hope this helps at least someone.

As always, thanks to the community