[TeX/LaTeX] Why can’t “fi” be separated when copied from a compiled PDF


I notice that in a PDF file compiled from LaTeX, "fi" (as in "field") cannot be separated into "f" and "i" when copying text out of the PDF. I wonder why this happens, and whether it can be changed. Thanks and regards!

Best Answer

The following is taken "verbatim" from The TeXbook (Chapter 9, "TeX's Roman Fonts", p. 51):

Let's begin with the rules for the normal roman font (\rm or \tenrm); plain TeX will use this font for everything unless you specify otherwise. Most of the ordinary symbols that you need are readily available and you can type them in the ordinary way: There's nothing special about

  • the letters A to Z and a to z
  • the digits 0 to 9
  • common punctuation marks : ; ! ? ( ) [ ] - * / . , @

except that TeX recognizes certain combinations as ligatures:

  • ff yields ﬀ
  • fi yields ﬁ
  • fl yields ﬂ
  • ffi yields ﬃ
  • ffl yields ﬄ
  • -- yields (an en-dash)
  • --- yields (an em-dash)
  • ‘‘ yields “
  • ’’ yields ”
  • !‘ yields ¡
  • ?‘ yields ¿
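
The combinations listed above can be tried out in a short test document. This is only a minimal sketch, assuming pdfLaTeX with the default Computer Modern fonts; the sample words are my own and not from The TeXbook. It also shows one way to block a ligature in a single word, by putting an italic correction (\/) between the letters.

```latex
\documentclass{article}
\begin{document}
% "fi", "fl", "ffi" and "ffl" are each replaced by one ligature glyph
field, flow, office, waffle

% \/ between the letters prevents the ligature, so "f" and "i"
% (or "f" and "f") are typeset as two separate glyphs
f\/ield, shelf\/ful

% dashes and quotation marks are produced by ligature rules too
pages 1--10, a pause---like this, ``quoted''
\end{document}
```

Copying the first line from the resulting PDF may give ligature characters or plain letters, depending on the fonts and the viewer; the words on the second line contain no ligature glyph at the marked position.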

Accented letters such as \^o can likewise end up as a single glyph, depending on the font encoding. The best way to think about a ligature is as a single character in the font, which is why "fi" in the PDF is selected and copied as one unit rather than as "f" and "i". MS Word's "Insert Symbol" dialog is probably a good representation of this:

[Screenshot: MS Word's "Insert Symbol" dialog]

Note how some of the symbols occur in a single box, implying that they are "joined at the hip", so to speak, and represent a single character (a ligature) in the typeset output. This is also font-specific: different fonts provide more or fewer ligatures.
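
As for whether this can be changed: two approaches are commonly used, sketched below under the assumption that the document is compiled with pdfLaTeX (neither comes from the quoted text). Option 1 keeps the ligature glyphs in the output but asks pdfTeX to embed a glyph-to-Unicode mapping, so that a PDF viewer can in principle copy the ligature as ordinary letters; how well this works depends on the fonts and the viewer, and recent TeX distributions may already enable it by default. Option 2 uses the microtype package to switch off all ligatures that begin with "f", at the cost of the typographically nicer glyphs.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}   % scalable fonts, so the Unicode mapping can apply

% Option 1: keep the ligatures, embed a glyph-to-Unicode map (pdfTeX only)
\input{glyphtounicode}
\pdfgentounicode=1

% Option 2: disable every ligature starting with "f", in all fonts
% \usepackage{microtype}
% \DisableLigatures[f]{encoding = *, family = *}

\begin{document}
field, flow, office
\end{document}
```

Only one of the two options is needed. Option 1 leaves the printed result unchanged, while Option 2 changes how the words look on the page.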