Syntax – How is TeX Syntactically Organized? Understanding Its Structure

definition, resources, syntax

I'm just starting out with TeX, and I'd like to find a directory or table that explains how TeX's syntax is organized. I can make educated guesses, but I figure I'll teach myself better if I understand how commands work instead of simply copying them from the internet whenever they format things the way I like. It's difficult to describe, but I'm looking for something along the lines of:

  • \\ means that you go to a new line

  • * means that if your command has a label, that label is removed

  • function[]{} allows you to perform a specific function in {}, taking an argument in []

  • begin{} is paired with end{} and allows you to format things between them such that the command you put into something formats the text within it

I'm aware that the last is probably a subgenre of the definition above it, but it seemed commonplace enough to have its own section. I'm also aware that all of the above is likely incorrect. Like I said, I'm looking for a table of definitions. These are based on educated guesses.

Best Answer

As mentioned in the comments, TeX's syntax is a bit... complicated. What's more, given the ability to modify category codes, specify templates for argument parsing, and define active characters, it's potentially completely open-ended.

So, I'm going to focus on the subset of TeX's syntax which is defined in LaTeX:

Commands begin with \. A command is either a named command, which is a \ followed by one or more letters,¹ or a command symbol, which is a \ followed by a single non-letter. Spaces after a named command are ignored.

Commands can take arguments. A command normally takes at most nine arguments, but some macro trickery makes it possible to consume more.
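As a small illustration of the two kinds of command and of the space-gobbling rule, here is a minimal sketch (the sentences are made up for the example):

```latex
\documentclass{article}
\begin{document}
% Named command: \ followed by letters. The space after the name is
% swallowed, so this prints "LaTeXis fun." with no gap.
\LaTeX is fun.

% An empty group ends the command name, so the following space survives.
\LaTeX{} is fun.

% Command symbols: \ followed by a single non-letter.
50\% of the time. \$5 \& up.
\end{document}
```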

LaTeX has three types of arguments:

  • Required arguments, as the name implies, are required. They are normally enclosed in {}, although the braces can be omitted for an argument that is a single character or command.²
  • Optional arguments are, not surprisingly, optional. These are enclosed in [] and carry different semantics depending on the command and sometimes on where in the command they appear.³ Common cases include \sqrt, which uses its optional argument to put an index on the surd, and the sectioning commands, which use the optional argument to supply alternate text for the table of contents and running heads.
  • Coordinates are pairs of numbers enclosed in () and separated by a comma. These appear primarily in LaTeX's picture environment, which is not commonly used these days, having been largely superseded by TikZ.⁴
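The three argument types above might look like this in practice. This is only a sketch; the titles and numbers are invented for the example:

```latex
\documentclass{article}
\begin{document}
% Optional argument: a short title for the table of contents.
\section[Short title]{A much longer title for the heading itself}

% Required argument in braces; optional argument in brackets.
$\sqrt{2}$ and $\sqrt[3]{2}$

% Braces may be dropped when the argument is a single token,
% so these two fractions are equivalent.
$\frac{1}{2} = \frac12$

% Coordinates in the picture environment: (x,y) pairs in parentheses.
\begin{picture}(100,50)
  \put(10,20){\line(1,0){60}}
\end{picture}
\end{document}
```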

In addition, some commands have a * form, which calls for slightly different execution of the command. For example, the sectioning commands, when called in their * form, print a section heading without a number and do not put their argument into the table of contents or affect page headers, even if the sectioning command normally would. In contrast, \verb* prints ␣ in place of any spaces when producing verbatim output.⁵
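A minimal sketch of both * forms mentioned here (the section titles are placeholders):

```latex
\documentclass{article}
\begin{document}
\tableofcontents

\section{Numbered and listed in the contents}
\section*{Unnumbered and not listed in the contents}

% \verb prints its argument literally; \verb* also makes spaces visible.
\verb|a b c| versus \verb*|a b c|
\end{document}
```

Note that the * comes directly after the command name and before any arguments.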

There are also environments, which are delimited by \begin{…} and \end{…}. Environment names can include nearly any characters except \ and unmatched braces.⁶ For * forms of environments, the * goes inside the braces and is effectively part of the environment name. Some packages use ad hoc conventions to indicate alternate behaviors for environments, e.g., the tabularray package defines versions of the amsmath environments through its amsmath library; their names are prefixed with + and they use tabularray to produce their output.
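For instance, with amsmath loaded, equation and equation* are two separate environments whose names differ only by the *. A short sketch:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Numbered display equation.
\begin{equation}
  a^2 + b^2 = c^2
\end{equation}
% The * is part of the environment name: same display, no number.
\begin{equation*}
  a^2 + b^2 = c^2
\end{equation*}
\end{document}
```

The name given to \end must match the name given to \begin exactly, including the *.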

Environments can take arguments just like commands can, including all of the types above.
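Two common cases, sketched with made-up content: tabular takes a required argument describing its columns, and minipage takes an optional alignment argument followed by a required width.

```latex
\documentclass{article}
\begin{document}
% Required argument {lr}: one left-aligned and one right-aligned column.
\begin{tabular}{lr}
  Item & Price \\
  Tea  & 2.50  \\
\end{tabular}

% Optional argument [t] (align on the top line), required argument (width).
\begin{minipage}[t]{0.4\textwidth}
  Some text set in a narrow box.
\end{minipage}
\end{document}
```

The arguments follow \begin{…} directly, just as they would follow a command name.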

There are a few special characters to be aware of as well: ~ is a no-break space. _ and ^ act as commands with a single required argument to produce subscripts and superscripts respectively, but only in math mode.
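A brief sketch of these characters in use:

```latex
\documentclass{article}
\begin{document}
% ~ is a space at which TeX will not break the line.
Donald~E. Knuth wrote \TeX.

% _ and ^ each take one argument; braces are needed for more than one token.
$x_i^2$, $x_{ij}$, $e^{-t}$

% Without braces only the first token is lowered:
% this is x with subscript i, followed by an ordinary j.
$x_ij$
\end{document}
```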


  1. In pdfTeX, the letters are by default the 52 upper- and lowercase ASCII letters. Additional ASCII characters can be designated as letters by modifying their category codes, which is how the internal LaTeX commands with @ in their names work. In XeLaTeX and LuaLaTeX, which are both Unicode-aware, any Unicode letter is allowed in a named command.

  2. Note that if you give a command as an argument without braces, only the command itself is consumed as the argument and not its own arguments. So if you want to produce the square root of a fraction, you will need to write, e.g., \sqrt{\frac12} rather than \sqrt\frac12.

  3. The most notable of these is \newtheorem.

  4. TikZ's syntax is its own thing and beyond the scope of this answer.

  5. Verbatim arguments are another edge case. Base LaTeX has one command that takes verbatim arguments, \verb, and delimits them with two matching non-space characters. A modified version of those semantics is available by specifying an argument with a type of v with \NewDocumentCommand.

  6. This freedom is, I think, somewhat unintentional and is largely a result of the low-level mechanism by which LaTeX interprets environment names.
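To make notes 1, 3 and 5 a little more concrete, here is a rough sketch. The names \my@helper, \showhelper, \code, thm and lem are hypothetical and chosen only for illustration, and the \NewDocumentCommand line assumes a LaTeX release from late 2020 or newer (older releases would need \usepackage{xparse}).

```latex
\documentclass{article}

% Note 1: between \makeatletter and \makeatother, @ counts as a letter,
% so it can appear in a command name.
\makeatletter
\newcommand{\my@helper}{internal text}
\newcommand{\showhelper}{\my@helper}
\makeatother

% Note 3: the position of \newtheorem's optional argument changes its meaning.
\newtheorem{thm}{Theorem}[section] % last position: number within sections
\newtheorem{lem}[thm]{Lemma}       % middle position: share thm's counter

% Note 5: a "v" argument is read verbatim.
\NewDocumentCommand{\code}{v}{\texttt{#1}}

\begin{document}
\section{Results}
\showhelper

\begin{thm} A theorem numbered within the section. \end{thm}
\begin{lem} A lemma sharing the theorem counter. \end{lem}

% The v argument can be delimited by braces or by a matching pair of characters.
\code{\alpha_1} and \code|\alpha_1|
\end{document}
```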
