[Math] Context free grammar + functions =

formal-languages

If you start with the rules for building a context-free grammar and extend them by allowing left-hand nonterminals to be functions of one or more arguments, does that go beyond the definition of a context-free grammar? If so (I imagine it does), what class of grammars is it? Context-sensitive?

To clarify, suppose we have the following grammar:

$S \to F(a,b,c)$
$F(x,y,z) \to xyz$

The grammar above is equivalent to the following because x, y, and z are substituted:

$S \to abc$

Now suppose we leverage the function capability with the following grammar:

$S \to F(\epsilon)$
$F(x) \to x\ a\ |\ x\ a\ F(x\ b)$

($\epsilon$ means empty string)

That grammar produces:

a
aba
ababba
ababbabbba
...

A practical application of functions in a grammar is to express an indentation-formatted computer language. Trivial example:

statement(indent)
    : indent basic_statement
    | indent header ':' newline statement_list(indent whitespace)
statement_list(indent)
    : statement(indent)*

This means that a statement, given some indentation string, may either be a one-liner (e.g. print 'hello') or it may be a header (e.g. if (x == 5)) followed by ':', a newline, and one or more child statements of a greater indentation level.

Note that statement_list has to be out by itself here, as statement(indent whitespace)* could yield child lines of inconsistent indentation if the whitespace expands to something not constant. By calling statement_list once with the possibly variable whitespace, we coerce all the child statements to have the exact same indentation. Once again, functions come to the rescue.

To be more specific on the rules of the grammar system I'm talking about:

  • The left hand of each rule is either a terminal, a nonterminal, or a function declaration for zero or more arguments.
  • A function of zero arguments is equivalent to a nonterminal.
  • The arguments of a function declaration must be single variables (I used $x$, $y$, and $z$ in the examples above).
  • The arguments of a "function call" (used on the right-hand side of a rule) may contain terminals, nonterminals, and nested function calls.
  • The arguments of a function call are expanded before being passed. Thus, functions in this grammar system are no higher than first order (if I'm using the term correctly).

So, by adding "functions" to CFG, what do I get?

Best Answer

Since you can pass arbitrary nonterminals to functions, you can encode van Winjgaarden grammars (aka two-level grammars) with your formalism, which makes parsing an undecidable problem.

A caveat: I'm not sure what you mean by "arguments of a function call are expanded before being passed". If you pass a recursive nonterminal as an argument, it can be infinitely expanded.

Related Question