[Math] Why is this Parsing Expression Grammar left recursive

computer sciencecontext-free-grammarformal-grammar

I'm trying to get my parser generator to accept this specification. I know that's kind of a programming question, but I figured this was the best place to ask. It's specified as a Parsing Expression Grammar, I think.

start   = Expr

Expr    = [0-9]+ / Sum / Product / '(' Expr ')'
Product = Expr '*' Expr
Sum     = Expr '+' Expr

Note: I know nothing about formal grammars beyond what I've read on Wikipedia. If there are any good online resources, I'd love to hear about it.

In any case, my generator is throwing this error:

Left recursion detected for rule "Expr".

What I find strange is that it's perfectly willing to accept this:

start   = Expr

Expr    = [0-9]+ / '(' Expr ')'

Which looks just as recursive as the first one to me.

Best Answer

A "left-recursive" grammar means that the parser can get into a loop in the parsing rules without making any progress consuming the input. Imagine that each production is a subroutine that might eat some tokens or call some other subroutines. For example, consider this production:

expr → number '+' number

This means that the expr parser will first call the number parser (which might consume some tokens), then try to eat a + token (and fail if the next token is not of that type) and then call the number parser again to consume more tokens. If that is all successful, the expr parser will yield success.

Now consider this left-recursive production:

expr → expr '+' expr

The first thing the expr parser will do is to call itself recursively, putting itself into an unproductive infinite loop.

In contrast, consider this production, which is not left-recursive:

expr → '(' expr ')'

The first thing the parser will do is try to consume an open parenthesis token, and it will call itself recursively only if that is successful. It can never get into an infinite loop doing this because there are only so many parentheses and it has to eat one each time before it can recurse; eventually it must run out of parentheses and then the recursion will stop. Your Expr = [0-9]+ / '(' Expr ')' example is similarly safe.

Here is a portion of the left-recursive grammar you showed:

expr → [0-9]+ | sum
sum  → expr '*' expr

If the next token is not [0-9]+, then the expr parser will call sum, which will immediately call expr again, and the parser will be in an infinite loop.

Usually you can rewrite rules like sum in a non-left-recursive form. In this case it will look something like this:

expr       → term more_terms
more_terms → '*' term more_terms | '+' term more_terms | <nothing>
term       → [0-9]+ | '(' expr ')'

This is not left-recursive because the more_terms parser must consume a * or a + token before it can call itself recursively; it is impossible for the parser to get into an infinite loop by doing this. Similarly the expr parser calls term, but term will not call back to expr without consuming a parenthesis.

In general the transformation says to turn a left-recursive grammar like this:

S → S 'p' | 'q'

into a non-left-recursive one like this:

S  → 'q' ps 
ps → 'p' ps | <nothing>

I don't know if there are any good online resources (I suppose there must be) but if you like Perl, chapter 8 of Higher-Order Perl is available online and discusses precisely this point.

Also, I hope it is not insulting to suggest that you read the Wikipedia article about left recursion, and perhaps the article about recursive descent parsing, which is the name for the technique your parser generator is trying to use.

Related Question