7.2. Top-Down Parsers

A top-down parser builds the parse tree from the top down, starting with the start nonterminal.

As an example, suppose we want to build a parse tree for expression n + n * n using the following grammar.

E T R
R ε
R + E
T F S
S ε
S * T
F n
F ( E )

The parser starts by building the top node of the tree from the start nonterminal, E.

          E

There is only one production with E on its left-hand side, so that must be the one to use.

          E
         / \
        T   R         

Since the parser reads the sequence of tokens from left to right, it builds the tree from left to right too. It is creating a leftmost derivation.

The next step is to expand a little more of the tree beneath the leftmost nonterminal, T. Since there is only one production with T on the left-hand side, we use it.

          E
         / \
        T   R 
       / \
      F   S

There are two productions that have F on the left-hand side, and the parser needs to decide which to use.

Our parsers will make decisions based on a single lookahead token. The lookahead is the next token that has not yet been put into the tree. For expression n + n * n, the initial lookahead is n.

Production F → ( E ) will not work, because the lookahead is not a left parenthesis. We use Fn.

          E
         / \
        T   R 
       / \
      F   S
      |
      n

Now the leftmost nonterminal is S and the lookahead is +. Production S → * T will not work because * does not match +. The only other possibility is S → ε.

          E
         / \
        T   R 
       / \
      F   S
      |   |
      n   ε

Now we need to expand the subtree labeled R. It is a little less obvious here that production R → ε will not work, but the only one that has a chance of matching the lookahead, +, is R → + E.

           E
          / \
         /   \
        T     R 
       / \   / \
      F   S  +  E
      |   |
      n   ε

The parser continues this process. Eventually, it will have built the entire parse tree.

           E
          / \
         /   \
        T     R 
       / \   / \
      F   S  +  E
      |   |    / \
      n   ε   /   \
             T     R
            / \    |
           F   S   ε
           |  / \
           n  *  T
                / \
               F   S
               |   |
               n   ε

Predictive nature of top-down parsers

Top-down parsers are also called predictive parsers. To see why, look at the process above where the parser is about to expand under R in this parse tree.

          E
         / \
        T   R 
       / \
      F   S
      |   |
      n   ε

Based on the lookahead token +, the parser makes a prediction that + E is coming.

           E
          / \
         /   \
        T     R 
       / \   / \
      F   S  +  E
      |   |
      n   ε

It is the predictive nature of top-down parsers that forced a modification of the expression grammar from a previous page. That grammar has productions

E T
E T + E

Knowing that the lookahead is n is not enough information to know which production to use, so the productions for E needed to be converted to a different form so that the parser's prediction is made later, when the required information is available. The common T on the left has been left-factored out.

E T + R
R ε
R + E

Left- and right-recursion

The expression grammar that we have used also uses right-recursion. Top-down parsers cannot handle left recursion, except by mechanisms outside the normal parsing rules.