A top-down parser builds the parse tree from the top down, starting with the start nonterminal.
As an example, suppose we want to build a parse tree for expression n + n * n using the following grammar.
E | → | T R |
R | → | ε |
R | → | + E |
T | → | F S |
S | → | ε |
S | → | * T |
F | → | n |
F | → | ( E ) |
The parser starts by building the top node of the tree from the start nonterminal, E.
E
There is only one production with E on its left-hand side, so that must be the one to use.
E / \ T R
Since the parser reads the sequence of tokens from left to right, it builds the tree from left to right too. It is creating a leftmost derivation.
The next step is to expand a little more of the tree beneath the leftmost nonterminal, T. Since there is only one production with T on the left-hand side, we use it.
E / \ T R / \ F S
There are two productions that have F on the left-hand side, and the parser needs to decide which to use.
Our parsers will make decisions based on a single lookahead token. The lookahead is the next token that has not yet been put into the tree. For expression n + n * n, the initial lookahead is n.
Production F → ( E ) will not work, because the lookahead is not a left parenthesis. We use F → n.
E / \ T R / \ F S | n
Now the leftmost nonterminal is S and the lookahead is +. Production S → * T will not work because * does not match +. The only other possibility is S → ε.
E / \ T R / \ F S | | n ε
Now we need to expand the subtree labeled R. It is a little less obvious here that production R → ε will not work, but the only one that has a chance of matching the lookahead, +, is R → + E.
E / \ / \ T R / \ / \ F S + E | | n ε
The parser continues this process. Eventually, it will have built the entire parse tree.
E / \ / \ T R / \ / \ F S + E | | / \ n ε / \ T R / \ | F S ε | / \ n * T / \ F S | | n ε
Top-down parsers are also called predictive parsers. To see why, look at the process above where the parser is about to expand under R in this parse tree.
E / \ T R / \ F S | | n ε
Based on the lookahead token +, the parser makes a prediction that + E is coming.
E / \ / \ T R / \ / \ F S + E | | n ε
It is the predictive nature of top-down parsers that forced a modification of the expression grammar from a previous page. That grammar has productions
E | → | T |
E | → | T + E |
Knowing that the lookahead is n is not enough information to know which production to use, so the productions for E needed to be converted to a different form so that the parser's prediction is made later, when the required information is available. The common T on the left has been left-factored out.
E | → | T + R |
R | → | ε |
R | → | + E |
The expression grammar that we have used also uses right-recursion. Top-down parsers cannot handle left recursion, except by mechanisms outside the normal parsing rules.