A given nonterminal can have more that one attribute by making a field in the union type be a structure. For example,
typedef union { int ival; const char* strval; struct { AST ast; int num; } astval; } YYSTYPE;
says that anything with an attribute kind of astval has two attributes, called ast and num.
The Dragon Book uses explicit field names even when there is only one attribute, and I will follow that convention.
Regular expressions are used as a mathematical way to describe sets of lexemes, which leads to a simple way to create a lexical analyzer.
Context-free grammars are a mathematical tool that can reduce the work of constructing a parser from a year (using an ad-hoc approach) to a few days.
The mathematical tool for defining the semantics, or meaning, or a program is based on attribute equations.
The idea is to write equations with each production that define one or more attributes of nonterminals.
Here is our expression grammar with attribute equations that define an attribute val, the value of an expression.
Notice how similar this is to a Bison parser with actions. Bison is a particular tool. Attribute equations are the mathematical idea.
Where Bison uses $1, $2, etc., we follow the Dragon Book's convention by referring to a nonterminal by its name, and subscripting nonterminals in a production to distinguish between two or more with the same name.
Production | Semantic rule |
---|---|
E → T | E.val = T.val |
E → E1 + T | E.val = E1.val + T.val |
T → F | T.val = F.val |
T → T1 * F | T.val = T1.val * F.val |
F → n | F.val = n.val |
F → ( E ) | F.val = E.val |
Attribute equations can define abstract syntax trees just as well as anything else.
Production | Semantic rule |
---|---|
E → T | E.ast = T.ast |
E →E1 + T | E.ast = newNode('+', E1.ast, T.ast) |
T → F | T.ast = F.ast |
T → T1 * F | T.ast = newNode('*', T1.ast, F.ast) |
F → n | F.ast = newLeaf(n.val) |
F → ( E ) | F.ast = E.ast |
In postfix notation, an operator is written after its operands. For example,
Here is a semantic grammar that converts an expression into a string, in postfix notation. I assume that operator + is available, which concatenates strings.
Production | Semantic rule |
---|---|
E → T | E.str = T.str |
E → E1 + T | E.str = E1.str + " " + T.str + " +" |
T → F | T.str = F.str |
T → T1 * F | T.str = T1.str + " " + F.str + " *") |
F → n | F.str = toString(n.val) |
F → ( E ) | F.str = E.str |
Suppose that we are compiling a typed language, with the following rules.
There are two types, int and real.
There are two kinds of constants, integer and real.
If you perform an operation between an integer value and a real value, the result is a real value.
Variables are allowed. You can find out the type of variable v using getType(v). (GetType looks in a symbol table.)
Abstract syntax tree constructors include integerConstant(n), realConstant(x), identifier(s), operatorNode(op, A, B).
combineTypes(S,T) returns int if S and T are both int, and returns real otherwise.
Here is a grammar with attribute equations that define both the type and tree for an expression.
Production | Semantic rule | ||
---|---|---|---|
E → T |
|
||
E → E1 + T |
|
||
T → F |
|
||
T → T1 * F |
|
||
F → intconst |
|
||
F → realconst |
|
||
F → id |
|
||
F → ( E ) |
|