11.3. Attribute Equations


Multiple attributes

A given nonterminal can have more that one attribute by making a field in the union type be a structure. For example,

  typedef union
  {
    int         ival;
    const char* strval;
    struct 
    {
      AST ast;
      int num;
    }           astval;
  } YYSTYPE;

says that anything with an attribute kind of astval has two attributes, called ast and num.

The Dragon Book uses explicit field names even when there is only one attribute, and I will follow that convention.


Writing equations

Regular expressions are used as a mathematical way to describe sets of lexemes, which leads to a simple way to create a lexical analyzer.

Context-free grammars are a mathematical tool that can reduce the work of constructing a parser from a year (using an ad-hoc approach) to a few days.

The mathematical tool for defining the semantics, or meaning, or a program is based on attribute equations.

The idea is to write equations with each production that define one or more attributes of nonterminals.


Example

Here is our expression grammar with attribute equations that define an attribute val, the value of an expression.

Notice how similar this is to a Bison parser with actions. Bison is a particular tool. Attribute equations are the mathematical idea.

Where Bison uses $1, $2, etc., we follow the Dragon Book's convention by referring to a nonterminal by its name, and subscripting nonterminals in a production to distinguish between two or more with the same name.

Production Semantic rule
ET E.val = T.val
EE1 + T E.val = E1.val + T.val
TF T.val = F.val
TT1 * F T.val = T1.val * F.val
Fn F.val = n.val
F → ( E ) F.val = E.val

Example: Building abstract syntax trees

Attribute equations can define abstract syntax trees just as well as anything else.

Production Semantic rule
ET E.ast = T.ast
EE1 + T E.ast = newNode('+', E1.ast, T.ast)
TF T.ast = F.ast
TT1 * F T.ast = newNode('*', T1.ast, F.ast)
Fn F.ast = newLeaf(n.val)
F → ( E ) F.ast = E.ast

Example: Conversion to postfix

In postfix notation, an operator is written after its operands. For example,

  1. 3 2 + has value 5.
  2. 3 2 + 9 8 + * is a postfix expression equivalent of (3+2)*(9+8).

Here is a semantic grammar that converts an expression into a string, in postfix notation. I assume that operator + is available, which concatenates strings.

Production Semantic rule
ET E.str = T.str
EE1 + T E.str = E1.str + " " + T.str + " +"
TF T.str = F.str
TT1 * F T.str = T1.str + " " + F.str + " *")
Fn F.str = toString(n.val)
F → ( E ) F.str = E.str

Computing types and trees

Suppose that we are compiling a typed language, with the following rules.

  1. There are two types, int and real.

  2. There are two kinds of constants, integer and real.

  3. If you perform an operation between an integer value and a real value, the result is a real value.

  4. Variables are allowed. You can find out the type of variable v using getType(v). (GetType looks in a symbol table.)

  5. Abstract syntax tree constructors include integerConstant(n), realConstant(x), identifier(s), operatorNode(op, A, B).

  6. combineTypes(S,T) returns int if S and T are both int, and returns real otherwise.

Here is a grammar with attribute equations that define both the type and tree for an expression.

Production Semantic rule
ET
E.ast = T.ast
E.type = T.type
EE1 + T
E.ast = newNode('+', E1.ast, T.ast)
E.type = combineTypes(E1.type, T.type)
TF
T.ast = F.ast
T.type = F.type
TT1 * F
T.ast = newNode('*', T1.ast, F.ast)
T.type = combineTypes(T1.type, F.type)
Fintconst
F.ast = integerConstant(intconst.ival)
F.type = int
Frealconst
F.ast = realConstant(realconst.rval)
F.type = real
Fid
F.ast = identifier(id.name)
F.type = getType(id.name)
F → ( E )
F.ast = E.ast
F.type = E.type