Implementation of SFL


Strings

You will need to use strings quite a bit in the implementation, and you will not want to worry about memory management for them. A good idea is to create a table with all of the strings that you have seen stored in it. Create such a table. Include a function intern(s) that looks up string s in the table, and returns the string stored for s. If s is not in the table, it should be inserted.


Lexical analysis

Create a lexical analyzer for SFL using Flex. Examine the forms of programs to see what the tokens are.

Create a header file that defines tokens. It can have a form similar to the following.

#define TOK_DEF		1
#define TOK_END		2
...
Just create a definition for each token.

Some tokens need attributes. Make the attribute be the lexeme. Store the attribute into a variable called yylval, of type YYSTYPE. (There is method in the madness, so do this.) You want YYSTYPE to be a string type.

Test your lexer. Write a program that just runs the lexer repeatedly, printing each token and attribute. To compile a flex program called lexer.lex, use command

  flex lexer.lex
That creates file lex.yy.c, which defines function yylex(), the lexer. So your main program will call yylex over and over. yylex() returns 0 when it encounters the end of file.

By default, yylex reads from the standard input. To get it to read from a file, include the following in your main program.

  extern FILE* yyin;
  yyin = fopen(filename, "r");
Be sure to do this before you call yylex for the first time.


Parser

Create a parser for SFL using Bison.

As an initial phase of development, make the parser just read a file and either say nothing if the program is syntactically correct, or give an error message if the program is incorrect. If there is a syntax error, the parser should report the line number where the error occurs.

Bison manages attributes of tokens automatically. It assumes that the token attribute is in variable yylval, of type YYSTYPE. (You see, there is method in the madness.)

Bison also creates its own definition of tokens. Replace your header file for tokens by the one that Bison defines. Failure to do this will cause you trouble. You can compile the bison file in yacc compatibility mode as follows.
  bison -y -v -d parser.y
That creates file y.tab.c (defining function yyparse(), the parser) and file y.tab.h (the token definitions). Look at y.tab.h to see what you are getting. Your main program will only call yyparse() once, to parse the entire program.


Syntax trees

Create a type of abstract syntax trees that describe an expression or definition. Decide on the structure of trees. Define some functions (or constructors) that create particular kinds of nodes in the tree.

Do not try to be too close to the syntax. These trees describe expressions and definitions; they are not direct descriptions of the language syntax. For example, there should be nothing in the syntax tree that corresponds to a parenthesized expression; that is entirely syntactic. You do not need anything that corresponds to a let expression, since those expressions can be translated into a more basic form. You will probably want kinds of nodes that correspond to the following.

An identifier

An identifier node holds the name of the identifier.

A constant node

This is used for character, boolean and integer constants, as well as for the empty list. It should hold the kind of constant and the constant value. This kind of node is indicated by c below.

An operator node

An operator node is used for standard binary operators. It stores an indication of the operator, and has two subtrees, representing the parameters. This kind of node is indicated by op(a,b) below, where a and b are the subtrees.

A standard function node

A standard function node stands for one of the standard functions. It has an indication of the function that it represents. This kind of node is indicated by fun(f) below, where f indicates which function.

A standard actor node

A standard actor node either is a print actor or a read actor. The print actor contains a pointer to what it should print. This kind of node is indicated by actor(f) below, where f indicates which actor.

A conditional node

This has three children: the condition, the true part and the false part. This kind of node is indicated by if(a,b,c) below.

An apply node

An apply node indicates a function application. It has two subtrees: the function and its argument. This kind of node is indicated by apply(a,b) below.

A function expression node

This kind of node stands for an expression of the form \x -> A, and is indicated by \x->a or \(a) below, where a is the body. (The second form is used after identifiers have been removed. See eliminating identifiers.

A seq node

A seq node has two subtrees. It represents an actor that runs the left-hand subtree, ignores its result, then runs the right-hand subtree, and produces the value of the right-hand subtree as its own value. This kind of node is indicated by seq(a,b) below.

A seqf node

A seqf node has two subtrees. It represents an actor that runs the left-hand subtree, getting result r. Then it treats the right subtree as a function f, evaluates f(r), and treats the result from f(r) as an actor. It runs that actor, and produces that actor's result as the result from running the seqf tree. This kind of node is indicated by seqf(a,f) below.

Write two functions that print syntax trees. One should show the details of the structure, and is used for debugging. The other writes the tree in a form that looks similar to the language syntax.


Syntax tree construction and table management

Create a table that stores, with each identifier that is defined in a program, the tree that describes its value. Modify the parser so that it constructs a syntax tree for each definition. After the definition is made, it should make an entry in the table, associating the defined identifier with the syntax tree that it names.

Note that some expressions need to be converted to large trees. For example, expression [1,2,3] should yield the same tree as 1:2:3:[]. Do not invent too many kinds of abstract syntax tree.

Test the new parser, having it read each definition in a program, then print the expression that occurs in a definition.


Eliminating names

You will find it awkward to deal with identifiers. Write a function that takes a tree and removes all identifier names from it, as follows.

  1. Each identifier that is bound by a function node (\x -> A) should be replaced by an identifier number, indicating how many function nodes need to be skipped over, on the way up the tree, before reaching the node that binds this identifier. For example, expression (\x -> \y -> x+y) becomes the same as (\ -> \ -> i1+i0). Notice that the \ nodes no longer say which identifier they bind. Variable y has been replaced by identifier 0, indicating that the next \ node above it binds this identifier. Identifier x has been replaced by identifier 1, indicating that, to reach the appropriate \, skip over one \.

    Below, we will just write \A for a function that has the identifiers converted to numbers, omitting ->.

  2. Each identifier that is in the definition table should be replaced by its value, according to the table. But give precedence to locally defined identifiers, so that they can shadow globally defined identifiers.

Do not perform this transformation when a definition is made. A definition needs to be able to refer to later definitions. This transformation is only to be made just before evaluation.


Interpretation

Write an interpreter that evaluates an expression, where the expression is described by a syntax tree. The interpreter should assume that identifiers have been converted to numbers, as explained above. The interpreter should reduce the expression to its simplest form (where no more evaluations can be done) and return that simplified form.

The rules for evaluating expressions should be fairly obvious. Use call-by-value. To evaluate expression A B, first evaluate A, and check that its result is a function. Then evaluate B. Then perform a function application by doing a substitution. Here are basic rules for computing the simplest form, also called a normal form. nf(a) is the normal form of tree a. Node kinds should be obvious. For example, op(a, b) is an operator node with subtrees a and b. perform-op indicates the result of performing operator op, and perform-fun indicates the result of performing a given function.

nf(c) = c   [c a constant]
nf(op(a,b)) = perform-op(nf(a), nf(b))   [when nf(a) and nf(b) are constants]
nf(op(a,b)) = op(nf(a), nf(b))   [when nf(a) and nf(b) are not both contants]
nf(fun(f)) = fun(f) [fun(f) a built-in function]
nf(actor(f)) = actor(f)
nf(in) = in   [in a numbered identifier]
nf(if(a,b,c)) = nf(b) [if nf(a) = true]
  = nf(c) [if nf(a) = false]
  = error [otherwise]
nf(apply(a, b)) = subst(nf(a), nf(b))  [when nf(a) is a \ node]
nf(apply(a, b)) = perform-fun(nf(a), nf(b))  [when nf(a) is a built-in function]
nf(apply(a, b)) = apply(nf(a), nf(b))  [when nf(a) is neither a \ node nor a standard function]
nf(\ a) = \ (nf(a))
nf(seq(a,b)) = seq (a,b)
nf(seqf(a,f)) = seq(a,f)
 
subst(\ a, r) = subst1(a, r, 0)
subst1(c, r, n) = c [c a constant]
subst1(op(a,b), r, n) = op(subst1(a,r,n), subst1(b,r,n))
subst1(fun(f), r, n) = fun(f)
subst1(actor(f), r, n) = actor(f)
subst1(in, r, n) = r
subst1(in, r, k) = in [k =/= n]
subst1(if(a,b,c), r, n) = if(subst1(a,r,n), subst1(b,r,n), subst1(c,r,n))
subst1(apply(a, b), r, n) = apply(subst1(a,r,n), subst1(b,r,n))
subst1(\ a, r, n) = \(subst1(a, r, n+1)
subst1(seq(a,b), r, n) = seq(subst1(a,r,n), subst1(b,r,n))
subst1(seqf(a,f), r, n) = seqf(subst1(a,r,n), subst1(b,r,n))

You will also need to be able to run an actor.

run(actor(f))

Perform the action indicated by f, and return its value.

run(seq(a,b))

First run(nf(a)), and ignore the result. Then run(nf(b)), and return the same result that it does.

run(seqf(a,f)

Let r = run(nf(a)). Then let b = nf(apply(f,r)), and then run(b), and return its result.

Add the interpreter to your implementation. Look up the value of main, replace identifiers in it, run the interpreter on it, and perform it. If the value is an actor, seq or seqf node, then run it, as indicated above. If it is any other kind of node, then print it.

Test your program. Try some things, and see if it works.