← | ↑ | → |
Create files src/lexer.lex and src/lexer.h. The lexer will create a function yylex. It also creates variables yyin and yylval. Put the following into lexer.h.
#include <stdio.h> int yylex(); extern FILE* yyin; extern YYSTYPE yylval;
Now look at the language description.
Identify all of the reserved words. Put them into lexer.lex, one per line.
Identify all of the symbols, such as parentheses. Add them to lexer.lex, one per line.
Look for other kinds of lexemes that a program can contain, including identifiers. Add them to lexer.lex.
Now you have a list of most of the lexemes of SFL. Decide on a token name for each kind.
For lexemes that are one character, such as '(', you can use the character itself as the token.
For more complicated lexemes, you will need to choose a name. For example, lexeme -> will need to have a name, such as TOK_ARROW.
You will want a token for each reserved word. But some reserved words, such as head and isList, enter the language in similar ways. Create one token that represents all of them, and use the token attribute to indicate which one it is. (You will find an integer attribute convenient. But give the attributes a name, such as HEAD_ATT. Do not use raw integers as attributes.)
Operators + and − have the same precedence and associativity. Have one token that represents both of them, and use the attribute to distinguish which it is.
You will need a token for each level of precedence.
Copy the list of lexemes into another file called src/token.h. Don't copy single-character lexemes, such as '('.
Replace each lexeme in token.h by a definition of its token as an integer greater than or equal to 256. For example, you might write
#define TOK_ARROW 256 #define TOK_DEF 257The numbers that you choose are not important, but be sure to choose a different number for each token, and keep the token numbers reasonably small.
Include a definition of type YYSTYPE in token.h, as follows.
#define YYSTYPE_DEFINED typedef union { char* str; int ival; } YYSTYPE;YYSTYPE is the type of a token attribute. This assumes that a token attribute is either a string or an integer.
Now return to file lexer.lex. Change it into a lexical analyzer using Flex. See the short description of Flex or this Flex manual or this Flex manual. The lexer returns a token number. Here is a start.
%{ #include "lexer.h" #include "token.h" #include "stringtable.h" YYSTYPE yylval; %} letter [a-zA-Z] digit [0-9] id {letter}+ %% [ \t] {} "\n" {line_number++; } def {return TOK_DEF; } "(" {return '('; } "->" {return TOK_ARROW; } {id} {yylval.str = intern(yytext); return TOK_ID; } %% /*=========================================* * yylex calls yywrap() when it encounters * the end of the file. Returned value 1 * tells yylex t0 return 0, which is yylex's * way of saying 'end of file'. *=========================================*/ int yywrap() { return 1; }
You will need to handle all of the lexemes and tokens. You will also need to include handling of spaces and comments. The sample lexer contains cases to handle spaces, tabs and newlines. Create a case to handle comments by doing nothing. (Doing nothing implicitly restarts yylex to try again.)
To compile your flex lexer, use command
flex lexer.lexFlex will create a file called lex.yy.c containing the lexer function, called yylex, written in C. Each time you call yylex( ), it returns the next token. yylex( ) returns 0 at the end of the file.
Create a file src/testlexer.c. Here is a suggestion for it.
#include "lexer.h" /*======================================* * This main function tests the lexer * function, yylex(). It assumes that * the command line contains the name of * a file to read. *======================================*/ int main(int argc, char** argv) { if(argc == 2) { /*----------------------------------* * Flex reads from open file yyin. * *----------------------------------*/ yyin = fopen(argv[1], "r"); if(yyin == NULL) return 1; /*-----------------------------* * yylex() reads a lexeme and * * returns a token. * *-----------------------------*/ while(1) { int tok = yylex(); if(tok == 0) { break; } else if(tok == TOK_ID) { printf("ID, name = \"%s\"\n", yylval.str); } else { printf("Token %d (\"%s\")\n", tok, yytext); } } return 0; } }
Here is a modified Makefile that includes the lexer. To build the lexer and the tester, use command
make bin/testlexerIn the future, you will need to modify the Makefile on your own.
CC = gcc LINK = gcc CFLAGS = -Wall RM = rm CFILES = src/stringtable.c\ src/teststringtable.c\ src/lex.yy.c\ src/testlexer.c OFILES = $(patsubst src/%.c,obj/%.o,$(CFILES)) EFILES = bin/teststringtable\ bin/testlexer STRINGTABLE_OFILES = obj/stringtable.o\ obj/teststringtable.o TESTLEXER_OFILES = obj/stringtable.o\ obj/lex.yy.o\ obj/testlexer.o bin/testlexer: $(TESTLEXER_OFILES) $(LINK) -o bin/testlexer $(TESTLEXER_OFILES) bin/teststringtable: $(STRINGTABLE_OFILES) $(LINK) -o bin/teststringtable $(STRINGTABLE_OFILES) src/lex.yy.c: src/lexer.lex cd src; flex lexer.lex obj/%.o: src/%.c $(CC) $(CFLAGS) -c $< -o $@ clean: $(RM) $(OFILES) $(EFILES) |
Now test the lexer. Write a small test SFL program. Include some comments. Run the lexer on it. Does it work?
When you have the lexer working, submit it using the following command, if your lexer is called lexer.lex.
cd src ~abrahamsonk/5220/bin/submit lexer lexer.lex lexer.h token.h allocate.h stringtable.h stringtable.c