views:

853

answers:

5

I want to build a lexer in C and I am following the dragon book, I can understand the state transitions but how to implement them?

Is there a better book?

The fact that I have to parse a string through a number of states so that I can tell whether the string is acceptable or not!

+3  A: 

You can implement simple state transitions with a single state variable, for example if you want to cycle through the states start->part1->part2->end then you can use an enum to keep track of the current state and use a switch statement for the code you want to run in each state.

enum state { start=1, part1, part2, end} mystate;

// ...
mystate = start;
do {
  switch (mystate) {
    case start:
      // ...
    case part1:
      // ...
    case part2:
      // ...
      if (part2_end_condition) mystate = end; // state++ will also work
      // Note you could also set the state back to part1 on some condition here
      // which creates a loop
      break;
  }
} while (mystate != end);

For more complex state transitions that depend on several variables, you should use tables/arrays like this:

var1    var2    var_end    next_state
0       0       0          state1
0       1       0          state2
1       0       0          state3
1       1       0          state4
-1      -1      1          state_end // -1 represents "doesn't matter" here
schnaader
are state and mystate different variables?
Tahir Akhtar
Sorry, that was a typo. state is the name of the enum type, mystate is the only variable used here.
schnaader
+2  A: 

G'day,

Assuming you mean The Dragon book on compiler design, I'd recommend having a look around this page on compiler tools.

The page itself is quite small but has links through to various excellent resources on lexical analysers.

HTH

cheers,

Rob Wells
A: 

If you're looking for a more modern treatment than the dragon book(s) : Andrew W. Appel and Maia Ginsburg, Modern Compiler Implementation in C, Cambridge University Press, 2008.

Chapter 2 is focused on Lexical Analysis : Lexical tokens, Regular expressions, Finite automata; Nondeterministic Finite Automata; Lexical analyzer generators

Look at the Table of Contents

anno
A: 

There's more than one way to do it. Every regular expression corresponds directly to a simple structured program. For example, an expression for numbers could be this:

// regular expression
digit* [.digit*]

and the corresponding C code would be:

// corresponding code
while(DIGIT(*pc)) pc++;
if (*pc=='.'){
    pc++;
    while(DIGIT(*pc)) pc++;
}

The transition-table way of building lexers is, in my opinion, needlessly complicated, and obviously runs slower.

Mike Dunlavey
Any particular reason why you use *pc then pc[0] then *pc again?
John Machin
Mike Dunlavey
A: 

The program flex (a clone of lex) will create a lexer for you.

Given an input file with the lexer rules, it will produce a C file with an implementation of a lexer for those rules.

You can thus check the output of flex for how to write a lexer in C. That is, if you don't just want to use flex's lexer...

Daren Thomas
Flex is BSD, silly.
Chris Lutz
Also, Bison has a disclaimer that says that Bison-generated code can be used in non-GPL code.
Chris Lutz
updated (removed comments about GPL, my bad, sorry). Please don't call people silly. It was a little offended, but only at first. Bison did have issues with the generated code, glad to see they have added a disclaimer.
Daren Thomas