ansaurus

Question

How do I implement a two-pass scanner using GNU Flex?

Answer 1

+2 A:

PHP doesn't use separate passes for markup and code. Its scanner simply writes text to the output buffer while in markup mode and switches to tokenizing when it enters code mode. You don't need a two-pass scanner; you can do this with a single flex lexer.
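The single-pass idea can be sketched by hand in plain C. This is only an illustration of the mode switch, not PHP's actual scanner: the function name scan, the mode names, and the M[...]/T[...] trace format are all hypothetical, and code mode here recognizes nothing but alphabetic words.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Two scanner modes, playing the role of flex start conditions. */
enum mode { MODE_MARKUP, MODE_CODE };

/*
 * Hypothetical helper: scans src in one pass and writes a trace to out.
 * Markup runs are recorded as M[...], alphabetic words in code as T[...].
 * Returns the number of code tokens found.
 */
static int scan(const char *src, char *out, size_t outsz)
{
    enum mode mode = MODE_MARKUP;
    size_t i = 0, n = strlen(src), used = 0;
    int tokens = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    while (i < n) {
        size_t start = i;
        const char *tag;
        int w;

        if (mode == MODE_MARKUP) {
            if (strncmp(src + i, "<?", 2) == 0) {   /* enter code mode */
                mode = MODE_CODE;
                i += 2;
                continue;
            }
            /* Consume markup up to the next "<?" or the end of input. */
            while (i < n && strncmp(src + i, "<?", 2) != 0)
                i++;
            tag = "M";
        } else {
            if (strncmp(src + i, "?>", 2) == 0) {   /* back to markup mode */
                mode = MODE_MARKUP;
                i += 2;
                continue;
            }
            if (!isalpha((unsigned char)src[i])) {  /* skip everything else */
                i++;
                continue;
            }
            while (i < n && isalpha((unsigned char)src[i]))
                i++;
            tag = "T";
            tokens++;
        }

        w = snprintf(out + used, outsz - used, "%s[%.*s]",
                     tag, (int)(i - start), src + start);
        if (w < 0 || (size_t)w >= outsz - used)
            break;                                  /* trace buffer is full */
        used += (size_t)w;
    }
    return tokens;
}
```

Calling scan("<b>hi</b><? echo foo ?>bye", buf, sizeof buf) with a 128-byte buf returns 2 and leaves M[<b>hi</b>]T[echo]T[foo]M[bye] in buf. A flex scanner would express the same two modes as start conditions rather than an explicit mode variable.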

If you are interested in how PHP itself works, download the source (try the PHP 4 source; it is a lot easier to understand). The file to look at is zend_language_scanner.l in the Zend directory.

Having written something similar myself, I would really recommend reconsidering the Flex and Bison route in favor of something more modern like ANTLR. It is a lot easier to use and to understand (the macros employed in a lex grammar get very confusing and hard to read), and it has a built-in debugger (ANTLRWorks), so you don't have to spend hours looking at 3 MB debug files. It also supports many target languages (Java, C#, C, Python, ActionScript) and has an excellent book and a very good website that should get you up and running in no time.

Kris Erickson 2008-09-19 20:06:15

Answer 2

+4 A:

You want to look at start conditions. For example:

%x PHP
%%
"<?"            { BEGIN(PHP); }
<PHP>[a-zA-Z]+  { return PHP_TOKEN; }
<PHP>"?>"       { BEGIN(INITIAL); }
[a-zA-Z]+       { return HTML_TOKEN; }

You start off in state 0 (also named INITIAL) and use the BEGIN macro to change states. To match an RE only while in a particular state, prefix the RE with the state name surrounded by angle brackets.

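In the example above, "PHP" is a state (a start condition, which must be declared with %s or %x in the definitions section of the flex file). "PHP_TOKEN" and "HTML_TOKEN" are %tokens defined by your yacc file.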

eduffy 2008-09-21 15:23:08
