using JFlex instead of Regex

JFlex is a scanner generator, not a parser generator: it tokenizes the input. Use it in combination with a parser generator such as CUP or BYACC/J.

There's an important difference between a scanner and a parser:

A scanner can recognise a Regular Language, whereas
A parser can recognise a Context-Free Language.
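
To make the distinction concrete, here is a small sketch in Java. The class and method names are made up for illustration. Unsigned integers form a regular language, so a regex is enough; balanced parentheses of arbitrary depth are context-free but not regular, so recognising them needs some form of unbounded memory (a counter here, standing in for the stack a parser would use).

```java
import java.util.regex.Pattern;

public class RegularVsContextFree {
    // Regular language: unsigned integers. A finite automaton (regex) suffices.
    private static final Pattern INTEGER = Pattern.compile("[0-9]+");

    public static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    // Context-free language: balanced parentheses nested to any depth.
    // The counter plays the role of a parser's stack.
    public static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth < 0) {
                    return false; // a ')' with no matching '('
                }
            } else {
                return false; // only parentheses are accepted in this sketch
            }
        }
        return depth == 0; // every '(' must have been closed
    }
}
```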

Your questions:

1) and 2) Suppose that you have to convert an input stream of characters into a stream of tokens, given the following patterns:

if the input matches [0-9]+ (and is followed by something other than \.) then it's an unsigned integer. Send "INTEGER" to the output.
if the input matches [0-9]+\.[0-9]* then it's an unsigned floating-point number. Send "FLOAT" to the output.

Note that the two patterns share a common prefix, [0-9]+. If you want to scan the input with plain regular expressions, you'll have to factor out that common prefix yourself (unless you are happy for it to be very slow, since regexes are expensive and you would be matching the same digits twice). At runtime, you first evaluate the prefix. If it matches, you evaluate what follows: if the next character is anything other than a dot ([^.]), you have an INTEGER and start over; if it is a dot (\.), you have to check whether the text after it is the fractional part of a floating-point number. If so, you have a FLOAT.

Basically, what you have to build is a finite state automaton: its states are decision points that reflect the input seen so far, and its transitions are tests on the current character of the input.
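
As a rough illustration of such an automaton, here is a minimal hand-written sketch in Java for the INTEGER and FLOAT patterns above. The class name NumberScanner, the state names, and the decision to silently skip non-numeric characters are assumptions made for the example; this is not the code JFlex generates.

```java
import java.util.ArrayList;
import java.util.List;

public class NumberScanner {
    // Each state records what has been seen so far.
    private enum State { START, INT_PART, FRACTION }

    /** Returns the token names found in the input, in order. */
    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.START;
        // Run one step past the end so the last token is flushed.
        for (int i = 0; i <= input.length(); i++) {
            char c = i < input.length() ? input.charAt(i) : '\0';
            boolean digit = c >= '0' && c <= '9';
            switch (state) {
                case START:
                    // Anything that is not a digit is skipped (assumption).
                    if (digit) {
                        state = State.INT_PART;
                    }
                    break;
                case INT_PART:
                    if (c == '.') {
                        state = State.FRACTION;   // common prefix followed by a dot
                    } else if (!digit) {
                        tokens.add("INTEGER");    // prefix followed by something else
                        state = State.START;
                    }
                    break;
                case FRACTION:
                    if (!digit) {
                        tokens.add("FLOAT");      // fractional digits (possibly none) ended
                        state = State.START;
                    }
                    break;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("42 3.14 7. 100"));
    }
}
```

For the input "42 3.14 7. 100" this yields INTEGER, FLOAT, FLOAT, INTEGER. Note that "7." counts as a FLOAT because the pattern allows an empty fractional part. Each character is examined exactly once, which is the point of factoring out the common prefix.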

JFlex (like many other scanner generators) lets you generate the code for such automata automatically, basically by providing only the regexes, and the code it generates is very efficient.

3) You can use a generated scanner and a generated parser in tandem to recognise context-free languages, such as programming languages. Although it should be possible to parse XML that way (I've never tried), special-purpose parsers (SAX, StAX, etc.) are usually used for XML: it has a well-known structure, so there is no need to generate a parser.

BTW, please bear in mind that you cannot parse XML with Regex. ;)

Regards.
