I'm currently working on a scanner generator. The generator already works fine, but when I use character classes the algorithm gets very slow.

The scanner generator produces a scanner for UTF-8 encoded files. The full range of Unicode code points (0x000000 to 0x10FFFF) should be supported.

If I use large character sets, like the any operator '.' or the Unicode property {L}, the NFA (and also the DFA) contains a lot of states (> 10,000). So the conversion from NFA to DFA and the construction of the minimal DFA take a long time (even though the resulting minimal DFA contains only a few states).

Here's my current implementation for creating the character-set part of the NFA:

void CreateNfaPart(int startStateIndex, int endStateIndex, Set<int> characters)
{
    transitions[startStateIndex] = CreateEmptyTransitionsArray();
    foreach (int character in characters) {
        // get the UTF-8 encoded bytes for the character
        byte[] encoded = EncodingHelper.EncodeCharacter(character);
        int tStartStateIndex = startStateIndex;
        // build (or reuse) a chain of states for all bytes except the last
        for (int i = 0; i < encoded.Length - 1; i++) {
            int tEndStateIndex = transitions[tStartStateIndex][encoded[i]];
            if (tEndStateIndex == -1) {
                tEndStateIndex = CreateState();
                transitions[tEndStateIndex] = CreateEmptyTransitionsArray();
            }
            transitions[tStartStateIndex][encoded[i]] = tEndStateIndex;
            tStartStateIndex = tEndStateIndex;
        }
        // the last byte leads to the common end state
        transitions[tStartStateIndex][encoded[encoded.Length - 1]] = endStateIndex;
    }
}

Does anyone know how to implement this function much more efficiently, so that only the necessary states are created?

EDIT:

To be more specific, I need a function like:

List<Set<byte>[]> Convert(Set<int> characters)
{
    ???????
}

A helper function that converts a character (int) to its UTF-8 encoded byte[] is already defined:

byte[] EncodeCharacter(int character)
{ ... }
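
For illustration, here is a minimal sketch of one possible first step (it assumes your Set<T> type exposes an Add method, and it is not a full solution). It groups characters that share the same UTF-8 prefix (all bytes except the last) and merges their last bytes into a single set, so for example all continuation bytes under one common prefix collapse into one transition. It does not yet merge ranges in the leading bytes the way RE2 does:

List<Set<byte>[]> Convert(Set<int> characters)
{
    // map from a UTF-8 prefix (all bytes except the last) to the set of
    // possible last bytes under that prefix
    var lastBytes = new Dictionary<string, Set<byte>>();
    var prefixes = new Dictionary<string, byte[]>();

    foreach (int character in characters) {
        byte[] encoded = EncodingHelper.EncodeCharacter(character);
        byte[] prefix = new byte[encoded.Length - 1];
        Array.Copy(encoded, prefix, prefix.Length);
        string key = BitConverter.ToString(prefix); // prefix bytes as a dictionary key

        if (!lastBytes.ContainsKey(key)) {
            lastBytes[key] = new Set<byte>();
            prefixes[key] = prefix;
        }
        lastBytes[key].Add(encoded[encoded.Length - 1]);
    }

    var result = new List<Set<byte>[]>();
    foreach (var entry in lastBytes) {
        byte[] prefix = prefixes[entry.Key];
        var sequence = new Set<byte>[prefix.Length + 1];
        for (int i = 0; i < prefix.Length; i++) {
            var single = new Set<byte>();
            single.Add(prefix[i]); // singleton set for each shared prefix byte
            sequence[i] = single;
        }
        sequence[prefix.Length] = entry.Value; // merged set of last bytes
        result.Add(sequence);
    }
    return result;
}

Each returned array then describes one chain of transitions: singleton sets for the shared prefix bytes followed by one merged set for the final byte.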
+1  A: 

Look at what regular expression libraries like Google RE2 and TRE are doing.

jilles
I think Google RE2 does the kind of thing I need, but it's very complex... I found some interesting code at http://code.google.com/p/re2/source/browse/re2/compile.cc (starting at line 559)
youllknow
+1  A: 

There are a number of ways to handle it. They all boil down to treating sets of characters at a time in the data structures, instead of ever enumerating the entire alphabet. That's also how you make scanners for Unicode in a reasonable amount of memory.

You have many choices about how to represent and process sets of characters. I'm presently working with a solution that keeps an ordered list of boundary conditions and corresponding target states. You can process operations on these lists much faster than you could if you had to scan the entire alphabet at each juncture. In fact, it's fast enough that it runs in Python with acceptable speed.
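
For illustration, a minimal sketch of such a boundary list (the names here are hypothetical, not taken from the answer's actual code): transitions are stored as a sorted list of interval start points with a target state per interval, so a class like [0x80, 0x10FFFF] costs two entries instead of more than a million.

class IntervalTransitions
{
    // boundaries[i] is the first code point of interval i; targets[i] is the
    // state reached on any code point in [boundaries[i], boundaries[i+1])
    private readonly List<int> boundaries = new List<int>();
    private readonly List<int> targets = new List<int>();

    // simplified: assumes intervals are added in ascending order without overlap
    public void Add(int from, int to, int target)
    {
        int n = boundaries.Count;
        if (n > 0 && boundaries[n - 1] == from) {
            // the previous "no transition" gap starts exactly at 'from': reuse it
            targets[n - 1] = target;
        } else {
            boundaries.Add(from);
            targets.Add(target);
        }
        boundaries.Add(to + 1);
        targets.Add(-1); // -1 = no transition
    }

    public int Lookup(int codePoint)
    {
        // binary search for the interval containing codePoint
        int index = boundaries.BinarySearch(codePoint);
        if (index < 0) index = ~index - 1; // last boundary <= codePoint
        return index >= 0 ? targets[index] : -1;
    }
}

Set operations such as union and intersection on two such lists can be done in a single merge pass over the boundaries, which is why this stays fast even in an interpreted language.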

Ian