tags:

views:

477

answers:

5

I'm interested in building a regex engine, as a side-project, just for learning purposes.

I know the theory behind evaluation of regular expressions, and have a sufficient understanding of finite state machines etc.

What I'm interested in is how a regex engine is implemented in software. So I was wondering if there was any sort of tutorial or online resource that explains the implementation of a regex engine, translation from the regex to a FSM and so on. I don't want any sites that just explains the theory behind it.

Thanks.

+4  A: 

Russ Cox has a nice collection of articles about Implementing Regular Expressions, especially his article Regular Expression Matching Can Be Simple And Fast worth reading.

Gumbo
+3  A: 

I have 2 interesting links for regular expressions mechanisms,

Nick D
+1  A: 

If you are comfortable with C, you will probably benefit from looking at the source code for Perl-compatible regular expressions.

James M.
+4  A: 

In first chapter of Beautiful Code Brian Kernighan talks about how Rob Pike did an elegant, very small regex matcher. It's really simple, but Kernighan gives seven exercises to extend it, which might be a good intro for you.

Jim Ferrans
+3  A: 

I think the article How Regexes Work by Mark-Jason Dominus is excellent. It's targeted at non-programmers, but it's written in a very algorithmic way and so it can be used for implementing such an engine, especially if you have any experience with compilation. I've done so myself.

The article also mentions more advanced tips and tricks, and has some information about engine limitations.

Oak