views: 392

answers: 1
I have just read this interesting article about how regular expressions are implemented in various languages.

It describes an alternative implementation of regular expressions that simulates a non-deterministic finite automaton (NFA) directly instead of backtracking. It claims that the backtracking implementations used in Perl, Java, and others are susceptible to exponentially slow performance on certain "pathological" regular expressions, while grep, awk, and Tcl, which use the automaton-based approach, handle those same patterns in linear time.
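To make that concrete, here is the kind of "pathological" pattern the article has in mind, written as a small C# test (my own illustration, not code from the article): a nested quantifier such as `^(a+)+$` run against input that almost matches forces a backtracking engine to try exponentially many ways to split the string.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class PathologicalRegexDemo
{
    static void Main()
    {
        // Nested quantifiers: every way of grouping the 'a's into one or more
        // runs must be tried before the engine can report "no match".
        var pattern = new Regex("^(a+)+$");

        for (int n = 16; n <= 26; n += 2)
        {
            // Input that ALMOST matches: n 'a' characters followed by '!',
            // so the match fails only after all decompositions are exhausted.
            string input = new string('a', n) + "!";

            var sw = Stopwatch.StartNew();
            bool matched = pattern.IsMatch(input);
            sw.Stop();

            Console.WriteLine($"n={n,2}  matched={matched}  elapsed={sw.ElapsedMilliseconds} ms");
        }
    }
}
```

On a backtracking engine each extra 'a' roughly doubles the running time, whereas an automaton-based matcher processes the same input in time proportional to its length.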

It makes no reference to the .NET framework, but I would like to know how .NET (C# in particular) regular expressions are implemented, and how they compare in terms of performance.

Edit:

Since the article linked in the answer mentions that .NET uses backtracking, can I assume its performance will be on par with Perl and Java?

+10  A: 

There's an awesome write-up here. The author takes advantage of the fact that you can step into the .NET framework code and see what it does, and explains how everything works. It's an excellent read.

ojrac
...written by SO denizen Jeff Moser.
Michael Petrotta
Thanks for the connection. I wonder if I originally found the link through SO...
ojrac