views:

37

answers:

1

I have been wondering about the performance of regular expression implementations lately, and have had a hard time coming up with much useful information.

It's easy enough to benchmark browser/JavaScript regex performance (plenty of tools on the net). The JavaScript regex implementations in Chrome and Opera pretty much destroy those of every other major browser.

But when it comes down to the fastest C++, Java, C#, Python, etc. regex implementation, there aren't too many good benchmarks or comparisons.

So, what's the fastest regex library out there with a close to complete feature set? (I'm not too concerned about back-references.)

+3  A: 

Although I haven't done more than a couple of tests myself, I believe that the re2 library was meant to be fast so I'm guessing it is ;)

However, to make this a little more constructive, take a look at this benchmark: http://lh3lh3.users.sourceforge.net/reb.shtml

WoLpH
re2 was one of the first that came on my radar. I believe it is used in v8?
jdc0589
Not as far as I know. Google Chrome (so probably V8 entirely) uses Irregexp: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html
WoLpH
I was literally in the process of editing my comment. Nice timing.
jdc0589
I'm quite astonished by the crappy performance of the Boost libraries.
anno
I've tested the code with VC2010: TR1 is 2 times faster than Boost.Regex for the first pattern, with no difference for the rest. And I'm nowhere near the abysmal performance shown in the benchmark.
anno
@anno: The results vary by type of regex and type of input. For example, the `re2` engine works very well with a large input but is pretty slow with just a small amount of text. And regexes like `(foo|bar)` are slow in quite a few regular expression engines.
WoLpH
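
Since results depend on both the pattern and the input size, it may be worth timing your own cases rather than relying on a single published number. Below is a minimal sketch using Python's built-in `re` module and `timeit`; it is not taken from the linked benchmark, and the sample texts, the second pattern `ba[rz]`, and the helper name `time_pattern` are assumptions made up for illustration. It only measures one engine, so it says nothing about how `re2` or Boost.Regex would behave.

```python
import re
import timeit


def time_pattern(pattern, text, repeats=5):
    """Return the best wall-clock time in seconds for one findall pass.

    Hypothetical helper for illustration; taking the minimum of several
    runs reduces noise from other processes.
    """
    compiled = re.compile(pattern)  # compile once, outside the timed call
    timer = timeit.Timer(lambda: compiled.findall(text))
    return min(timer.repeat(repeat=repeats, number=1))


# Assumed sample inputs: one short string and one much longer string.
small_text = "foo bar baz " * 10
large_text = "foo bar baz " * 20_000

results = {}
for label, text in (("small", small_text), ("large", large_text)):
    # An alternation like the one mentioned above, plus a character class.
    for pattern in (r"(foo|bar)", r"ba[rz]"):
        results[(label, pattern)] = time_pattern(pattern, text)

for (label, pattern), seconds in results.items():
    print(f"{label:>5} input, {pattern!r}: {seconds:.6f} s")
```

Swapping in your real patterns and a representative chunk of your real input should give a more useful comparison than synthetic text like this.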