I need to run many searches for certain patterns in source files while the user is editing them, so the regexp matching must be efficient in both time and memory. The same pattern is used repeatedly, so it should be compiled once, but I also need to retrieve subparts of each match (rather than just confirm that a match exists).

I'm considering java.util.regexp or the Jakarta perl5util (if it still exists; it's been a few years since I used it), or perhaps the Eclipse search engine, though I doubt that it's any smarter.

Is there any significant performance difference between these options?

+6  A: 

I am not sure there is a huge performance gap between the different Java regexp engines.

But there certainly can be a performance issue depending on how you construct a regexp (and only if the data is large enough, as noted by Jeff Atwood).

The main thing to avoid is catastrophic backtracking, which is best prevented by using atomic grouping.
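As a rough illustration of the atomic-grouping point, here is a minimal sketch using java.util.regex. The class name, method names, and the toy pattern are assumptions made up for this example; they are not taken from any real code base. Both patterns accept the same strings (one or more `a` followed by `b`), but the atomic version gives up its backtracking positions inside the group, so a near-miss input fails quickly.

```java
import java.util.regex.Pattern;

// Hypothetical example class; names and patterns are illustrative only.
class AtomicGroupSketch {
    // Nested quantifiers: on a near-miss input the engine may retry many
    // different ways of splitting the run of 'a's between iterations.
    static final Pattern BACKTRACKING = Pattern.compile("(?:a+)+b");

    // Same set of accepted strings, but (?>...) is an atomic group: once
    // a+ has matched, the engine will not backtrack into it.
    static final Pattern ATOMIC = Pattern.compile("(?>a+)+b");

    static boolean matchesBacktracking(String input) {
        return BACKTRACKING.matcher(input).matches();
    }

    static boolean matchesAtomic(String input) {
        return ATOMIC.matcher(input).matches();
    }
}
```

On well-formed input such as `aaab` the two behave the same. The difference only shows up on long inputs that almost match, which is where the non-atomic form can become very slow.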

So, by default I would use the java.util.regex engine, unless you have specific Perl-compliant regexp sources you need to reuse in your program.

Then I would carefully construct the regexp I intend to use.

But in terms of choosing one engine or another... as has been said in many other questions:

  • "make it work, make it fast - in that order"
  • beware of "premature optimization".
VonC
+2  A: 

As VonC says, you need to know your regexps. It doesn't hurt to compile the regexes beforehand; otherwise, the cost of compiling a regex each time can hurt performance badly.
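A minimal sketch of compiling once and reusing the pattern, which also covers retrieving subparts through capturing groups. The class name, the method name, and the `name = digits` pattern are hypothetical and chosen only for illustration; the calls used (`Pattern.compile`, `Matcher.find`, `Matcher.group`) are the standard java.util.regex API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the pattern is an assumption for illustration.
class PrecompiledPatternSketch {
    // Compiled once when the class is loaded, then shared by every call.
    // Group 1 captures the name, group 2 the numeric literal.
    private static final Pattern ASSIGNMENT =
            Pattern.compile("(\\w+)\\s*=\\s*(\\d+)");

    // Returns name -> literal for every "name = digits" found in the text.
    static Map<String, String> extractAssignments(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        // A Matcher is cheap to create; the expensive part is compile().
        Matcher m = ASSIGNMENT.matcher(text);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }
}
```

A compiled `Pattern` is immutable and safe to share between threads, while each `Matcher` holds per-search state and should not be shared.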

For some categories, there are alternative libraries, such as http://jint.sourceforge.net/jint.html, which might have better performance. Then again, it depends on which version of Java you're using.

JDK 1.6 shows the maturity of the regex engine with good features and performance combined.

anjanb
+2  A: 

Overall, the java.util.regex (not "regexp") package is at least as good as any other Java regex library, including Jakarta ORO (your "Perl5Util" lib). In addition, it supports both atomic groups and possessive quantifiers, both of which I find invaluable for writing blazingly fast regexes. It also supports pre-compiled regexes and capturing groups, but I think that's true of all the libraries.
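To make the possessive-quantifier point concrete, here is a small sketch. The class name, method name, and the quoted-string pattern are assumptions for illustration only. The `*+` quantifier consumes as many non-quote characters as it can and never gives them back, so an unterminated string fails without the engine retrying shorter runs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the pattern is an assumption for illustration.
class PossessiveQuantifierSketch {
    // [^"]*+ is possessive: it takes every non-quote character and does not
    // backtrack. Group 1 captures the text between the quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*+)\"");

    // Returns the contents of the first double-quoted run, or null if none.
    static String firstQuoted(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

Because the character class already excludes the closing quote, making the quantifier possessive does not change what matches here; it only removes backtracking work on inputs that fail.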

Alan Moore
+1  A: 

I found this benchmark comparison of different libraries useful: http://www.tusker.org/regex/regex_benchmark.html

Alexey Gopachenko