What is the importance of the Pattern.compile() method?
Why do I need to compile the regex string before getting the Matcher object?

For example:

String regex = "((\\S+)\\s*some\\s*)";

Pattern pattern = Pattern.compile(regex); // why do I need to compile?
Matcher matcher = pattern.matcher(text);
+5  A: 

When you compile the Pattern, Java does some computation to make finding matches in Strings faster (it builds an in-memory representation of the regex).

If you are going to reuse the Pattern multiple times, you can see a significant performance gain over creating a new Pattern every time.

If you only use the Pattern once, the compile step may look like an extra line of code, but in the general case it is very helpful.
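A minimal sketch of the reuse idea, assuming a simplified version of the question's regex (`(\S+)\s*some`). The class `WordBeforeSome` and method `wordBeforeSome` are hypothetical names. The Pattern is compiled once into a `static final` field, and each call only creates a cheap Matcher:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBeforeSome {
    // Compiled once when the class is initialized, then reused on every call.
    static final Pattern WORD_BEFORE_SOME = Pattern.compile("(\\S+)\\s*some");

    // Returns the word found in front of "some", or null if there is none.
    static String wordBeforeSome(String text) {
        Matcher m = WORD_BEFORE_SOME.matcher(text); // creating a Matcher is cheap
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String line : List.of("get some rest", "nothing here")) {
            // prints "get some rest -> get" and "nothing here -> null"
            System.out.println(line + " -> " + wordBeforeSome(line));
        }
    }
}
```

Keeping the compiled Pattern in a constant means the regex is parsed once per class load instead of once per call.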

jjnguy
Of course, you can write it all in one line: `Matcher matcher = Pattern.compile(regex).matcher(text);`. There are advantages to this two-step design over a single combined method: the arguments are effectively named, and it is obvious how to factor out the `Pattern` for better performance (or to split the work across methods).
Tom Hawtin - tackline
It always seems like you know so much about Java. They should hire you to work for them...
jjnguy
@jjnguy Check Tom's profile
Amarghosh
Ha, I know. It was a joke.
jjnguy
+10  A: 

Compiling parses the regular expression and builds an in-memory representation. The cost of compiling is significant compared to the cost of a single match. If you use a pattern repeatedly, caching the compiled Pattern will gain you some performance.
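One way to cache compiled patterns when the regex strings are only known at run time is a map keyed by the regex text. This is a minimal sketch; `PatternCache` and `get` are hypothetical names, not part of `java.util.regex`. It relies on the documented fact that Pattern instances are immutable and safe for use by multiple concurrent threads:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class PatternCache {
    // Hypothetical cache: regex string -> compiled Pattern.
    // Sharing is safe because Pattern objects are immutable.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles the regex on first request and returns the cached Pattern afterwards.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    public static void main(String[] args) {
        Pattern first = get("\\d+");
        Pattern second = get("\\d+");
        System.out.println(first == second);                  // true: compiled only once
        System.out.println(first.matcher("12345").matches()); // true
    }
}
```

This cache never evicts entries, so it suits a fixed, small set of regexes; for arbitrary user-supplied patterns a bounded cache would be safer.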

Thomas Jung
Plus, you can specify flags such as CASE_INSENSITIVE and DOTALL at compile time by passing an extra flags parameter.
Sam Barnum
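A short sketch of passing compile-time flags, using the real constants `Pattern.CASE_INSENSITIVE` and `Pattern.DOTALL` with the two-argument `Pattern.compile(String, int)`; the class name `FlagsDemo` and the sample strings are made up for illustration:

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    // Flags are fixed when the Pattern is compiled; combine several with bitwise OR.
    static final Pattern HELLO =
            Pattern.compile("hello.world", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static void main(String[] args) {
        // true: case is ignored and DOTALL lets '.' match the newline
        System.out.println(HELLO.matcher("HELLO\nWORLD").matches());
        // false: without flags, case matters and '.' does not match '\n'
        System.out.println(Pattern.compile("hello.world").matcher("HELLO\nWORLD").matches());
    }
}
```

The same two flags can also be embedded in the regex itself as `(?is)`, which is handy when only a regex string, not a flags argument, can be passed around.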
+4  A: 

The compile() method is always called at some point; it's the only way to create a Pattern object. So the question is really: why should you call it explicitly? One reason is that you need a reference to the Matcher object so you can use its methods, like group(int), to retrieve the contents of capturing groups. The only way to get hold of a Matcher object is through the Pattern object's matcher() method, and the only way to get hold of a Pattern object is through the compile() method. Then there's the find() method, which, unlike matches(), is not duplicated in the String or Pattern classes.
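A small sketch of the two Matcher-only features mentioned above, find() and group(int), assuming a made-up `key=value` input format; `FindGroupsDemo` and `pairs` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindGroupsDemo {
    // Group 1 captures the key, group 2 captures the value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\w+)");

    // Collects every key=value pair in the input.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) { // scans forward to the next match; String has no such method
            result.add(m.group(1) + " -> " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("a=1, b=two; c=3")); // [a -> 1, b -> two, c -> 3]
    }
}
```

String#matches(String) would only tell you whether the whole input matches; walking through several matches and reading their groups requires the Matcher.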

The other reason is to avoid creating the same Pattern object over and over. Every time you use one of the regex-powered methods in String (or the static matches() method in Pattern), it creates a new Pattern and a new Matcher. So this code snippet:

for (String s : myStringList) {
    if ( s.matches("\\d+") ) {
        doSomething();
    }
}

...is exactly equivalent to this:

for (String s : myStringList) {
    if ( Pattern.compile("\\d+").matcher(s).matches() ) {
        doSomething();
    }
}

Obviously, that's doing a lot of unnecessary work. In fact, compiling the regex and instantiating the Pattern object can easily take longer than performing the actual match. So it usually makes sense to pull that step out of the loop. You can create the Matcher ahead of time as well, though Matchers are not nearly as expensive:

Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("");
for (String s : myStringList) {
    if ( m.reset(s).matches() ) {
        doSomething();
    }
}

If you're familiar with .NET regexes, you may be wondering if Java's compile() method is related to .NET's RegexOptions.Compiled modifier; the answer is no. In .NET you don't usually have to worry about redundant object creation because the system automatically caches a certain number of Regex objects, whether you use a constructor or one of the static convenience methods like Regex.Matches(s, @"\d+"). If you specify the Compiled option:

Regex r = new Regex(@"\d+", RegexOptions.Compiled);

...it compiles the regex directly to CIL byte code, allowing it to run much faster, but at a significant cost in up-front processing and memory use; think of it as steroids for regexes. Java has no equivalent for .NET's Compiled option. There's no difference between a Pattern that's created behind the scenes by String#matches(String) and one you create explicitly with Pattern#compile(String).
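The Javadoc states that `s.matches(regex)` gives the same result as `Pattern.matches(regex, s)`, which in turn behaves like `Pattern.compile(regex).matcher(s).matches()`. A quick sketch that checks the three forms agree on a few sample inputs (the class name `EquivalenceDemo` and the inputs are illustrative only):

```java
import java.util.regex.Pattern;

public class EquivalenceDemo {
    public static void main(String[] args) {
        String regex = "\\d+";
        for (String s : new String[] {"123", "12a", ""}) {
            boolean viaString  = s.matches(regex);                          // compiles a Pattern on every call
            boolean viaStatic  = Pattern.matches(regex, s);                 // likewise
            boolean viaCompile = Pattern.compile(regex).matcher(s).matches();
            // prints "123: true true true", then "12a: false false false", then ": false false false"
            System.out.println(s + ": " + viaString + " " + viaStatic + " " + viaCompile);
        }
    }
}
```

The results are identical; the only difference is whether you get to keep and reuse the compiled Pattern.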

Alan Moore
+2  A: 

Pre-compiling the regex increases speed. Reusing the Matcher gives you another slight speedup. If the method is called frequently, say within a loop, overall performance should improve noticeably.

Optimize Prime