A buddy of mine is currently tinkering with JpCap in Java, and we ran into an interesting (maybe?) problem with regular expressions.

Only HTTP traffic is being captured and subsequently analyzed. For this purpose he is using a pattern like this one:

Pattern p = Pattern.compile("(www)");

What neither of us has been able to figure out is why the above pattern produces no matches at all, while the following one does:

Pattern p = Pattern.compile(".*(www).*");

As far as I can see, both of these should be equivalent, shouldn't they? I don't have much experience with regular expressions, so there might be a very simple answer to this question :)

+7  A: 

If you test it with matches(), your first pattern only accepts the exact string 'www'.

The second pattern accepts any string that contains 'www' anywhere.

This website has more information regarding Java regex.

jjnguy
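A minimal sketch of the difference, assuming the patterns are tested with Matcher.matches() (which the asker confirms further down). The class name MatchesDemo and the sample URL and payload are hypothetical. It also shows a caveat that matters for HTTP traffic: by default . does not match line terminators, so .*(www).* only matches a multi-line payload as a whole when Pattern.DOTALL is set.

```java
import java.util.regex.Pattern;

class MatchesDemo {
    // Hypothetical sample inputs; any strings containing "www" would do.
    static final String URL = "http://www.example.com/";
    static final String PAYLOAD = "GET / HTTP/1.1\r\nHost: www.example.com\r\n";

    // True only if the WHOLE input matches the regex (Matcher.matches()).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("(www)", URL));     // false: input is more than "www"
        System.out.println(wholeMatch("(www)", "www"));   // true: input is exactly "www"
        System.out.println(wholeMatch(".*(www).*", URL)); // true: the .* parts absorb the rest

        // '.' does not match \r or \n by default, so the multi-line payload fails...
        System.out.println(wholeMatch(".*(www).*", PAYLOAD)); // false
        // ...unless DOTALL lets '.' match line terminators as well.
        System.out.println(Pattern.compile(".*(www).*", Pattern.DOTALL)
                .matcher(PAYLOAD).matches());                 // true
    }
}
```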
A: 

Regular expressions need delimiters.

The " sign is simply used to declare a string and is therefore not valid as a delimiter.

Your second example has delimiters, so it works.

Vordreller
No. You're thinking of scripting languages like Perl and JavaScript, where regexes are supported at the language level. In Java, like C# and Python, regexes are just strings.
Alan Moore
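To illustrate the comment, a small sketch (the class name NoDelimiters and the sample URL are hypothetical): the quotes belong to the Java string literal, slashes inside the string are ordinary characters matched literally, and modifiers that Perl writes after the closing delimiter (such as /www/i) are passed as a flag argument or written inline.

```java
import java.util.regex.Pattern;

class NoDelimiters {
    static final String URL = "http://www.example.com/"; // hypothetical sample input

    // True if regex is found anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The quotes delimit the Java String literal; the regex itself is just: www
        System.out.println(found("www", URL));   // true

        // Slashes are NOT regex delimiters in Java: this looks for the
        // five literal characters "/www/", which the URL does not contain.
        System.out.println(found("/www/", URL)); // false

        // Perl's /www/i becomes a flag argument or an inline (?i) group.
        System.out.println(Pattern.compile("WWW", Pattern.CASE_INSENSITIVE)
                .matcher(URL).find());           // true
        System.out.println(found("(?i)WWW", URL)); // true
    }
}
```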
A: 

I'm assuming you're using the matches() method, which applies the regex to the entire input string rather than searching within it. In effect, the regex is anchored at the start and end of the input "under the hood."

So instead of "(www)" you're effectively getting "^(www)$", and as you've seen, this significantly changes the meaning of the regex.

For more information on this nuance of Java: http://www.regular-expressions.info/java.html

Gavin Miller
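A sketch comparing matches() with find() plus explicit anchors (the class name ImplicitAnchors and the sample strings are hypothetical). \A and \z mark the absolute start and end of the input, which is what matches() requires. ^(www)$ is a close approximation, but in Java's default mode $ also matches just before a final line terminator, so "www\n" is one input where the two disagree.

```java
import java.util.regex.Pattern;

class ImplicitAnchors {
    // Whole-input match, as Matcher.matches() does it.
    static boolean viaMatches(String input) {
        return Pattern.compile("(www)").matcher(input).matches();
    }

    // find() with \A (start of input) and \z (absolute end of input).
    static boolean viaExplicitAnchors(String input) {
        return Pattern.compile("\\A(www)\\z").matcher(input).find();
    }

    // find() with ^ and $; $ may also match before a trailing line terminator.
    static boolean viaCaretDollar(String input) {
        return Pattern.compile("^(www)$").matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"www", "www.example.com", "http://www", "www\n"};
        for (String s : samples) {
            System.out.println(s.replace("\n", "\\n") + ": "
                    + viaMatches(s) + " " + viaExplicitAnchors(s) + " " + viaCaretDollar(s));
        }
    }
}
```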
+4  A: 

Oh, never mind, I just found out we were both a little bit API-blind ;)

We were using

Matcher.matches()

which matches the pattern against the entire string, instead of

Matcher.find()

which tries to find the pattern anywhere in the given string.

Thanks for the answers, though! :)

AdrianoKF
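A sketch of the find()-based approach, assuming the captured packet data has already been turned into a String (the JpCap side is not shown, and the class name FindVsMatches and the sample payload are hypothetical). Calling find() repeatedly walks through every occurrence, and group() and start() report what was found and where.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // Hypothetical captured HTTP request, already decoded to a String.
    static final String PAYLOAD = "GET / HTTP/1.1\r\nHost: www.example.com\r\n"
            + "Referer: http://www.example.org/\r\n";

    // Counts non-overlapping occurrences of regex by calling find() until it fails.
    static int countOccurrences(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(www)").matcher(PAYLOAD);

        // matches() needs the whole payload to be exactly "www", so this prints false.
        System.out.println(m.matches());

        // reset() discards the earlier matches() state so find() starts at index 0.
        m.reset();
        while (m.find()) {
            System.out.println("found " + m.group(1) + " at index " + m.start());
        }

        System.out.println(countOccurrences("www", PAYLOAD)); // 2
    }
}
```

Calling reset() before the find() loop keeps the search independent of whatever state the earlier matches() call left in the Matcher.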