I've been struggling with doing some relatively straightforward regular expression matching in Java 1.4.2. I'm much more comfortable with the Perl way of doing things. Here's what's going on:

I am attempting to match /^<foo>/ from "<foo><bar>"

I try:

Pattern myPattern = Pattern.compile("^<foo>");
Matcher myMatcher = myPattern.matcher("<foo><bar>");
System.out.println(myMatcher.matches());

And I get "false"

I am used to saying:

print "<foo><bar>" =~ /^<foo>/;

which does indeed return true.

After much searching and experimentation, I found a page that said:

"The String method further optimizes its search criteria by placing an invisible ^ before the pattern and a $ after it."

When I tried:

Pattern myPattern = Pattern.compile("^<foo>.*");
Matcher myMatcher = myPattern.matcher("<foo><bar>");
System.out.println(myMatcher.matches());

then it returns the expected true. I do not want that pattern though. The terminating .* should not be necessary.

Then I discovered the Matcher.useAnchoringBounds(boolean) method. I thought that expressly telling it to not use the anchoring bounds would work. It did not. I tried issuing a

myMatcher.reset();

in case I needed to flush it after turning the attribute off. No luck. Subsequently calling .matches() still returns false.

What have I overlooked?

Edit: Well, that was easy, thanks.

+11  A: 

Use the Matcher find method (instead of the matches method)
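A minimal sketch of that suggestion, using the pattern and input from the question. The class name FindDemo and the helper findsPrefix are made up for illustration; only Pattern.compile, Pattern.matcher and Matcher.find come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Hypothetical helper: true if "^<foo>" matches somewhere in the input.
    // Because of the ^ anchor, "somewhere" can only be the very start.
    static boolean findsPrefix(String input) {
        Pattern myPattern = Pattern.compile("^<foo>");
        Matcher myMatcher = myPattern.matcher(input);
        return myMatcher.find();
    }

    public static void main(String[] args) {
        System.out.println(findsPrefix("<foo><bar>")); // true
        System.out.println(findsPrefix("<bar><foo>")); // false
    }
}
```

This is the closest equivalent of the Perl =~ test: find() looks for a match anywhere, and the ^ in the pattern does the anchoring.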

jdigital
+3  A: 

Matcher.useAnchoringBounds() was added in JDK1.5 so if you are using 1.4, I'm not sure that it would help you even if it did work (notice the @since 1.5 in the Javadocs).
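For anyone on Java 5 or later who wants to check this, here is a small sketch (the class and method names are invented for the example). It suggests that switching off anchoring bounds does not change the outcome, since matches() still has to consume the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringBoundsDemo {
    // Hypothetical helper: disables anchoring bounds, then calls matches().
    // Needs Java 5+, because useAnchoringBounds does not exist in 1.4.
    static boolean matchesWithoutAnchoringBounds(String input) {
        Matcher myMatcher = Pattern.compile("^<foo>").matcher(input);
        myMatcher.useAnchoringBounds(false);
        return myMatcher.matches();
    }

    public static void main(String[] args) {
        // Still false: the pattern does not cover "<bar>".
        System.out.println(matchesWithoutAnchoringBounds("<foo><bar>")); // false
        // True only when the pattern covers the entire input.
        System.out.println(matchesWithoutAnchoringBounds("<foo>")); // true
    }
}
```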

The Javadocs for Matcher also state that the matches() method:

Attempts to match the entire region against the pattern.

(emphasis mine)

Which explains why you only got .matches() == true when you changed the pattern to end with .*.

To match against the region starting at the beginning, but not necessarily requiring that the entire region be matched, use either the find() or lookingAt() methods.
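A rough side-by-side of the three methods, as a sketch only. The class MatchModes and its helpers are hypothetical names, and the pattern here is deliberately left unanchored ("<foo>" without ^) so the difference between lookingAt() and find() shows up.

```java
import java.util.regex.Pattern;

class MatchModes {
    // Unanchored on purpose, so the method chosen decides where a match may occur.
    private static final Pattern FOO = Pattern.compile("<foo>");

    // The entire input must match the pattern.
    static boolean matchesWhole(String input) {
        return FOO.matcher(input).matches();
    }

    // The match must begin at the start of the input, but may stop early.
    static boolean looksAt(String input) {
        return FOO.matcher(input).lookingAt();
    }

    // The match may begin anywhere in the input.
    static boolean finds(String input) {
        return FOO.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("<foo><bar>")); // false
        System.out.println(looksAt("<foo><bar>"));      // true
        System.out.println(finds("<foo><bar>"));        // true
        System.out.println(looksAt("<bar><foo>"));      // false
        System.out.println(finds("<bar><foo>"));        // true
    }
}
```

So with lookingAt() the ^ in the original pattern is redundant, while with find() it is what restricts the match to the start of the string.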

matt b
btw, I don't think the part about "adding an invisible ^ and $" is correct - it's just that the matches() method behaves in a way as if you did - it only returns true if you match the _entire_ string.
matt b
+3  A: 
erickson